
[MCQs] Data Mining and Business Intelligence

Introduction to Data Mining, Data Exploration and Preprocessing

1. Data mining refers to ______
a) Special fields for database
b) Knowledge discovery from large database
c) Knowledge base for the database
d) Collections of attributes

Answer: B

2. An attribute is a ____
a) Normalization of Fields
b) Property of the class
c) Characteristics of the object
d) Summarise value

Answer: C

3. Which of the following is not a ratio attribute?
a) Age Group 10-20, 30-50, 35-45 (in Years)
b) Mass 20-30 kg, 10-15 kg
c) Areas 10-50, 50-100 (in Kilometres)
d) Temperature 10°-20°, 30°-50°, 35°-45°

Answer: D

4. The mean is the ________ of a dataset.
a) Average
b) Middle
c) Central
d) Ordered

Answer: A

5. The number that occurs most often within a set of data is called the ______
a) Mean
b) Median
c) Mode
d) Range

Answer: C

6. Find the range of the given data: 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50
a) 19
b) 29
c) 35
d) 49

Answer: B
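
Worked check for questions 4 to 6: a minimal sketch using Python's standard `statistics` module on the data from question 6. The variable names are illustrative only.

```python
from statistics import mean, median, multimode

data = [40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50]

data_range = max(data) - min(data)   # largest value minus smallest value
data_mean = mean(data)               # arithmetic average
data_median = median(data)           # middle of the sorted values
data_modes = multimode(data)         # most frequent value(s); ties are all returned

print("range:", data_range)          # 55 - 26 = 29
print("mean:", round(data_mean, 2))
print("median:", data_median)
print("modes:", data_modes)
```

The range is 55 - 26 = 29, which matches option b. This data set happens to have two modes (40 and 50), which is why `multimode` is used rather than `mode`.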

7. Which of the following is not part of the KDD process?
a) Selection
b) Pre-processing
c) Reduction
d) Summation

Answer: D

8. _______ is the output of KDD Process.
a) Query
b) Useful Information
c) Information
d) Data

Answer: B

9. Data mining turns a large collection of data into _____
a) Database
b) Knowledge
c) Queries
d) Transactions

Answer: B

10. In the KDD process, the step in which data relevant to the analysis task are retrieved from the database is called _____
a) Data Selection
b) Data Collection
c) Data Warehouse
d) Data Mining

Answer: A

11. In the KDD process, the step in which data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations is called _____
a) Data Selection
b) Data Transformation
c) Data Reduction
d) Data Cleaning

Answer: B

12. What kinds of data can be mined?
a) Database data
b) Data Warehouse data
c) Transactional data
d) All of the above

Answer: D

13. Data selection is _____
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented integrated time-variant non-volatile collection of data in support of management
d) Record oriented classes finding

Answer: B

14. To remove noise and inconsistent data ____ is needed.
a) Data Cleaning
b) Data Transformation
c) Data Reduction
d) Data Integration

Answer: A

15. Combining multiple data sources is called _____
a) Data Reduction
b) Data Cleaning
c) Data Integration
d) Data Transformation

Answer: C

16. A _____ is a collection of tables, each of which is assigned a unique name which uses the entity-relationship (ER) data model.
a) Relational database
b) Transactional database
c) Data Warehouse
d) Spatial database

Answer: A

17. Relational data can be accessed by _____ written in a relational query language.
a) Select
b) Queries
c) Operations
d) Like

Answer: B

18. _____ studies the collection, analysis, interpretation or explanation, and presentation of data.
a) Statistics
b) Visualization
c) Data Mining
d) Clustering

Answer: A

19. ______ investigates how computers can learn (or improve their performance) based on data.
a) Machine Learning
b) Artificial Intelligence
c) Statistics
d) Visualization

Answer: A

20. _____ is the science of searching for documents or information in documents.
a) Data Mining
b) Information Retrieval
c) Text Mining
d) Web Mining

Answer: B

21. Data often contain _____
a) Target Class
b) Uncertainty
c) Methods
d) Keywords

Answer: B

22. The data mining process should be highly ______
a) On Going
b) Active
c) Interactive
d) Flexible

Answer: C

23. In the multidimensional view of data mining, the major dimensions are data, knowledge, technologies, and _____
a) Methods
b) Applications
c) Tools
d) Files

Answer: B

24. An _____ is a data field, representing a characteristic or feature of a data object.
a) Method
b) Variable
c) Task
d) Attribute

Answer: D

25. The values of a _____ attribute are symbols or names of things.
a) Ordinal
b) Nominal
c) Ratio
d) Interval

Answer: B

26. “Data about data” is referred to as _____
a) Information
b) Database
c) Metadata
d) File

Answer: C

27. ______ partitions the objects into different groups.
a) Mapping
b) Clustering
c) Classification
d) Prediction

Answer: B

28. In _____, the attribute data are scaled so as to fall within a smaller range, such as -1.0 to 1.0, or 0.0 to 1.0.
a) Aggregation
b) Binning
c) Clustering
d) Normalization

Answer: D

29. Normalization by ______ normalizes by moving the decimal point of values of attributes.
a) Z-Score
b) Z-Index
c) Decimal Scaling
d) Min-Max Normalization

Answer: C
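
Worked example for questions 28 and 29: a minimal sketch of three common normalization methods (min-max, z-score, and decimal scaling). The function names and the sample values are illustrative assumptions; the z-score version uses the population standard deviation.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]. Assumes max != min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

values = [200, 300, 400, 600, 1000]   # hypothetical attribute values
print(min_max(values))
print(z_score(values))
print(decimal_scaling(values))
```

For these values, min-max maps 200 to 0.0 and 1000 to 1.0, while decimal scaling divides every value by 10,000 so that the largest becomes 0.1.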

30. _______ is a top-down splitting technique based on a specified number of bins.
a) Normalization
b) Binning
c) Clustering
d) Classification

Answer: B
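
Worked example for question 30: a minimal sketch of equal-width binning, one common form of binning. The function name and the sample prices are illustrative assumptions.

```python
def equal_width_bins(values, num_bins):
    """Partition values into num_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    bins = [[] for _ in range(num_bins)]
    for v in values:
        # The maximum value would index one past the last bin, so clamp it.
        index = min(int((v - lo) / width), num_bins - 1)
        bins[index].append(v)
    return bins

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # hypothetical data
bins = equal_width_bins(prices, 3)
print(bins)
```

With three bins the interval width is (34 - 4) / 3 = 10, so the bins cover 4 to 14, 14 to 24, and 24 to 34.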

Classification

1. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4

Answer: c

2. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned

Answer: b

3. Where can Bayes' rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query

Answer: d
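
Worked example for questions 1 and 3: Bayes' rule answers a probabilistic query from three terms, P(H | E) = P(E | H) * P(H) / P(E). A minimal sketch follows; the function name and all the probabilities are hypothetical numbers chosen for illustration.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Hypothetical numbers: a test flags a condition 90% of the time when it is
# present, 1% of cases have the condition, and 5% of healthy cases are flagged.
p_e_given_h = 0.9
p_h = 0.01
p_e_given_not_h = 0.05

# Total probability of a positive test, P(E).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

posterior = bayes_posterior(p_e_given_h, p_h, p_e)
print(round(posterior, 4))
```

Even with a fairly accurate test, the posterior stays low (about 0.15) because the prior is small.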

4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned

Answer: a

5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned

Answer: b

6. How can a Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned

Answer: b

7. How can the compactness of a Bayesian network be described?
a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned

Answer: a

8. With which of the following is the local structure associated?
a) Hybrid
b) Dependent
c) Linear
d) None of the mentioned

Answer: c

9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned

Answer: b

10. What is the relationship between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both Conditionally dependent & Dependent

Answer: c

11. A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks

Answer: a

12. Decision Tree is a display of an algorithm.
a) True
b) False

Answer: a

13. What is Decision Tree?
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
d) None of the mentioned

Answer: c

14. Decision Trees can be used for Classification Tasks.
a) True
b) False

Answer: a

15. Which of the following are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned

Answer: d

16. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles

Answer: b

17. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles

Answer: c

18. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles

Answer: d

19. Which of the following are the advantage/s of Decision Trees?
a) Possible Scenarios can be added
b) Use a white box model, if a given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned

Answer: d

20. Which of the following is the valid component of the predictor?
a) data
b) question
c) algorithm
d) all of the mentioned

Answer: d

21. Point out the wrong statement.
a) In Sample Error is also called generalization error
b) Out of Sample Error is the error rate you get on the new dataset
c) In Sample Error is also called resubstitution error
d) All of the mentioned

Answer: a

22. Which of the following is correct order of working?
a) questions->input data ->algorithms
b) questions->evaluation ->algorithms
c) evaluation->input data ->algorithms
d) all of the mentioned

Answer: a

23. Which of the following shows correct relative order of importance?
a) question->features->data->algorithms
b) question->data->features->algorithms
c) algorithms->data->features->question
d) none of the mentioned

Answer: b

24. Point out the correct statement.
a) In Sample Error is the error rate you get on the same dataset used to model a predictor
b) Data have two parts-signal and noise
c) The goal of predictor is to find signal
d) All of the mentioned

Answer: d

25. Which of the following is a characteristic of the best machine learning method?
a) Fast
b) Accuracy
c) Scalable
d) All of the mentioned

Answer: d

26. True positive means correctly rejected.
a) True
b) False

Answer: b

27. Which of the following trade-offs occur during prediction?
a) Speed vs Accuracy
b) Simplicity vs Accuracy
c) Scalability vs Accuracy
d) All of the mentioned

Answer: d

28. Which of the following expressions is true?
a) In sample error < out sample error
b) In sample error > out sample error
c) In sample error = out sample error
d) All of the mentioned

Answer: a

29. Backtesting is a key component of effective trading-system development.
a) True
b) False

Answer: a

30. Which of the following is correct use of cross validation?
a) Selecting variables to include in a model
b) Comparing predictors
c) Selecting parameters in prediction function
d) All of the mentioned

Answer: d
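
Worked example for question 30: a minimal sketch of how k-fold cross-validation splits a data set into training and test indices. The function name is an illustrative assumption, and real projects normally shuffle the indices first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so each candidate model, variable set, or parameter setting is scored on data it was not fitted on.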

31. Point out the wrong combination.
a) True negative=correctly rejected
b) False negative=incorrectly rejected
c) False positive=correctly identified
d) All of the mentioned

Answer: c

32. Which of the following is a common error measure?
a) Sensitivity
b) Median absolute deviation
c) Specificity
d) All of the mentioned

Answer: d
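
Worked example for questions 26, 31 and 32: a minimal sketch that counts the four confusion-matrix cells and derives sensitivity and specificity. In the standard terminology a true positive is correctly identified, a true negative is correctly rejected, a false positive is incorrectly identified, and a false negative is incorrectly rejected. The labels below are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 means positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # correctly identified
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # correctly rejected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # incorrectly identified
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # incorrectly rejected
    return tp, tn, fp, fn

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical ground truth
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]   # hypothetical model output

tp, tn, fp, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / len(actual)

print(tp, tn, fp, fn)
print(sensitivity, round(specificity, 3), accuracy)
```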

33. Which of the following is not a machine learning algorithm?
a) SVG
b) SVM
c) Random forest
d) None of the mentioned

Answer: a

34. Point out the wrong statement.
a) ROC curve stands for receiver operating characteristic
b) For time series, data must be in chunks
c) Random sampling must be done with replacement
d) None of the mentioned

Answer: d

35. Which of the following is a categorical outcome?
a) RMSE
b) RSquared
c) Accuracy
d) All of the mentioned

Answer: c

36. For k-fold cross-validation, a larger k value implies more bias.
a) True
b) False

Answer: b

37. Which of the following method is used for trainControl resampling?
a) repeatedcv
b) svm
c) bag32
d) none of the mentioned

Answer: a

38. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned

Answer: a

39. For k-fold cross-validation, a smaller k value implies less variance.
a) True
b) False

Answer: a

40. Predicting with trees evaluates _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned

Answer: b
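
Worked example for question 40: homogeneity of a group is often measured with an impurity score. The sketch below uses Gini impurity, which is one common choice and an assumption here, since the question does not name a measure; 0 means the group is perfectly homogeneous.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_group = ["yes", "yes", "yes", "yes"]    # hypothetical leaf, one class only
mixed_group = ["yes", "yes", "no", "no"]     # hypothetical leaf, evenly split

print(gini_impurity(pure_group))
print(gini_impurity(mixed_group))
```

A tree-growing algorithm prefers the split whose child groups have the lowest weighted impurity.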

41. Point out the wrong statement.
a) Training and testing data must be processed in different ways
b) Test transformation would mostly be imperfect
c) The first goal is statistical and second is data compression in PCA
d) All of the mentioned

Answer: a

42. Which of the following method options are provided by the train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned

Answer: d

43. Which of the following is correct with respect to random forest?
a) Random forests are difficult to interpret but often very accurate
b) Random forests are easy to interpret but often very accurate
c) Random forests are difficult to interpret but much less accurate
d) None of the mentioned

Answer: a

44. Point out the correct statement.
a) Prediction with regression is easy to implement
b) Prediction with regression is easy to interpret
c) Prediction with regression performs well when linear model is correct
d) All of the mentioned

Answer: d

45. Which of the following libraries is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned

Answer: a

46. The principal components are equal to left singular values if you first scale the variables.
a) True
b) False

Answer: b

47. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost

Answer: a

48. Which of the following is one of the largest subclasses of boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned

Answer: b

49. PCA is most useful for nonlinear models.
a) True
b) False

Answer: b

Clustering

1. Which of the following clustering types has the characteristic shown in the figure below?
[Figure: data-science-questions-answers-clustering-q1 (image not available)]
a) Partitional
b) Hierarchical
c) Naive bayes
d) None of the mentioned

Answer: b

2. Point out the correct statement
a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned

Answer: d

3. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned

Answer: b

4. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned

Answer: d
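
Worked example for question 4: a minimal sketch of k-means (Lloyd's algorithm) showing the three inputs from the options: a distance metric (Euclidean here), the number of clusters (the length of the centroid list), and an initial guess for the centroids. The function name and the points are illustrative assumptions.

```python
import math

def k_means(points, centroids, max_iterations=100):
    """Cluster points around caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)   # leave an empty cluster's centroid in place
        if new_centroids == centroids:      # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]   # hypothetical data
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (5.0, 7.0)])
print(centroids)
print(clusters)
```

In practice the initial centroids are usually chosen at random, which is why different runs of k-means can give different results. Fixing them, as done here, makes the run repeatable.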

5. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned

Answer: c

6. Which of the following combinations is incorrect?
a) Continuous – euclidean distance
b) Continuous – correlation similarity
c) Binary – manhattan distance
d) None of the mentioned

Answer: d

7. Hierarchical clustering should be primarily used for exploration.
a) True
b) False

Answer: a

8. Which of the following functions is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned

Answer: a

9. Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned

Answer: b
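
Worked example for questions 3 and 9: a minimal sketch of agglomerative (bottom-up) hierarchical clustering, which repeatedly merges the two closest clusters. It uses single linkage on one-dimensional values; the linkage choice, function name, and data are assumptions for illustration. The recorded merge history is the information a dendrogram draws.

```python
def agglomerative_merges(values):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerative_merges([1, 2, 6, 7, 15])   # hypothetical data
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Each step makes the locally best merge and never revisits it, which is the greedy behaviour mentioned in question 2.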

10. K-means is not deterministic and consists of a number of iterations.
a) True
b) False

Answer: a

11. Hierarchical clustering should be mainly used for exploration.
a) True
b) False

Answer: a

12. K-means clustering consists of a number of iterations and is not deterministic.
a) True
b) False

Answer: a

13. Which is needed by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of these

Answer: d

14. Which function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned

Answer: a

15. Which is conclusively produced by Hierarchical Clustering?
a) final estimation of cluster centroids
b) tree showing how nearby things are to each other
c) assignment of each point to clusters
d) all of these

Answer: b

16. Which clustering technique requires a merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned

Answer: b

Frequent Patterns

1. A collection of one or more items is called _____
a) Itemset
b) Support
c) Confidence
d) Support Count

Answer: A

2. The frequency of occurrence of an itemset is called _____
a) Support
b) Confidence
c) Support Count
d) Rules

Answer: C

3. An itemset whose support is greater than or equal to a minimum support threshold is ______
a) Itemset
b) Frequent Itemset
c) Infrequent items
d) Threshold values

Answer: B

4. What does FP growth algorithm do?
a) It mines all frequent patterns through pruning rules with lesser support
b) It mines all frequent patterns through pruning rules with higher support
c) It mines all frequent patterns by constructing an FP-tree
d) It mines all frequent patterns by constructing itemsets

Answer: C

5. Which technique can be used to improve the efficiency of the Apriori algorithm?
a) Hash-based techniques
b) Transaction Increases
c) Sampling
d) Cleaning

Answer: A

6. What do you mean by support(A)?
a) Total number of transactions containing A
b) Total Number of transactions not containing A
c) Number of transactions containing A / Total number of transactions
d) Number of transactions not containing A / Total number of transactions

Answer: C

7. How do you calculate Confidence (A -> B)?
a) Support(A ∪ B) / Support(A)
b) Support(A ∪ B) / Support(B)
c) Support(A ∩ B) / Support(A)
d) Support(A ∩ B) / Support(B)

Answer: A
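
Worked example for questions 6 and 7: a minimal sketch of support and confidence, where support(A) is the fraction of transactions containing A and confidence(A -> B) = support(A ∪ B) / support(A). The function names and the shopping-basket data are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B together) divided by support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread"}, transactions))
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Bread appears in 3 of 4 transactions and bread with milk in 2 of 4, so the rule bread -> milk has confidence 0.5 / 0.75, about 0.67.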

8. Which of the following is the direct application of frequent itemset mining?
a) Social Network Analysis
b) Market Basket Analysis
c) Outlier Detection
d) Intrusion Detection

Answer: B

9. What is not true about FP growth algorithms?
a) It mines frequent itemsets without candidate generation
b) There are chances that FP trees may not fit in the memory
c) FP trees are very expensive to build
d) It expands the original database to build FP trees

Answer: D

10. When do you consider an association rule interesting?
a) If it only satisfies min_support
b) If it only satisfies min_confidence
c) If it satisfies both min_support and min_confidence
d) There are other measures to check so

Answer: C

11. What is the relation between a candidate and frequent itemsets?
a) A candidate itemset is always a frequent itemset
b) A frequent itemset must be a candidate itemset
c) No relation between these two
d) Strong relation with transactions

Answer: B

12. Which of the following is not a frequent pattern mining algorithm?
a) Apriori
b) FP growth
c) Decision trees
d) Eclat

Answer: C

13. Which algorithm requires fewer scans of data?
a) Apriori
b) FP Growth
c) Naive Bayes
d) Decision Trees

Answer: B

14. Consider the following transactions:

I1, I2, I3, I4, I5, I6
I7, I2, I3, I4, I5, I6
I1, I8, I4, I5
I1, I9, I10, I4, I6
I10, I2, I4, I11, I5

With a minimum support of 0.6, find all frequent itemsets.

a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>

b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>

c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>

d) <I1>, <I4>, <I5>, <I6>

Answer: A
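
Worked check for question 14: a minimal brute-force sketch that enumerates itemsets by size and keeps those meeting the minimum support. It is not Apriori or FP-growth (it does no pruning beyond stopping at the first size with no frequent itemsets), and the function name is an illustrative assumption.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I3", "I4", "I5", "I6"},
    {"I7", "I2", "I3", "I4", "I5", "I6"},
    {"I1", "I8", "I4", "I5"},
    {"I1", "I9", "I10", "I4", "I6"},
    {"I10", "I2", "I4", "I11", "I5"},
]
min_support = 0.6   # an itemset must occur in at least 3 of the 5 transactions

def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = []
    size = 1
    while True:
        found = []
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / len(transactions) >= min_support:
                found.append(candidate)
        if not found:
            # No frequent itemset of this size, so no larger one can exist.
            break
        result.extend(found)
        size += 1
    return result

for itemset in frequent_itemsets(transactions, min_support):
    print(itemset)
```

The result is five single items (I1, I2, I4, I5, I6), five pairs, and the triple {I2, I4, I5}, which matches option a.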

15. What will happen if support is reduced?
a) Number of frequent itemsets remains the same
b) Some itemsets will add to the current set of frequent itemsets.
c) Some itemsets will become infrequent while others will become frequent
d) Cannot say

Answer: B

16. What is association rule mining?
a) Same as frequent itemset mining
b) Finding of strong association rules using frequent itemsets
c) Using association to analyze correlation rules
d) Finding Itemsets for future trends

Answer: B

17. A definition or a concept is ______ if it classifies any examples as coming within the concept
a) Concurrent
b) Consistent
c) Constant
d) Complete

Answer: B

Business Intelligence

1. Business intelligence (BI) is a broad category of application programs which includes _____________
a) Decision support
b) Data mining
c) OLAP
d) All of the mentioned

Answer: d

2. Point out the correct statement.
a) OLAP is an umbrella term that refers to an assortment of software applications for analyzing an organization’s raw data for intelligent decision making
b) Business intelligence equips enterprises to gain business advantage from data
c) BI makes an organization agile thereby giving it a lower edge in today’s evolving market condition
d) None of the mentioned

Answer: b

3. BI can catalyze a business’s success in terms of _____________
a) Distinguish the products and services that drive revenues
b) Rank customers and locations based on profitability
c) Ranks customers and locations based on probability
d) All of the mentioned

Answer: d

4. Which of the following areas are affected by BI?
a) Revenue
b) CRM
c) Sales
d) All of the mentioned

Answer: b

5. Point out the wrong statement.
a) Data is factual information for analysis
b) BI is a category of database software that provides an interface to help users quickly and interactively scrutinize the results in a variety of dimensions of the data
c) Customer relationship management (CRM) entails all aspects of interaction that a company has with its customer
d) None of the mentioned

Answer: b

6. ________ is a performance management tool that recapitulates an organization’s performance from several standpoints on a single page.
a) Balanced Scorecard
b) Data Cube
c) Dashboard
d) All of the mentioned

Answer: a

7. __________ is a system where operations like data extraction, transformation and loading operations are executed.
a) Data staging
b) Data integration
c) ETL
d) None of the mentioned

Answer: a

8. _________ is a category of applications and technologies for presenting and analyzing corporate and external data.
a) Data warehouse
b) MIS
c) EIS
d) All of the mentioned

Answer: c

9. Which of the following is the process of basing an organization’s actions and decisions on actual measured results of performance?
a) Institutional performance management
b) Gap analysis
c) Slice and Dice
d) None of the mentioned

Answer: a

10. Which of the following does not form part of BI Stack in SQL Server?
a) SSRS
b) SSIS
c) SSAS
d) OBIEE

Answer: d

11. How many types of BI users are there?
a) 2
b) 3
c) 4
d) 5

Answer: C

12. Which of the following statement is true about Business Intelligence?
a) BI convert raw data into meaningful information
b) BI has a direct impact on organization’s strategic, tactical and operational business decisions.
c) BI tools perform data analysis and create reports, summaries, dashboards, maps, graphs, and charts
d) All of the above

Answer: D

13. KPI stands for?
a) Key Performance Indicators
b) Key Performance Identifier
c) Key Processes Identifier
d) Key Processes Indicators

Answer: A

14. Which of the following does not form part of BI Stack in SQL Server?
a) SSRS
b) SSIS
c) SSAS
d) OBIEE

Answer: D

15. _________ is a category of applications and technologies for presenting and analyzing corporate and external data.
a) MIS
b) DIS
c) EIS
d) CIS

Answer: C

16. Which of the following areas are affected by BI?
a) Revenue
b) CRM
c) Sales
d) CPM

Answer: B

17. Business intelligence (BI) is a broad category of application programs which includes _____________
a) Decision support
b) Data Mining
c) OLAP
d) All of the above

Answer: D

18. __________ is a system where operations like data extraction, transformation and loading operations are executed.
a) Data staging
b) Data integration
c) ETL
d) None of the above

Answer: A

19. Business intelligence equips enterprises to gain business advantage from data
a) TRUE
b) FALSE
c) Can be true or false
d) Cannot say

Answer: A

20. BI is a category of database software that provides an interface to help users quickly and interactively scrutinize the results in a variety of dimensions of the data
a) TRUE
b) FALSE
c) Can be true or false
d) Cannot say

Answer: B

21. Which of the following falls under the broad category of business intelligence application programs?
a) OLAP
b) Data mining
c) Decision support
d) Both A and B
e) All of these

Answer: E

22. Which of the following is a central point from which all customer contacts are managed?
a) call center
b) help system
c) multichannel marketing
d) contact center
e) None of these

Answer: D

23. Which of the following is not part of Lewin’s three-step approach to change?

a) Freezing
b) Unfreezing
c) Changing behavior
d) Initiating change
e) All of these

Answer: D

24. Which of the following areas are affected by business intelligence?

a) Sales
b) CRM
c) Revenue
d) Both A and B
e) None of these

Answer: B

25. The _____ technique is used to predict future behavior and anticipate the consequences of change.

a) predictive modeling
b) disaster recovery
c) predictive technology
d) Digital Silhouettes
e) Both A and B

Answer: A

26. Which of the following is a term for a radical rethinking of the nature of the business?

a) Paradigm shift
b) Revolutionary change
c) Both A and B
d) Transformational change
e) All of these

Answer: A

27. _____ is not a part of the BI Stack in SQL Server.

a) SSRS
b) OBIEE
c) SSAS
d) SSIS
e) None of these

Answer: B

28. The first step in a Stage-gate process is ____.

a) Generate ideas and concepts
b) Demonstrate a plan
c) Initiate learning
d) Develop a product
e) All of these

Answer: A

29. IS stands for _____.

a) Internal services
b) Information systems
c) International sales
d) Intelligent strategy
e) None of these

Answer: B

30. ______ is not an aggregate function.

a) With
b) Sum
c) Avg
d) Min
e) None of these

Answer: A
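
Worked example for question 30: a minimal sketch running the SUM, AVG, and MIN aggregate functions through Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical. WITH is a clause used to introduce a common table expression, not an aggregate function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 300.0), ("south", 50.0)],
)

# Each aggregate function collapses the amount column to a single value.
total, average, smallest = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount) FROM sales"
).fetchone()
conn.close()

print(total, average, smallest)
```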

31. Which of the following is not an implementation activity for an information system?

a) User training and development
b) System documentation
c) Software development
d) a marketing plan
e) All of these

Answer: D

32. ______ database provides for the creation of logins on the destination server.

a) Detach
b) Move
c) Copy
d) Attach
e) None of these

Answer: C
