Get Latest Exam Updates, Free Study materials and Tips
1. Data scrubbing is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option D
2. The active data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option D
3. A goal of data mining includes which of the following?
A. To explain some observed event or condition
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
Answer: Option A
4. An operational system is which of the following?
A. A system that is used to run the business in real-time and is based on historical data.
B. A system that is used to run the business in real-time and is based on current data.
C. A system that is used to support decision-making and is based on current data.
D. A system that is used to support decision-making and is based on historical data.
Answer: Option B
5. A data warehouse is which of the following?
A. Can be updated by end-users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
Answer: Option C
6. A snowflake schema contains which of the following types of tables?
A. Fact
B. Dimension
C. Helper
D. All of the above
Answer: Option D
7. The generic two-level data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option B
8. Fact tables are which of the following?
A. Completely denormalized
B. Partially denormalized
C. Completely normalized
D. Partially normalized
Answer: Option C
9. Data transformation includes which of the following?
A. A process to change data from a detailed level to a summary level
B. A process to change data from a summary level to a detailed level
C. Joining data from one source into various sources of data
D. Separating data from one source into various sources of data
Answer: Option A
10. Reconciled data is which of the following?
A. Data stored in the various operational systems throughout the organization.
B. Current data intended to be the single source for all decision support systems.
C. Data stored in one operational system in the organization.
D. Data that has been selected and formatted for end-user support applications.
Answer: Option B
11. The load and index is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option B
12. The extract process is which of the following?
A. Capturing all of the data contained in various operational systems
B. Capturing a subset of the data contained in various operational systems
C. Capturing all of the data contained in various decision support systems
D. Capturing a subset of the data contained in various decision support systems
Answer: Option B
13. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above
Answer: Option C
14. Transient data is which of the following?
A. Data in which changes to existing records cause the previous version of the records to be eliminated
B. Data in which changes to existing records do not cause the previous version of the records to be eliminated
C. Data that are never altered or deleted once they have been added
D. Data that are never deleted once they have been added
Answer: Option A
15. A multifield transformation does which of the following?
A. Converts data from one field into multiple fields
B. Converts data from multiple fields into one field
C. Converts data from multiple fields into multiple fields
D. All of the above
Answer: Option D
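The three transformation shapes in question 15 can be sketched in Python; the field names and formats below are illustrative assumptions, not part of the question bank:

```python
# Multifield transformations: one-to-many, many-to-one, many-to-many.

def split_name(full_name):
    """One field -> multiple fields: split a full name at the first space."""
    first, last = full_name.split(" ", 1)
    return first, last

def merge_address(city, state, zip_code):
    """Multiple fields -> one field: build a single address line."""
    return f"{city}, {state} {zip_code}"

def swap_date_parts(day, month, year):
    """Multiple fields -> multiple fields: reorder date components."""
    return month, day, year
```

All three are "multifield transformations" in the sense of question 15, which is why the answer is D.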
16. A data mart is designed to optimize the performance for well-defined and predictable uses.
A. True
B. False
Answer: Option A
17. Successful data warehousing requires that a formal program in total quality management (TQM) be implemented.
A. True
B. False
Answer: Option A
18. Data in operational systems are typically fragmented and inconsistent.
A. True
B. False
Answer: Option A
19. Most operational systems are based on the use of transient data.
A. True
B. False
Answer: Option A
20. Independent data marts are often created because an organization focuses on a series of short-term business objectives.
A. True
B. False
Answer: Option A
21. Joining is the process of partitioning data according to predefined criteria.
A. True
B. False
Answer: Option B
22. The role of the ETL process is to identify erroneous data and to fix them.
A. True
B. False
Answer: Option B
23. Data in the data warehouse are loaded and refreshed from operational systems.
A. True
B. False
Answer: Option A
24. Star schema is suited to online transaction processing and therefore is generally used in operational systems, operational data stores, or an EDW.
A. True
B. False
Answer: Option B
25. Periodic data are data that are physically altered once added to the store.
A. True
B. False
Answer: Option B
26. Both status data and event data can be stored in a database.
A. True
B. False
Answer: Option A
27. Static extract is used for ongoing warehouse maintenance.
A. True
B. False
Answer: Option B
28. Data scrubbing can help upgrade data quality; it is not a long-term solution to the data quality problem.
A. True
B. False
Answer: Option A
29. Every key used to join the fact table with a dimensional table should be a surrogate key.
A. True
B. False
Answer: Option A
30. Derived data are detailed, current data intended to be the single, authoritative source for all decision support applications.
A. True
B. False
Answer: Option B
1. All data in a flat file is in this format.
A. Sort
B. ETL
C. Format
D. String
Ans: D
2. It is used to push data into a relational database table. This control will be the destination for most fact table data flows.
A. Web Scraping
B. Data inspection
C. OLE DB Source
D. OLE DB Destination
Ans: D
3. Logical Data Maps
A. These are used to identify which fields from which sources are going to which destinations. It allows the ETL developer to identify if there is a need to do a data type change or aggregation prior to beginning coding of an ETL process.
B. These can be used to flag an entire file-set that is ready for processing by the ETL process. It contains no meaningful data, but the fact it exists is the key to the process.
C. Data is pulled from multiple sources to be merged into one or more destinations.
D. It is used to massage data in transit between the source and destination.
Ans: A
4. Data access methods.
A. Pull Method
B. Push and Pull
C. Load in Parallel
D. Union all
Ans: B
5. OLTP
A. Process to move data from a source to destination.
B. Transactional database that is typically attached to an application. This source provides the benefit of known data types and standardized access methods. This system enforces data integrity.
C. All data in flat file is in this format.
D. This control can be used to add columns to the stream or make modifications to data within the stream. Should be used for simple modifications.
Ans: B
6. COBOL
A. Process to move data from a source to destination.
B. The easiest to consume from the ETL standpoint.
C. Two methods to ensure data integrity.
D. Many routines of the Mainframe system are written in this.
Ans: D
7. What ETL Stands for
A. Data inspection
B. Transformation
C. Extract, Transform, Load
D. Data Flow
Ans: C
8. The source system initiates the data transfer for the ETL process. This method is uncommon in practice, as each system would have to move the data to the ETL process individually.
A. Custom
B. Automation
C. Pull Method
D. Push Method
Ans: D
9. Sentinel Files
A. These are used to identify which fields from which sources are going to which destinations. It allows the ETL developer to identify if there is a need to do a data type change or aggregation prior to beginning coding of an ETL process.
B. These can be used to flag an entire file-set that is ready for processing by the ETL process. It contains no meaningful data, but the fact it exists is the key to the process.
C. ETL can be used to automate the movement of data between two locations. This standardizes the process so that the load is done the same way every run.
D. This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
Ans: B
10. Checkpoints
A. Similar to “break up processes”, checkpoints provide markers for what data has been processed in case an error occurs during the ETL process.
B. Similar to XML’s structured text file.
C. Many routines of the Mainframe system are written in this.
D. It is used to import text files for ETL processing.
Ans: A
11. Mainframe systems use this. This requires a conversion to the more common ASCII format.
A. ETL
B. XML
C. Sort
D. EBCDIC
Ans: D
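The EBCDIC-to-ASCII conversion in question 11 can be demonstrated with Python's built-in codecs; cp037 (the US/Canada EBCDIC code page) is assumed here as the mainframe's code page:

```python
# Convert EBCDIC bytes to a native string using the cp037 codec
# (one common EBCDIC code page; the actual page varies by mainframe).
ebcdic_bytes = "HELLO".encode("cp037")     # simulate bytes received from a mainframe
ascii_text = ebcdic_bytes.decode("cp037")  # decode back for ASCII-world processing
```

The raw bytes differ from their ASCII equivalents (EBCDIC "H" is 0xC8, not 0x48), which is why a conversion step is required in the ETL flow.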
12. Ultimate flexibility, unit testing is available, usually poor documentation.
A. ETL
B. Custom
C. OLTP
D. Sort
Ans: B
13. Conditional Split
A. Many routines of the Mainframe system are written in this.
B. Data is pulled from multiple sources to be merged into one or more destinations.
C. It allows multiple streams to be created from a single stream. Only rows that match the criteria for a given path are sent down that path.
D. This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
Ans: C
14. Flat files
A. The easiest to consume from the ETL standpoint.
B. Three components of data flow.
C. Three common usages of ETL.
D. Two methods to ensure data integrity.
Ans: A
15. This is used to create multiple streams within a data flow from a single stream. All records in the stream are sent down all paths. Typically uses a merge-join to recombine the streams later in the data flow.
A. OLTP
B. Mainframe
C. EBCDIC
D. Multicast
Ans: D
16. There are little to no benefits to the ETL developer when accessing these types of systems and many detriments. The ability to access these systems is very limited and typically FTP of text files is used to facilitate access.
A. Mainframe
B. Union all
C. File Name
D. Multicast
Ans: A
17. Shows the path to the file to be imported.
A. File Name
B. Mainframe
C. Format
D. Union all
Ans: A
18. Wheel is already invented, documented, good support.
A. Format
B. COBOL
C. Tool Suite
D. Flat files
Ans: C
19. Similar to XML’s structured text file.
A. Data Scrubbing
B. EBCDIC
C. String
D. Web Scraping
Ans: D
20. Flat file control
A. Three components of data flow.
B. It is used to import text files for ETL processing.
C. The easiest to consume from the ETL standpoint.
D. Shows the path to the file to be imported.
Ans: B
21. Two methods to ensure data integrity.
A. Sources, Transformation, Destination
B. Data inspection
C. Row Count Inspection, Data Inspection
D. Row Count Inspection
Ans: C
22. Transformation
A. Data is pulled from multiple sources to be merged into one or more destinations.
B. It is used to import text files for ETL processing.
C. Process to move data from a source to destination.
D. It is used to massage data in transit between the source and destination.
Ans: D
23. Three common usages of ETL.
A. Data Scrubbing
B. Sources, Transformation, Destination
C. Merging Data
D. Merging Data, Data Scrubbing, Automation
Ans: D
24. Load in Parallel
A. A value of delimited should be selected for delimited files.
B. Data is pulled from multiple sources to be merged into one or more destinations.
C. This will reduce the run time of the ETL process and reduce the window for hardware failure to affect the process.
D. This should be checked if column names have been included in the first row of the file.
Ans: C
25. This can be computationally expensive excluding SSD.
A. Hard Drive I/O
B. Mainframe
C. Tool Suite
D. Data Scrubbing
Ans: A
26. A value of delimited should be selected for delimited files.
A. Sort
B. Format
C. String
D. OLTP
Ans: B
27. This should be checked if column names have been included in the first row of the file.
A. Row Count Inspection, Data Inspection
B. Format of the Date
C. Column names in the first data row checkbox
D. Do most work in transformation phase
Ans: C
28. OLAP stands for
a) Online analytical processing
b) Online analysis processing
c) Online transaction processing
d) Online aggregate processing
Answer: a
29. Data that can be modeled as dimension attributes and measure attributes are called _______ data.
a) Multidimensional
b) Single Dimensional
c) Measured
d) Dimensional
Answer: a
30. The generalization of the cross-tab, which is represented visually, is ____________, also called a data cube.
a) Two dimensional cube
b) Multidimensional cube
c) N-dimensional cube
d) Cuboid
Answer: a
31. The process of viewing the cross-tab (Single dimensional) with a fixed value of one attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both Slicing and Dicing
Answer: a
32. The operation of moving from finer-granularity data to a coarser granularity (by means of aggregation) is called a ________
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
Answer: a
33. In SQL the cross-tabs are created using
a) Slice
b) Dice
c) Pivot
d) All of the mentioned
Answer: a
34. { (item_name, color, clothes_size), (item_name, color), (item_name, clothes_size), (color, clothes_size), (item_name), (color), (clothes_size), () }
This can be achieved by using which of the following?
a) group by rollup
b) group by cubic
c) group by
d) none of the mentioned
Answer: d
35. What do data warehouses support?
a) OLAP
b) OLTP
c) OLAP and OLTP
d) Operational databases
Answer: a
36. SELECT item_name, color, clothes_size, SUM(quantity)
FROM sales
GROUP BY ROLLUP(item_name, color, clothes_size);
How many groupings are possible in this rollup?
a) 8
b) 4
c) 2
d) 1
Answer: b
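ROLLUP over n columns produces the n + 1 prefix grouping sets (plus the grand total), while CUBE produces all 2^n subsets; that is why the query in question 36 yields 4 groupings. A sketch of the two generators:

```python
# Grouping sets produced by ROLLUP (prefixes) versus CUBE (all subsets).
from itertools import combinations

def rollup_sets(cols):
    # ROLLUP: each prefix of the column list down to the empty
    # (grand-total) set -> n + 1 grouping sets.
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    # CUBE: every subset of the column list -> 2 ** n grouping sets.
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

cols = ["item_name", "color", "clothes_size"]
```

`rollup_sets(cols)` returns 4 sets and `cube_sets(cols)` returns 8, matching answers 36(b) and the cube question above it.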
37. Which one of the following is the right syntax for DECODE?
a) DECODE (search, expression, result [, search, result]… [, default])
b) DECODE (expression, result [, search, result]… [, default], search)
c) DECODE (search, result [, search, result]… [, default], expression)
d) DECODE (expression, search, result [, search, result]… [, default])
Answer: d
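Oracle's DECODE compares an expression against search values in order and returns the matching result, falling back to an optional default. A minimal Python emulation of that calling convention (the function name and behavior here are an illustrative sketch, not Oracle's implementation):

```python
def decode(expression, *rest):
    """Emulate DECODE(expression, search, result [, search, result]... [, default]).

    (search, result) pairs are scanned in order; an odd trailing
    argument is treated as the default (None when absent).
    """
    args = list(rest)
    default = args.pop() if len(args) % 2 else None
    for search, result in zip(args[::2], args[1::2]):
        if expression == search:
            return result
    return default
```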
1. Data mining refers to ______
a) Special fields for database
b) Knowledge discovery from large database
c) Knowledge base for the database
d) Collections of attributes
Answer: B
2. An attribute is a ____
a) Normalization of Fields
b) Property of the class
c) Characteristics of the object
d) Summarise value
Answer: C
3. Which are not related to Ratio Attributes?
a) Age Group 10-20, 30-50, 35-45 (in Years)
b) Mass 20-30 kg, 10-15 kg
c) Areas 10-50, 50-100 (in Kilometres)
d) Temperature 10°-20°, 30°-50°, 35°-45°
Answer: D
4. The mean is the ________ of a dataset.
a) Average
b) Middle
c) Central
d) Ordered
Answer: A
5. The number that occurs most often within a set of data called as ______
a) Mean
b) Median
c) Mode
d) Range
Answer: C
6. Find the range for given data 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50
a) 19
b) 29
c) 35
d) 49
Answer: B
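The mean, mode, and range asked about in questions 4-6 can be computed directly with Python's statistics module on the question 6 data:

```python
from statistics import mean, median, multimode

data = [40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50]

data_range = max(data) - min(data)  # range: spread between largest and smallest
data_mean = mean(data)              # mean: the average of the dataset
data_median = median(data)          # median: the middle value
data_modes = multimode(data)        # mode(s): most frequently occurring value(s)
```

The range is 55 - 26 = 29, confirming answer B; note this dataset happens to be bimodal (40 and 50 each occur twice).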
7. Which are not the part of the KDD process from the following
a) Selection
b) Pre-processing
c) Reduction
d) Summation
Answer: D
8. _______ is the output of KDD Process.
a) Query
b) Useful Information
c) Information
d) Data
Answer: B
9. Data mining turns a large collection of data into _____
a) Database
b) Knowledge
c) Queries
d) Transactions
Answer: B
10. In the KDD process, the step in which data relevant to the analysis task are retrieved from the database is called _____
a) Data Selection
b) Data Collection
c) Data Warehouse
d) Data Mining
Answer: A
11. In the KDD process, the step in which data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations is called _____
a) Data Selection
b) Data Transformation
c) Data Reduction
d) Data Cleaning
Answer: B
12. What kinds of data can be mined?
a) Database data
b) Data Warehouse data
c) Transactional data
d) All of the above
Answer: D
13. Data selection is _____
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented integrated time-variant non-volatile collection of data in support of management
d) Record oriented classes finding
Answer: B
14. To remove noise and inconsistent data ____ is needed.
a) Data Cleaning
b) Data Transformation
c) Data Reduction
d) Data Integration
Answer: A
15. Combining multiple data sources is called _____
a) Data Reduction
b) Data Cleaning
c) Data Integration
d) Data Transformation
Answer: C
16. A _____ is a collection of tables, each of which is assigned a unique name which uses the entity-relationship (ER) data model.
a) Relational database
b) Transactional database
c) Data Warehouse
d) Spatial database
Answer: A
17. Relational data can be accessed by _____ written in a relational query language.
a) Select
b) Queries
c) Operations
d) Like
Answer: B
18. _____ studies the collection, analysis, interpretation or explanation, and presentation of data.
a) Statistics
b) Visualization
c) Data Mining
d) Clustering
Answer: A
19. ______ investigates how computers can learn (or improve their performance) based on data.
a) Machine Learning
b) Artificial Intelligence
c) Statistics
d) Visualization
Answer: A
20. _____ is the science of searching for documents or information in documents.
a) Data Mining
b) Information Retrieval
c) Text Mining
d) Web Mining
Answer: B
21. Data often contain _____
a) Target Class
b) Uncertainty
c) Methods
d) Keywords
Answer: B
22. The data mining process should be highly ______
a) On Going
b) Active
c) Interactive
d) Flexible
Answer: C
23. In the real-world multidimensional view of data mining, the major dimensions are data, knowledge, technologies, and _____
a) Methods
b) Applications
c) Tools
d) Files
Answer: B
24. An _____ is a data field, representing a characteristic or feature of a data object.
a) Method
b) Variable
c) Task
d) Attribute
Answer: D
25. The values of a _____ attribute are symbols or names of things.
a) Ordinal
b) Nominal
c) Ratio
d) Interval
Answer: B
26. “Data about data” is referred to as _____
a) Information
b) Database
c) Metadata
d) File
Answer: C
27. ______ partitions the objects into different groups.
a) Mapping
b) Clustering
c) Classification
d) Prediction
Answer: B
28. In _____, the attribute data are scaled so as to fall within a smaller range, such as -1.0 to 1.0, or 0.0 to 1.0.
a) Aggregation
b) Binning
c) Clustering
d) Normalization
Answer: D
29. Normalization by ______ normalizes by moving the decimal point of values of attributes.
a) Z-Score
b) Z-Index
c) Decimal Scaling
d) Min-Max Normalization
Answer: C
30._______ is a top-down splitting technique based on a specified number of bins.
a) Normalization
b) Binning
c) Clustering
d) Classification
Answer: B
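The normalization methods behind questions 28-29 can be sketched as follows; the sample values are illustrative assumptions:

```python
from statistics import mean, stdev

def min_max(values, new_min=0.0, new_max=1.0):
    # Min-max normalization: rescale into [new_min, new_max].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    # Z-score normalization: center on the mean, scale by the standard deviation.
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def decimal_scaling(values):
    # Decimal scaling: divide by 10**j, with j the smallest power
    # that moves every |value| below 1 (i.e., shifts the decimal point).
    j = len(str(int(max(abs(v) for v in values))))
    return [v / 10 ** j for v in values]
```

Min-max and z-score squeeze values into a small range (question 28, answer D), while decimal scaling only moves the decimal point (question 29, answer C).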
1. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4
Answer: c
2. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned
Answer: b
3. Where can the Bayes rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
Answer: d
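Question 3's answer rests on Bayes' rule answering probabilistic queries by inverting a conditional: P(A|B) = P(B|A) P(A) / P(B). A worked example with illustrative numbers (the prevalence and test rates below are assumptions for demonstration only):

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
# Example query: probability of a condition A given a positive test B,
# for a 90%-sensitive test with a 5% false-positive rate and 10% prevalence.
p_a = 0.10                  # prior P(A)
p_b_given_a = 0.90          # likelihood P(B|A)
p_b_given_not_a = 0.05      # false-positive rate P(B|~A)

# Total probability of a positive test.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: the answer to the probabilistic query.
p_a_given_b = p_b_given_a * p_a / p_b
```

Here the posterior works out to 0.09 / 0.135 = 2/3.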
4. What does the Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
Answer: a
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
6. How can the Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned
Answer: b
7. How can the compactness of the Bayesian network be described?
a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned
Answer: a
8. With which is the local structure associated?
a) Hybrid
b) Dependant
c) Linear
d) None of the mentioned
Answer: c
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
Answer: b
10. What is the relationship between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent
b) Dependant
c) Conditionally independent
d) Both Conditionally dependant & Dependant
Answer: c
11. A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Answer: a
12. Decision Tree is a display of an algorithm.
a) True
b) False
Answer: a
13. What is Decision Tree?
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
d) None of the mentioned
Answer: c
14. Decision Trees can be used for Classification Tasks.
a) True
b) False
Answer: a
15. Which of the following are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
Answer: d
16. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: b
17. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: c
18. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
Answer: d
19. Which of the following are the advantage/s of Decision Trees?
a) Possible Scenarios can be added
b) Use a white box model, If given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
Answer: d
20. Which of the following is the valid component of the predictor?
a) data
b) question
c) algorithm
d) all of the mentioned
Answer: d
21. Point out the wrong statement.
a) In Sample Error is also called generalization error
b) Out of Sample Error is the error rate you get on the new dataset
c) In Sample Error is also called resubstitution error
d) All of the mentioned
Answer: a
22. Which of the following is correct order of working?
a) questions->input data ->algorithms
b) questions->evaluation ->algorithms
c) evaluation->input data ->algorithms
d) all of the mentioned
Answer: a
23. Which of the following shows correct relative order of importance?
a) question->features->data->algorithms
b) question->data->features->algorithms
c) algorithms->data->features->question
d) none of the mentioned
Answer: b
24. Point out the correct statement.
a) In Sample Error is the error rate you get on the same dataset used to model a predictor
b) Data have two parts-signal and noise
c) The goal of predictor is to find signal
d) None of the mentioned
Answer: d
25. Which of the following is characteristic of best machine learning method?
a) Fast
b) Accuracy
c) Scalable
d) All of the mentioned
Answer: d
26. True positive means correctly rejected.
a) True
b) False
Answer: b
27. Which of the following trade-off occurs during prediction?
a) Speed vs Accuracy
b) Simplicity vs Accuracy
c) Scalability vs Accuracy
d) None of the mentioned
Answer: d
28. Which of the following expression is true?
a) In sample error < out sample error
b) In sample error > out sample error
c) In sample error = out sample error
d) All of the mentioned
Answer: a
29. Backtesting is a key component of effective trading-system development.
a) True
b) False
Answer: a
30. Which of the following is correct use of cross validation?
a) Selecting variables to include in a model
b) Comparing predictors
c) Selecting parameters in prediction function
d) All of the mentioned
Answer: d
31. Point out the wrong combination.
a) True negative=correctly rejected
b) False negative=correctly rejected
c) False positive=correctly identified
d) All of the mentioned
Answer: c
32. Which of the following is a common error measure?
a) Sensitivity
b) Median absolute deviation
c) Specificity
d) All of the mentioned
Answer: d
33. Which of the following is not a machine learning algorithm?
a) SVG
b) SVM
c) Random forest
d) None of the mentioned
Answer: a
34. Point out the wrong statement.
a) ROC curve stands for receiver operating characteristic
b) For time series, data must be in chunks
c) Random sampling must be done with replacement
d) None of the mentioned
Answer: d
35. Which of the following is a categorical outcome?
a) RMSE
b) RSquared
c) Accuracy
d) All of the mentioned
Answer: c
36. For k cross-validation, larger k value implies more bias.
a) True
b) False
Answer: b
37. Which of the following method is used for trainControl resampling?
a) repeatedcv
b) svm
c) bag32
d) none of the mentioned
Answer: a
38. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned
Answer: a
39. For k cross-validation, smaller k value implies less variance.
a) True
b) False
Answer: a
40. Predicting with trees evaluate _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned
Answer: b
41. Point out the wrong statement.
a) Training and testing data must be processed in different way
b) Test transformation would mostly be imperfect
c) The first goal is statistical and second is data compression in PCA
d) All of the mentioned
Answer: a
42. Which of the following method options is provided by train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned
Answer: d
43. Which of the following is correct with respect to random forest?
a) Random forests are difficult to interpret but often very accurate
b) Random forests are easy to interpret but often very accurate
c) Random forests are difficult to interpret and much less accurate
d) None of the mentioned
Answer: a
44. Point out the correct statement.
a) Prediction with regression is easy to implement
b) Prediction with regression is easy to interpret
c) Prediction with regression performs well when linear model is correct
d) All of the mentioned
Answer: d
45. Which of the following library is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned
Answer: a
46. The principal components are equal to left singular values if you first scale the variables.
a) True
b) False
Answer: b
47. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost
Answer: a
48. Which of the following is one of the largest boost subclass in boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned
Answer: b
49. PCA is most useful for non linear type models.
a) True
b) False
Answer: b
50. Which of the following clustering type has characteristic shown in the below figure?
a) Partitional
b) Hierarchical
c) Naive bayes
d) None of the mentioned
Answer: b
51. Point out the correct statement.
a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned
Answer: d
52. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned
Answer: b
53. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Answer: d
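Question 53's three requirements — a distance metric, the number of clusters, and initial centroids — all appear explicitly in a minimal 1-D k-means sketch (this toy implementation is illustrative, not a production algorithm):

```python
# Minimal 1-D k-means showing the three required ingredients:
# a distance metric, the number of clusters k, and initial centroids.

def kmeans_1d(points, centroids, iterations=10):
    k = len(centroids)  # number of clusters, fixed up front
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (absolute difference as the distance metric).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids
```

Because the result depends on the initial centroid guess, k-means is not deterministic — the point behind questions 59 and 61 below.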
54. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Answer: c
55. Which of the following combination is incorrect?
a) Continuous – euclidean distance
b) Continuous – correlation similarity
c) Binary – manhattan distance
d) None of the mentioned
Answer: d
56. Hierarchical clustering should be primarily used for exploration.
a) True
b) False
Answer: a
57. Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Answer: a
58. Which of the following clustering requires merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Answer: b
59. K-means is not deterministic and it consists of a number of iterations.
a) True
b) False
Answer: a
60. Hierarchical clustering should be mainly used for exploration.
a) True
b) False
Answer: a
61. K-means clustering consists of a number of iterations and is not deterministic.
a) True
b) False
Answer: a
62. Which is needed by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of these
Answer: d
63. Which function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned
Answer: a
64. Which is conclusively produced by Hierarchical Clustering?
a) final estimation of cluster centroids
b) tree showing how nearby things are to each other
c) assignment of each point to clusters
d) all of these
Answer: b
65. Which clustering technique requires a merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned
Answer: b
1. A collection of one or more items is called as _____
a) Itemset
b) Support
c) Confidence
d) Support Count
Answer: A
2. Frequency of occurrence of an itemset is called as _____
a) Support
b) Confidence
c) Support Count
d) Rules
Answer: C
3. An itemset whose support is greater than or equal to a minimum support threshold is ______
a) Itemset
b) Frequent Itemset
c) Infrequent items
d) Threshold values
Answer: B
4. What does FP growth algorithm do?
a) It mines all frequent patterns through pruning rules with lesser support
b) It mines all frequent patterns through pruning rules with higher support
c) It mines all frequent patterns by constructing a FP tree
d) It mines all frequent patterns by constructing itemsets
Answer: C
5. What techniques can be used to improve the efficiency of apriori algorithm?
a) Hash-based techniques
b) Transaction Increases
c) Sampling
d) Cleaning
Answer: A
6. What do you mean by support(A)?
a) Total number of transactions containing A
b) Total Number of transactions not containing A
c) Number of transactions containing A / Total number of transactions
d) Number of transactions not containing A / Total number of transactions
Answer: C
7. How do you calculate Confidence (A -> B)?
a) Support(A ∪ B) / Support(A)
b) Support(A ∪ B) / Support(B)
c) Support(A ∩ B) / Support(A)
d) Support(A ∩ B) / Support(B)
Answer: A
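The support and confidence definitions from questions 6-7 can be computed over a toy transaction set (the grocery transactions below are illustrative assumptions):

```python
# Support and confidence for association rules over market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Confidence(A -> B) = support of A and B occurring together,
    # divided by the support of A alone.
    return support(antecedent | consequent) / support(antecedent)
```

For example, bread appears in 3 of 4 transactions (support 0.75), and 2 of those 3 also contain milk, so Confidence(bread -> milk) = 2/3.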
8. Which of the following is the direct application of frequent itemset mining?
a) Social Network Analysis
b) Market Basket Analysis
c) Outlier Detection
d) Intrusion Detection
Answer: B
9. What is not true about FP growth algorithms?
a) It mines frequent itemsets without candidate generation
b) There are chances that FP trees may not fit in the memory
c) FP trees are very expensive to build
d) It expands the original database to build FP trees
Answer: D
10. When do you consider an association rule interesting?
a) If it only satisfies min_support
b) If it only satisfies min_confidence
c) If it satisfies both min_support and min_confidence
d) There are other measures to check so
Answer: C
11. What is the relation between a candidate and frequent itemsets?
a) A candidate itemset is always a frequent itemset
b) A frequent itemset must be a candidate itemset
c) No relation between these two
d) Strong relation with transactions
Answer: B
12. Which of the following is not a frequent pattern mining algorithm?
a) Apriori
b) FP growth
c) Decision trees
d) Eclat
Answer: C
13. Which algorithm requires fewer scans of data?
a) Apriori
b) FP Growth
c) Naive Bayes
d) Decision Trees
Answer: B
14. For the question given below, consider these transactions:
I1, I2, I3, I4, I5, I6
I7, I2, I3, I4, I5, I6
I1, I8, I4, I5
I1, I9, I10, I4, I6
I10, I2, I4, I11, I5
With support as 0.6 find all frequent itemsets?
a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
d) <I1>, <I4>, <I5>, <I6>
Answer: A
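Answer A for question 14 can be verified by brute force: support 0.6 over 5 transactions means an itemset must appear in at least 3 of them. This exhaustive sketch enumerates every candidate subset (fine for 11 distinct items, though exponential in general):

```python
# Brute-force frequent-itemset mining for question 14 (min support 0.6).
from itertools import combinations

transactions = [
    {"I1", "I2", "I3", "I4", "I5", "I6"},
    {"I7", "I2", "I3", "I4", "I5", "I6"},
    {"I1", "I8", "I4", "I5"},
    {"I1", "I9", "I10", "I4", "I6"},
    {"I10", "I2", "I4", "I11", "I5"},
]
min_count = 0.6 * len(transactions)  # itemset must appear in >= 3 transactions

items = sorted(set().union(*transactions))
frequent = set()
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        # Count transactions containing every item of the candidate.
        if sum(set(candidate) <= t for t in transactions) >= min_count:
            frequent.add(frozenset(candidate))
```

This yields exactly the 11 itemsets listed in option A, including the single frequent 3-itemset {I2, I4, I5}.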
15. What will happen if support is reduced?
a) Number of frequent itemsets remains the same
b) Some itemsets will be added to the current set of frequent itemsets.
c) Some itemsets will become infrequent while others will become frequent
d) Can not say
Answer: B
16. What is association rule mining?
a) Same as frequent itemset mining
b) Finding of strong association rules using frequent itemsets
c) Using association to analyze correlation rules
d) Finding Itemsets for future trends
Answer: B
17. A definition or a concept is ______ if it classifies any examples as coming within the concept
a) Concurrent
b) Consistent
c) Constant
d) Complete
Answer: B