[MCQs] Big Data

Module 01

1. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer: A
Explanation: Data warehousing integrated with Hadoop would give a better understanding of data.

2.What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer: D

3.What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer: D

4. Which of the following are among the four V’s of Big Data?
(A) Volume
(B) Velocity
(C) Variety
(D) All the above
Answer: D

 IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google
Answer: d
Explanation: Google and IBM announced a joint university initiative to address Internet-scale computing challenges.

 Point out the correct statement.
a) Hadoop is an ideal environment for extracting and transforming small volumes of data
b) Hadoop stores data in HDFS and supports data compression/decompression
c) The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems
d) None of the mentioned

Answer: b

Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.

 What license is Hadoop distributed under?
a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial
Answer: a
Explanation: Hadoop is Open Source, released under Apache 2 license.

 Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux
Answer: b
Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.

 Which of the following genres does Hadoop produce?
a) Distributed file system
b) JAX-RS
c) Java Message Service
d) Relational Database Management System
Answer: a
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to the user.

 What was Hadoop written in?
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)
Answer: c
Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell-scripts.

 Which of the following platforms does Hadoop run on?
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
Answer: c
Explanation: Hadoop supports cross-platform operation; it can run on a variety of operating systems.

 Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system
Answer: a
Explanation: With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.

 Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook
Answer: a
Explanation: The MapReduce engine is used to distribute work around the cluster.

 The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence
Answer: a
Explanation: The Apache Mahout project’s goal is to build a scalable machine learning tool.

 As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management, and SQL support
Answer: d
Explanation: Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.

 Point out the correct statement.
a) Hadoop needs specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In Hadoop programming framework output files are divided into lines or records
d) None of the mentioned
Answer: b
Explanation: Hadoop batch-processes data distributed over clusters ranging from hundreds to thousands of computers.

 Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
Answer: a
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one to run HBase and the other to run Hive.

 Point out the wrong statement.
a) Hadoop’s processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
b) Hadoop uses a programming model called “MapReduce”; all programs should conform to this model in order to work on the Hadoop platform
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned
Answer: c
Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.

 What was Hadoop named after?
a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop development
Answer: c
Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant.

 All of the following accurately describe Hadoop, EXCEPT ____________
a) Open-source
b) Real-time
c) Java-based
d) Distributed computing approach
Answer: b
Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.

 __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned
Answer: a
Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.
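
To make the model concrete, here is a minimal, illustrative word-count sketch in plain Python (Hadoop itself expects Java Mapper/Reducer classes; the function names below are our own):

from collections import defaultdict

def map_phase(records):
    # map: emit an intermediate (word, 1) pair for every word in every record
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    # shuffle + reduce: group intermediate pairs by key and sum the values
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

print(reduce_phase(map_phase(["big data", "big deal"])))  # {'big': 2, 'data': 1, 'deal': 1}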

 __________ has the world’s largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
Answer: c
Explanation: Facebook has many Hadoop clusters, the largest among them is the one that is used for Data warehousing.

 Facebook Tackles Big Data With _______ based on Hadoop.
a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’
Answer: a
Explanation: Prism automatically replicates and moves data wherever it’s needed across a vast network of computing facilities.

 ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
Answer: c
Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

 Point out the correct statement.
a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned
Answer: a
Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

 _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned
Answer: c
Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog.

Hive also supports custom extensions written in ____________
a) C#
b) Java
c) C
d) C++
Answer: b
Explanation: Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and, optionally, writing custom formats.

 Point out the wrong statement.
a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned
Answer: a
Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.

________ is the most popular high-level Java API in the Hadoop ecosystem.
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
Answer: d
Explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned
Answer: a
Explanation: MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

 The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________
a) SQL
b) JSON
c) XML
d) All of the mentioned
Answer: a
Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

 _______ jobs are optimized for scalability but not latency.
a) MapReduce
b) Drill
c) Oozie
d) Hive
Answer: d
Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.

 ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa
Answer: c
Explanation: In the context of Hadoop, Avro can be used to pass data from one program or language to another.

Module 02

1.A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
Answer: b
Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.

2.Point out the correct statement.
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: There can be any number of DataNodes in a Hadoop Cluster.

3.HDFS works in a __________ fashion.
a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned
Answer: a
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.

________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned
Answer: c
Explanation: The Secondary NameNode is used to improve availability and reliability.

Point out the wrong statement.
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to
Answer: d
Explanation: It is the NameNode, not the DataNode, that is aware of the files to which the blocks stored on a DataNode belong.

Which of the following scenario may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned
Answer: a
Explanation: HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.

The need for data replication can arise in various scenarios like ____________
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.

________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
Answer: a
Explanation: A DataNode stores data in the Hadoop Distributed File System (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.

HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned
Answer: b
Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
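
For reference, a few standard FS shell commands (the paths here are illustrative):

hdfs dfs -mkdir /user/data
hdfs dfs -put local.txt /user/data/
hdfs dfs -ls /user/data
hdfs dfs -cat /user/data/local.txt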

HDFS is implemented in _____________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.

For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication
Answer: c
Explanation: The YARN ResourceManager web UI provides cluster status along with host and port information.

Point out the correct statement.
a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows information about the NameNode itself.

For ________ the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned
Answer: a
Explanation: The HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.

During start up, the ___________ loads the file system state from the fsimage and the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java; any computer that can run Java can host a NameNode/DataNode.

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
Answer: c
Explanation: A TaskTracker receives the information necessary for the execution of a task from the JobTracker, executes the task, and sends the results back to the JobTracker.

Point out the correct statement.
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned
Answer: a
Explanation: This feature of MapReduce is “Data Locality”.

___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) MapTask
b) Mapper
c) Task execution
d) All of the mentioned
Answer: a
Explanation: Map Task in MapReduce is performed using the Map() function.

_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: Reduce function collates the work and resolves the results.

Point out the wrong statement.
a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
d) None of the mentioned
Answer: d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.

Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
a) Java
b) C
c) C#
d) None of the mentioned
Answer: a
Explanation: Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non-JNI based).

________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.
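
As a sketch, a Streaming mapper can be an ordinary Python script that reads lines from stdin and writes tab-separated key/value pairs to stdout (the file name mapper.py is our own):

#!/usr/bin/env python3
# mapper.py: a minimal Hadoop Streaming mapper sketch
import sys

for line in sys.stdin:
    for word in line.split():
        # Streaming expects key<TAB>value lines on stdout
        print(word + "\t1")

Such a script would be wired into a job with the streaming jar's -input, -output, -mapper, and -reducer options (the jar's path varies by distribution).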

__________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
Answer: a
Explanation: Maps are the individual tasks that transform input records into intermediate records.

The number of maps is usually driven by the total size of ____________
a) inputs
b) outputs
c) tasks
d) None of the mentioned
Answer: a
Explanation: Total size of inputs means the total number of blocks of the input files.
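
As a rough worked example (assuming the Hadoop 2.x default block size of 128 MB): a 10 GB input is 10240 MB / 128 MB = 80 blocks, so the framework would schedule about 80 map tasks.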

_________ is the default Partitioner for partitioning key space.
a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned
Answer: c
Explanation: The default partitioner in Hadoop is the HashPartitioner which has a method called getPartition to partition.
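
The idea behind HashPartitioner's getPartition can be sketched in Python as follows (the real implementation is Java; masking off the sign bit keeps the result non-negative):

def get_partition(key, num_reduce_tasks):
    # hash the key, clear the sign bit, then bucket by modulo
    return (hash(key) & 0x7FFFFFFF) % num_reduce_tasks

print(get_partition("hadoop", 4))  # some partition in 0..3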

Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: A MapReduce program runs map tasks in parallel on many or all of the cluster's nodes; the intermediate results are then consolidated by reduce tasks.

Module 03

1. An RDBMS column is represented in a NoSQL document store as a __________.
(A) Database
(B) Field
(C) Document
(D) Collection
Answer: B

2.What is the aim of NoSQL?
(A) NoSQL provides an alternative to SQL databases to store textual data.
(B) NoSQL databases allow storing non-structured data.
(C) NoSQL is not suitable for storing structured data.
(D) NoSQL is a new data format to store large datasets.
Answer: D

3. __________ is an online NoSQL database developed by Cloudera.
(A) HCatalog
(B) HBase
(C) Impala
(D) Oozie
Answer: B

4.Which of the following is not a NoSQL database?
(A) SQL Server
(B) MongoDB
(C) Cassandra
(D) None of the mentioned
Answer: A

5.Which of the following is a NoSQL Database Type?
(A) SQL
(B) Document databases
(C) JSON
(D) All of the mentioned
Answer: B

Module 04

1. The Bloom filter was proposed by:
a. Burton Morris Bloom
b. Burton Howard Bloom
c. Burton Datar Bloom
d. Burton Howrd Bloom
Answer: b. Burton Howard Bloom

2. A simple space-efficient randomized data structure for representing a set in order to support membership queries:
a. Bloom Filter
b. Flajolet-Martin
c. DGIM
d. K-means
Answer: a. Bloom Filter

3.It is a web-based financial search engine that evaluates queries over real-time streaming financial data such as stock tickers and news feeds
a.Traderbot
b.Tradebot
c.Clickbot
d.Hyperbot
Answer: a. Traderbot

4. If the stream contains n elements with m of them unique, the FM algorithm needs memory of:
a. O(m)
b. O(log(m+1))
c. O(log(m+2))
d. O(log(m))
Answer: d. O(log(m))

5. Calculate h(3), given S = 1,3,2,1,2,3,4,3,1,2,3,1 and h(x) = (6x+1) mod 5
a. 19
b. 10
c. 15
d. 16
Answer: a. 19
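
As a check on the arithmetic: 6(3) + 1 = 19 and 19 mod 5 = 4; the options here track the intermediate value 19, i.e. the hash before the final mod is applied.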

6. According to the Bloom filter principle, we should consider the potential effects of:
a. true positives
b. false negatives
c. false positives
d. true negatives
Answer: c. false positives

7. Who released a hash function named MurmurHash in 2008?
a. Datar Motwani
b. Austin Appleby
c. Marianne Durand
d. Burton Datar Bloom
Answer: b. Austin Appleby

8. Files on disk or records in a database need to be stored in a Bloom filter as:
a. keys
b. values
c. key-values
d. columns
Answer: c. key-values

9. If the stream contains n elements with m of them unique, the FM algorithm runs in __________ time.
a. O(sqrt(n))
b. O(n+2)
c. O(n+1)
d. O(n)
Answer: d. O(n)

10. Given h(x) = (x + 6) mod 32, the binary value of h(4) is:
a. 1011
b. 1010
c. 1110
d. 1111
Answer: b. 1010
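
Worked out: h(4) = (4 + 6) mod 32 = 10, and decimal 10 is 1010 in binary.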

11. The Flajolet-Martin algorithm approximates the number of unique objects in a stream or a database in how many passes?
a. n
b. 0
c. 1
d. 2
Answer: c. 1

12. What is important when the input rate of Facebook data is controlled externally?
a. Facebook Management
b. Query Management
c. Stream Management
d. Data Management
Answer: b. Query Management

13. Which algorithm does not assume uniformity?
a. DGIM
b. FM
c. SON
d. K-MEANS
Answer: a. DGIM

14. Which query operator is unable to produce an answer until it has seen its entire input?
a. Blocking query operator
b. Discrete operator
c. Continuous operator
d. Continuous operator and discrete queries
Answer: a. Blocking query operator

15. 000101 has a tail length of:
a. 1
b. 2
c. 3
d. 0
Answer: d. 0

16. In the FM algorithm, the probability that a given h(a) ends in at least i 0's is:
a. 1
b. 0
c. 2^-i
d. i
Answer: c. 2^-i
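
A minimal sketch of the Flajolet-Martin idea in Python (the hash function below is an arbitrary illustration, not part of the algorithm):

def tail_length(x):
    # number of trailing 0 bits in the binary form of x (0 is treated as tail length 0 here)
    n = 0
    while x > 0 and x & 1 == 0:
        x >>= 1
        n += 1
    return n

def fm_estimate(stream, h):
    # the distinct-count estimate is 2^R, where R is the largest tail length seen
    R = max(tail_length(h(item)) for item in stream)
    return 2 ** R

print(fm_estimate([1, 3, 2, 1, 2, 3, 4], lambda x: (3 * x + 1) % 32))  # 4 (4 distinct items)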

17. The probability of a false positive in Bloom filters depends on:
a. the number of hash functions
b. the density of 1's in the array
c. the number of hash functions and the density of 1's in the array
d. the density of 0's in the array
Answer: c. the number of hash functions and the density of 1's in the array
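
For reference, with k hash functions, an m-bit array, and n inserted keys, the false-positive probability is commonly approximated as p ≈ (1 − e^(−kn/m))^k; both the number of hash functions and the fraction of 1's (driven by n/m) enter the formula.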

18. It is an array of bits, together with a number of hash functions:
a. Bloom filter
b. Hash Function
c. Data Stream
d. Binary input
Answer: a. Bloom filter
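
A minimal Bloom filter sketch in Python (the salted built-in hash stands in for real hash functions such as MurmurHash):

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m          # the bit array

    def _positions(self, key):
        # derive k bit positions from k salted hashes of the key
        return [hash((i, key)) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # a True result may be a false positive; False is always correct
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter(m=64, k=3)
bf.add("hadoop")
print(bf.might_contain("hadoop"))  # True
print(bf.might_contain("spark"))   # almost certainly False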

19. A ______________ query is one that is supplied to the DSMS before any relevant data arrives.
a. Continuous queries and discrete queries
b. discrete queries
c. ad-hoc
d. pre-defined
Answer: d. pre-defined

20. Sorting used for query processing is an example of:
a. Blocking query operator
b. Blocking discrete operator
c. Blocking continuous operator
d. Continuous operator
Answer: a. Blocking query operator

Module 05

1. PCY stands for:
a. Park-Chen-Yu
b. Park-Chen-You
c. Park-Check-Yu
d. Park-Check-You
Answer: a. Park-Chen-Yu

2. SON algorithm stands for:
a. Shane, Omiecinski and Navathe
b. Savasere, Omiecinski and Navathe
c. Savare, Omienal and Navathe
d. Savasere, Omiecinski and Navarag
Answer: b. Savasere, Omiecinski and Navathe

3. Minimum support count = ?, if total transactions = 5 and minimum support = 60%
a. 30
b. 3
c. 300
d. 65
Answer: b. 3

4. Minimum support count = ?, if total transactions = 10 and minimum support = 60%
a. 6
b. 0.6
c. 10
d. 5
Answer: a. 6
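
Worked out: the minimum support count is (total transactions) × (minimum support), so 5 × 0.60 = 3 for question 3 and 10 × 0.60 = 6 for question 4.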

5. How do you calculate Confidence(B -> A)?
a. Support(A ∪ B) / Support(A)
b. Support(A ∪ B) / Support(B)
c. Support(A) / Support(B)
d. Support(B) / Support(A)
Answer: b. Support(A ∪ B) / Support(B)
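
As a quick numeric example: if Support(A ∪ B) = 2 and Support(B) = 4, then Confidence(B -> A) = 2/4 = 0.5.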

Module 06

1. Which of the following is true?
a. A graph may contain no edges and many vertices
b. A graph may contain many edges and at least one vertex
c. A graph may contain no edges and no vertices
d. A graph may contain no vertices and many edges
Answer: b. A graph may contain many edges and at least one vertex

2. A social network is defined as a:
a. Collection of entities that participate in the network
b. Collection of items in a store
c. Collection of vertices and edges in a graph
d. Collection of nodes in a graph
Answer: a. Collection of entities that participate in the network

3. Which of the following is finally produced by hierarchical clustering?
a. Final estimate of cluster centroids
b. Tree showing how close things are to each other
c. Assignment of each point to clusters
d. Assignment of each edge to clusters
Answer: b. Tree showing how close things are to each other

4. Which of the following clustering approaches requires merging?
a. Partitional
b. Hierarchical
c. Naive Bayes
d. K-means
Answer: b. Hierarchical

5. Which of the following functions is used for k-means clustering?
a. K-means
b. Euclidean Distance
c. Heatmap
d. Correlation Similarity
Answer: a. K-means

6. ___________ was the pioneer in the field of web search with the use of PageRank for ranking Web pages with respect to a user query.
a. Yahoo
b. YouTube
c. Facebook
d. Google
Answer: d. Google

7. Which of the following algorithms is used by Google to determine the importance of a particular page?
a. SVD
b. PageRank
c. FastMap
d. All of the above
Answer: b. PageRank

8. One of the popular techniques of Spamdexing is ___________
a. Clocking
b. Cooking
c. Cloaking
d. Crocking
Answer: c. Cloaking

9. Doorway pages are _________ Web pages.
a. High quality
b. Low quality
c. Informative
d. High content
Answer: b. Low quality

10. PageRank helps in measuring the ________________ of a Web page within a set of similar entries.
a. Relative importance
b. Size
c. Cost
d. All of the above
Answer: a. Relative importance

11. Web pages with dead ends means __________
a. Pages with no outlinks
b. Pages with no PageRank
c. Pages with no contents
d. Pages with spam
Answer: a. Pages with no outlinks
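
A minimal PageRank power-iteration sketch in Python (the graph, damping factor, and iteration count are illustrative); note how the rank of a dead-end page is redistributed evenly so it does not leak out of the system:

def pagerank(links, d=0.85, iters=50):
    # links maps each page to the list of pages it links to
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:                      # dead end: no outlinks
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))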

12. Topic-Sensitive PageRank (TSPR) was proposed by _________ in 2003.
a. Al-Saffar
b. Bratislav V. Stojanović
c. Jianshu Weng
d. Taher H. Haveliwala
Answer: d. Taher H. Haveliwala

13. The full form of HITS is _____________
a. High Influential Topic Search
b. High Informative Topic Search
c. Hyperlink-Induced Topic Search
d. None of the above
Answer: c. Hyperlink-Induced Topic Search

14. The HITS algorithm and the PageRank algorithm both make use of the _________ to decide the relevance of pages.
a. Link structure of the Web graph
b. Design of the Web graph
c. Content of the web pages
d. All of the above
Answer: a. Link structure of the Web graph

15. When the objective is to mine a social network for patterns, a natural way to represent the social network is by a ___________
a. Tree
b. Graph
c. Arrays
d. Lists
Answer: b. Graph

16. A social network can be considered as a ___________
a. Heterogeneous and multi-relational dataset
b. LiveJournal
c. Twitter
d. DBLP
Answer: a. Heterogeneous and multi-relational dataset

17. For an edge ‘e’ in a graph, the ___________ of ‘e’ is defined as the number of shortest paths between all node pairs (vi, vj) in the graph such that the shortest path passes through ‘e’.
a. Edge path
b. Edge measure
c. Edge closeness
d. Edge betweenness
Answer: d. Edge betweenness

18. “You may also like these…”, “People who liked this also liked….” — this type of suggestion comes from a ______________
a. Filtering System
b. Collaborative System
c. Recommendation System
d. Amazon System
Answer: c. Recommendation System

19. An approach to a recommendation system is to treat it as a _______________ problem using item profiles and utility matrices.
a. MapReduce
b. Social Network
c. Machine learning
d. Unstructured
Answer: c. Machine learning
