Get Latest Exam Updates, Free Study materials and Tips
1. According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
(A) Big data management and data mining
(B) Data warehousing and business intelligence
(C) Management of Hadoop clusters
(D) Collecting and storing unstructured data
Answer: A

2. What are the main components of Big Data?
(A) MapReduce
(B) HDFS
(C) YARN
(D) All of these
Answer: D

3. What are the different features of Big Data Analytics?
(A) Open-Source
(B) Scalability
(C) Data Recovery
(D) All the above
Answer: D

4. What are the four V's of Big Data?
(A) Volume
(B) Velocity
(C) Variety
(D) All the above
Answer: D

IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google
Answer: d
Explanation: Google and IBM announced a university initiative to address internet-scale computing.

Point out the correct statement.
a) Hadoop is an ideal environment for extracting and transforming small volumes of data
b) Hadoop stores data in HDFS and supports data compression/decompression
c) The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems
d) None of the mentioned
Answer: b
Explanation: Data compression can be achieved using algorithms such as bzip2, gzip, and LZO; different algorithms can be used in different scenarios based on their capabilities.

What license is Hadoop distributed under?
a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial
Answer: a
Explanation: Hadoop is open source, released under the Apache License 2.0.

Sun also had the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux
Answer: b
Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.

Which of the following genres does Hadoop produce?
a) Distributed file system
b) JAX-RS
c) Java Message Service
d) Relational Database Management System
Answer: a
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications.

What was Hadoop written in?
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)
Answer: c
Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

Which of the following platforms does Hadoop run on?
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
Answer: c
Explanation: Hadoop supports cross-platform operating systems.

Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system
Answer: a
Explanation: With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.

Above the file systems comes the ________ engine, which consists of one JobTracker, to which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook
Answer: a
Explanation: The MapReduce engine is used to distribute work around a cluster.

The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence
Answer: a
Explanation: The Apache Mahout project's goal is to build a scalable machine learning tool.

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management, and SQL support
Answer: d
Explanation: Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.

Point out the correct statement.
a) Hadoop needs specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework, output files are divided into lines or records
d) None of the mentioned
Answer: b
Explanation: Hadoop batch-processes data distributed over hundreds or thousands of computers.

According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Answer: a
Explanation: Data warehousing integrated with Hadoop would give a better understanding of data.

Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
Answer: a
Explanation: To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.

Point out the wrong statement.
a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes and petabytes of data
b) Hadoop uses a programming model called "MapReduce"; all programs should conform to this model in order to work on the Hadoop platform
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned
Answer: c
Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.

What was Hadoop named after?
a) Creator Doug Cutting's favorite circus act
b) Cutting's high school rock band
c) The toy elephant of Cutting's son
d) A sound Cutting's laptop made during Hadoop development
Answer: c
Explanation: Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.

All of the following accurately describe Hadoop, EXCEPT ____________
a) Open-source
b) Real-time
c) Java-based
d) Distributed computing approach
Answer: b
Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.

__________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned
Answer: a
Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.

__________ has the world's largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
Answer: c
Explanation: Facebook has many Hadoop clusters; the largest among them is the one used for data warehousing.

Facebook Tackles Big Data With _______ based on Hadoop.
a) 'Project Prism'
b) 'Prism'
c) 'Project Big'
d) 'Project Data'
Answer: a
Explanation: Prism automatically replicates and moves data wherever it's needed across a vast network of computing facilities.

________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
Answer: c
Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

Point out the correct statement.
a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned
Answer: a
Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

_________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned
Answer: c
Explanation: Cascalog also adds logic programming concepts inspired by Datalog; hence the name "Cascalog" is a contraction of Cascading and Datalog.

Hive also supports custom extensions written in ____________
a) C#
b) Java
c) C
d) C++
Answer: b
Explanation: Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.

Point out the wrong statement.
a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering
b) Amazon Web Services Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned
Answer: a
Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.

________ is the most popular high-level Java API in the Hadoop ecosystem.
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
Answer: d
Explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned
Answer: a
Explanation: MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________
a) SQL
b) JSON
c) XML
d) All of the mentioned
Answer: a
Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

_______ jobs are optimized for scalability but not latency.
a) MapReduce
b) Drill
c) Oozie
d) Hive
Answer: d
Explanation: Hive queries are translated to MapReduce jobs to exploit the scalability of MapReduce.

______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa
Answer: c
Explanation: In the context of Hadoop, Avro can be used to pass data from one program or language to another.
1.A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
Answer: b
Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
2.Point out the correct statement.
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: There can be any number of DataNodes in a Hadoop Cluster.
3.HDFS works in a __________ fashion.
a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned
Answer: a
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.
________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned
Answer: c
Explanation: The Secondary NameNode is used for availability and reliability; it periodically checkpoints the NameNode's metadata.
Point out the wrong statement.
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to
Answer: d
Explanation: The NameNode is aware of the files to which the blocks stored on a DataNode belong.
Which of the following scenario may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned
Answer: a
Explanation: HDFS is well suited to storing archive data: it is cheap because it runs on low-cost commodity hardware, while still ensuring a high degree of fault tolerance.
________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
Answer: a
Explanation: A DataNode stores data in the Hadoop File System. A functional filesystem has more than one DataNode, with data replicated across them.
HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned
Answer: b
Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
HDFS is implemented in _____________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned
Answer: b
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication
Answer: c
Explanation: For YARN, the ResourceManager web UI provides host and port information for the cluster's services.
Point out the correct statement.
a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned
Answer: a
Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows information about the NameNode itself.
For ________ the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned
Answer: a
Explanation: HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.
During start up, the ___________ loads the file system state from the fsimage and the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer: b
Explanation: During startup, the NameNode loads the fsimage and then applies the transactions from the edits log before serving requests.
A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
Answer: c
Explanation: TaskTracker receives the information necessary for the execution of a Task from JobTracker, Executes the Task, and Sends the Results back to JobTracker.
___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
Answer: a
Explanation: Map Task in MapReduce is performed using the Map() function.
_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: Reduce function collates the work and resolves the results.
Point out the wrong statement.
a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
d) None of the mentioned
Answer: d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
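The Map()/Reduce() division described in the questions above can be sketched in miniature. This is an illustrative pure-Python simulation of a word-count job, not the Hadoop API itself: the framework normally handles the shuffle/sort grouping that `run_job` fakes here.

```python
from collections import defaultdict

def mapper(line):
    # Map(): emit an intermediate <word, 1> pair for every word in the chunk
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce(): consolidate all values emitted for one key
    return (word, sum(counts))

def run_job(lines):
    # Shuffle/sort stand-in: group intermediate pairs by key,
    # as the MapReduce framework would between the map and reduce phases
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))

result = run_job(["big data big ideas", "big clusters"])
# result == {"big": 3, "clusters": 1, "data": 1, "ideas": 1}
```

Each map task processes its chunk independently, which is what lets Hadoop run the map phase in a completely parallel manner.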
Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
a) Java
b) C
c) C#
d) None of the mentioned
Answer: a
Explanation: Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non-JNI based).
________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.
__________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
Answer: a
Explanation: Maps are the individual tasks that transform input records into intermediate records.
The number of maps is usually driven by the total size of ____________
a) inputs
b) outputs
c) tasks
d) None of the mentioned
Answer: a
Explanation: Total size of inputs means the total number of blocks of the input files.
_________ is the default Partitioner for partitioning key space.
a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned
Answer: c
Explanation: The default partitioner in Hadoop is the HashPartitioner, which implements getPartition() to map a key to a partition.
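The HashPartitioner's logic is a one-liner. The sketch below is a Python translation of the Java expression `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, assuming the key's hash code is supplied as an integer (Java's `hashCode()` is not reproduced here):

```python
def get_partition(key_hash, num_reduce_tasks):
    # Mask off the sign bit so negative hash codes still yield a
    # non-negative partition index, then spread keys across reducers.
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks

# A negative hash code still lands in a valid partition:
# get_partition(-7, 4) == 1, get_partition(10, 4) == 2
```

Masking with `0x7FFFFFFF` (Integer.MAX_VALUE) is what keeps the modulo result in range even for keys whose hash code is negative.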
Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned
Answer: a
Explanation: A MapReduce program distributes its map tasks in parallel across many or all of the cluster's nodes.
1. Which of the following represents a column in NoSQL?
(A) Database
(B) Field
(C) Document
(D) Collection
Answer: B

2. What is the aim of NoSQL?
(A) NoSQL provides an alternative to SQL databases to store textual data.
(B) NoSQL databases allow storing non-structured data.
(C) NoSQL is not suitable for storing structured data.
(D) NoSQL is a new data format to store large datasets.
Answer: D

3. __________ is an online NoSQL engine developed by Cloudera.
(A) HCatalog
(B) HBase
(C) Impala
(D) Oozie
Answer: C

4. Which of the following is not a NoSQL database?
(A) SQL Server
(B) MongoDB
(C) Cassandra
(D) None of the mentioned
Answer: A

5. Which of the following is a NoSQL database type?
(A) SQL
(B) Document databases
(C) JSON
(D) All of the mentioned
Answer: B
1. The Bloom filter was proposed by:
a. Burton Morris Bloom
b. Burton Howard Bloom
c. Burton Datar Bloom
d. Burton Howrd Bloom
Answer: b. Burton Howard Bloom

2. A simple space-efficient randomized data structure for representing a set in order to support membership queries:
a. Bloom Filter
b. Flajolet-Martin
c. DGIM
d. K-means
Answer: a. Bloom Filter

3. A web-based financial search engine that evaluates queries over real-time streaming financial data such as stock tickers and news feeds:
a. Traderbot
b. Tradebot
c. Clickbot
d. Hyperbot
Answer: a. Traderbot

4. If the stream contains n elements with m of them unique, the FM algorithm needs a memory of:
a. O(m)
b. O(log(m+1))
c. O(log(m+2))
d. O(log(m))
Answer: d. O(log(m))

5. Calculate h(3), given S = 1,3,2,1,2,3,4,3,1,2,3,1 and h(x) = (6x + 1) mod 5:
a. 19
b. 10
c. 15
d. 16
Answer: a. 19

6. According to the Bloom filter principle, we should consider the potential effects of:
a. true positives
b. false negatives
c. false positives
d. true negatives
Answer: c. false positives

7. Who released a hash function named MurmurHash in 2008?
a. Datar Motwani
b. Austin Appleby
c. Marianne Durand
d. Burton Datar Bloom
Answer: b. Austin Appleby

8. The files on disks or records in databases need to be stored in a Bloom filter as:
a. keys
b. values
c. key-values
d. columns
Answer: c. key-values

9. If the stream contains n elements with m of them unique, the FM algorithm runs in ________ time:
a. O(sqrt(n))
b. O(n+2)
c. O(n+1)
d. O(n)
Answer: d. O(n)

10. Given h(x) = (x + 6) mod 32, the binary value of h(4) is:
a. 1011
b. 1010
c. 1110
d. 1111
Answer: b. 1010

11. The Flajolet-Martin algorithm approximates the number of unique objects in a stream or a database in how many passes?
a. n
b. 0
c. 1
d. 2
Answer: c. 1

12. What is important when the input rate of Facebook data is controlled externally?
a. Facebook Management
b. Query Management
c. Stream Management
d. Data Management
Answer: b. Query Management

13. Which algorithmic solution does not assume uniformity?
a. DGIM
b. FM
c. SON
d. K-means
Answer: a. DGIM

14. Which query operator is unable to produce an answer until it has seen its entire input?
a. Blocking query operator
b. Discrete operator
c. Continuous operator
d. Continuous operator and discrete queries
Answer: a. Blocking query operator

15. 000101 has a tail length of:
a. 1
b. 2
c. 3
d. 0
Answer: d. 0

16. In the FM algorithm, the probability that a given h(a) ends in at least i 0's is:
a. 1
b. 0
c. 2^-i
d. i
Answer: c. 2^-i

17. The probability of a false positive in Bloom filters depends on:
a. the number of hash functions
b. the density of 1's in the array
c. the number of hash functions and the density of 1's in the array
d. the density of 0's in the array
Answer: c. the number of hash functions and the density of 1's in the array

18. An array of bits, together with a number of hash functions:
a. Bloom filter
b. Hash function
c. Data stream
d. Binary input
Answer: a. Bloom filter

19. A ______________ query is one that is supplied to the DSMS before any relevant data arrives:
a. Continuous queries and discrete queries
b. discrete queries
c. ad-hoc
d. pre-defined
Answer: d. pre-defined

20. Sorting used for query processing is an example of a:
a. Blocking query operator
b. Blocking discrete operator
c. Blocking Continuous operator
d. Continuous operator
Answer: a. Blocking query operator
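The Bloom filter described above (an array of bits plus several hash functions, no false negatives, false positives governed by the number of hash functions and the density of 1's) can be sketched as follows. The salted SHA-256 hashing is an illustrative choice, not the MurmurHash the questions mention:

```python
import hashlib

class BloomFilter:
    # Minimal sketch: an array of m bits together with k hash functions.
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k bit positions by salting the item with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # No false negatives are possible; false positives are, with
        # probability driven by k and the density of 1's in the array.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m=128, k=3)
bf.add("alpha")
# bf.might_contain("alpha") is always True; a key never added is usually False
```

Membership queries touch only k bits, which is why the structure is space-efficient: the items themselves are never stored.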
1.PCY Stands for
a.Park-Chen-Yu
b.Park-Chen-You
c.Park-Check-Yu
d.Park-Check-You
Answer :a.Park-Chen-Yu
2. SON Algorithm Stands for
a. Shane, Omiecinski and Navathe
b. Savasere, Omiecinski and Navathe
c. Savare, Omienal and Navathe
d. Savasere, Omiecinski and Navarag
Answer: b. Savasere, Omiecinski and Navathe
3. Minimum support count = ?, if total transactions = 5 and minimum support = 60%
a.30
b.3
c.300
d.65
Answer : b.3
4. Minimum support count = ?, if total transactions = 10 and minimum support = 60%
a.6
b.0.6
c.10
d.5
Answer : a.6
5. How do you calculate Confidence(B -> A)?
a. Support(A ∪ B) / Support(A)
b. Support(A ∪ B) / Support(B)
c. Support(A) / Support(B)
d. Support(B) / Support(A)
Answer: b. Support(A ∪ B) / Support(B)
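The support and confidence formulas behind questions 3–5 can be worked through on a toy basket of transactions. The item names below are made up for illustration:

```python
def support(transactions, itemset):
    # Support count: number of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions, antecedent, consequent):
    # Confidence(B -> A) = Support(A ∪ B) / Support(B)
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"eggs"},
]
# Support({milk, bread}) = 2, Support({bread}) = 3,
# so Confidence(bread -> milk) = 2/3
```

The minimum support count in questions 3 and 4 is just total transactions × minimum support: 5 × 0.60 = 3 and 10 × 0.60 = 6.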
1.Which of the following is true?
a.graph may contain no edges and many vertices
b.graph may contain many edges and at least one vertex
c.graph may contain no edges and no vertices
d.graph may contain no vertices and many edges
Answer : b. graph may contain many edges and at least one vertex
2.Social Network is defined as
a.Collection of entities that participate in the network.
b.Collection of items in store
c.Collection of vertices & edges in a graph
d.Collection of nodes in a graph
Answer : a.Collection of entities that participate in the network.
3.Which of the following is finally produced by Hierarchical Clustering?
a.final estimate of cluster centroids
b.tree showing how close things are to each other
c.Assignment of each point to clusters
d.Assignment of each edge to clusters
Answer : b.tree showing how close things are to each other
4. Which of the following clustering requires merging approach?
a.Partitional
b.Hierarchical
c.Naive Bayes
d.K-means
Answer : b.Hierarchical
5.Which of the following function is used for k-means clustering?
a.K-means
b.Euclidean Distance
c.Heatmap
d.Correlation Similarity
Answer: a.K-means
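The Euclidean distance named in question 5 drives the k-means assignment step: each point joins the cluster whose centroid is nearest. A minimal sketch, with made-up 2-D points and centroids:

```python
import math

def euclidean(p, q):
    # Euclidean distance, the similarity measure k-means relies on
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centroids):
    # One k-means assignment step: map each point to its nearest centroid's index
    return [
        min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
        for p in points
    ]

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
# assign(points, centroids) == [0, 0, 1]
```

A full k-means run alternates this assignment step with recomputing each centroid as the mean of its assigned points, until the assignments stop changing.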
6.___________ was the pioneer in the field of web search with the use of PageRank for ranking Web pages with respect to a user query.
a.Yahoo
b.YouTube
c.Facebook
d.Google
Answer : d. Google
7.Which of the following algorithm is used by Google to determine the importance of a particular page?
a.SVD
b.PageRank
c.FastMap
d.All of the above
Answer : b.PageRank
8.One of the popular techniques of Spamdexing is ___________
a.Clocking
b.Cooking
c.Cloaking
d.Crocking
Answer: c.Cloaking
9.Doorway pages are_________ Web pages.
a.High quality
b.Low quality
c.Informative
d.High content
Answer : b.Low quality
10.PageRank helps in measuring ________________ of a Web page within a set of similar entries.
a.Relative importance
b.Size
c.Cost
d.All of the above
Answer : a.Relative importance
12.Web pages with Dead ends means__________
a.Pages with no outlinks
b.Pages with no PageRank
c.Pages with no contents
d.Pages with spam
Answer : a.Pages with no outlinks
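PageRank's "relative importance" and the dead-end problem from question 12 can both be seen in a small power-iteration sketch. The taxation factor beta and the dead-end handling (redistributing a sink page's rank uniformly) are standard choices, shown here on a made-up three-page graph:

```python
def pagerank(links, beta=0.85, iters=50):
    # Power iteration with taxation (beta); links maps each page to its outlinks.
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: (1 - beta) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Distribute this page's rank evenly along its outlinks
                share = beta * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Dead end (no outlinks): spread its rank over all pages
                # so the total rank is not lost
                for q in pages:
                    nxt[q] += beta * rank[p] / n
        rank = nxt
    return rank

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": []})
# ranks sum to ~1.0; C, which everything links to, collects the most rank
```

Without the dead-end redistribution, rank flowing into C would leak out of the system and the scores would drain toward zero.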
13.Topic Sensitive PageRank (TSPR) is proposed by_________ in 2003.
a.Al-Saffar
b.Bratislav V. Stojanović
c.Jianshu WENG
d.Taher H. Haveliwala
Answer : d. Taher H. Haveliwala
14.Full form of HITS is _____________
a.High Influential Topic Search
b.High Informative Topic Search
c.Hyperlink-induced topic Search
d.None of the above
Answer: c.Hyperlink-induced topic Search
16.When the objective is to mine social network for patterns, a natural way to represent a social network is by a___________
a.Tree
b.Graph
c.Arrays
d.Lists
Answer : b.Graph
17.A social network can be considered as a___________
a.Heterogeneous and multi relational dataset
b.LiveJournal
c.Twitter
d.DBLP
Answer : a.Heterogeneous and multi relational dataset
18.For an edge ‘e’ in a graph, ___________ of ‘e’ is defined as the number of shortest paths between all node pairs (vi, vj) in the graph such that the shortest path passes through ‘e’.
a.Edge path
b.Edge measure
c.Edge closeness
d.Edge betweenness
Answer : d.Edge betweenness
19.“You may also like these…”, “People who liked this also liked…”: such suggestions come from a ______________
a.Filtering System
b.Collaborative System
c.Recommendation System
d.Amazon System
Answer: c.Recommendation System
20.An approach to a Recommendation system is to treat this as the _______________ problem using item profiles and utility matrices.
a.MapReduce
b.Social Network
c.Machine learning
d.Unstructured
Answer : c.Machine learning
Prepare For Your Placements: https://lastmomenttuitions.com/courses/placement-preparation/
YouTube Channel: https://www.youtube.com/channel/UCGFNZxMqKLsqWERX_N2f08Q