Big Data and Hadoop Viva Question
What you’ll Learn:
- Key issues in big data management and its associated applications for business decisions and strategy.
- Develop problem-solving and critical-thinking skills in fundamental enabling techniques like Hadoop, MapReduce and NoSQL in big data analytics.
- Interpret business models and scientific computing paradigms.
- Perspectives of big data analytics in various applications.
Big Data Analytics is a semester 7 subject in the final year of computer engineering at Mumbai University. Some prior knowledge of Java programming, the basics of SQL, data mining and machine learning methods would be beneficial. The course objectives are: to provide an overview of the exciting, growing field of big data analytics; to introduce the programming skills needed to build simple solutions using big data technologies such as MapReduce and scripting for NoSQL, along with the ability to write parallel algorithms for multiprocessor execution; to teach the fundamental techniques and principles for achieving big data analytics with scalability and streaming capability; to enable students to solve complex real-world problems for decision support; and to give an indication of the current research approaches that are likely to provide a basis for tomorrow's solutions.
The course outcomes for Big Data Analytics are that the learner will be able to: understand the key issues in big data management and its associated applications for business decisions and strategy; develop problem-solving and critical-thinking skills in fundamental enabling techniques like Hadoop, MapReduce and NoSQL in big data analytics; collect, manage, store, query and analyze various forms of big data; interpret business models and scientific computing paradigms, and apply software tools for big data analytics; adopt adequate perspectives of big data analytics in various applications like recommender systems and social media applications; and solve complex real-world problems in various applications like recommender systems, social media applications, and health and medical systems.
Big data is a field that deals with ways to analyze, systematically extract information from, or otherwise handle data sets that are too large or complex for traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data analysis challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. The analysis of big data also presents challenges in sampling, because earlier approaches allowed only for observations and sampling. Big data therefore often includes data whose size exceeds the capacity of traditional software to process within an acceptable time and value.
The module Introduction to Big Data and Hadoop covers the following subtopics: Introduction to Big Data, Big Data Characteristics, Types of Big Data, Traditional vs. Big Data Business Approach, Case Study of Big Data Solutions, Concept of Hadoop, Core Hadoop Components, and the Hadoop Ecosystem.
The module Hadoop HDFS and MapReduce covers the following subtopics. Distributed File Systems: Physical Organization of Compute Nodes, Large-Scale File-System Organization. MapReduce: The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution, Coping With Node Failures. Algorithms Using MapReduce: Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce. Hadoop Limitations.
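To make the Matrix-Vector Multiplication by MapReduce topic concrete, here is a minimal sketch in plain Python. It only simulates the three stages named above (Map Tasks, Grouping by Key, Reduce Tasks) in a single process; it does not use Hadoop. The function names and the (row, column, value) input format are assumptions made for illustration, and the vector is assumed to fit in memory.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def group_by_key(pairs):
    """Grouping by key: collect all values that share the same row index."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the partial products for each row index."""
    return {i: sum(values) for i, values in groups.items()}

def matrix_vector_multiply(matrix_entries, vector):
    """Run map, group and reduce in sequence on in-memory data."""
    return reduce_phase(group_by_key(map_phase(matrix_entries, vector)))

# Example: M = [[1, 2], [3, 4]] and v = [10, 20]
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
result = matrix_vector_multiply(entries, [10, 20])
print(result)  # {0: 50, 1: 110}
```

In a real cluster the map and reduce functions would run on different nodes and the grouping step would be done by the framework; the sketch keeps only the data flow.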
The module NoSQL covers the following subtopics: Introduction to NoSQL, NoSQL Business Drivers, NoSQL Data Architecture Patterns (key-value stores, graph stores, column family (Bigtable) stores, document stores), Variations of NoSQL Architectural Patterns, and a NoSQL Case Study. NoSQL solution for big data: understanding the types of big data problems; analyzing big data with a shared-nothing architecture; choosing distribution models (master-slave versus peer-to-peer); NoSQL systems to handle big data problems.
The module Mining Data Streams covers the following subtopics. The Stream Data Model: A Data-Stream-Management System, Examples of Stream Sources, Stream Queries, Issues in Stream Processing. Sampling Data Techniques in a Stream. Filtering Streams: Bloom Filter with Analysis. Counting Distinct Elements in a Stream: Count-Distinct Problem, Flajolet-Martin Algorithm, Combining Estimates, Space Requirements. Counting Frequent Items in a Stream: Sampling Methods for Streams, Frequent Itemsets in Decaying Windows. Counting Ones in a Window: The Cost of Exact Counts, the Datar-Gionis-Indyk-Motwani (DGIM) Algorithm, Query Answering in the DGIM Algorithm, Decaying Windows.
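As a small illustration of the Bloom filter topic listed above, the following is a minimal sketch in Python. The class name, the default sizes and the use of seeded SHA-256 digests as hash functions are all assumptions chosen for readability, not a tuned implementation. The key property it shows is that a Bloom filter never reports a false negative, but may report a false positive.

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        """Set every bit position that the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        """False means definitely never added; True means possibly added."""
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
for key in ["alice@example.com", "bob@example.com"]:
    seen.add(key)

print(seen.might_contain("alice@example.com"))  # True: added items always pass
```

A stream element that fails `might_contain` can be discarded immediately; one that passes may still need an exact check, since the filter trades a small false-positive rate for very low memory use.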
The module Finding Similar Items and Clustering covers the following subtopics. Distance Measures: Definition of a Distance Measure, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance. Clustering: CURE Algorithm, Stream-Computing, A Stream-Clustering Algorithm, Initializing and Merging Buckets, Answering Queries.
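The distance measures named above can be sketched in a few lines of Python. The function names are illustrative. Two definitions are assumptions: cosine distance is returned as the angle between the vectors in radians, and edit distance counts only insertions and deletions (computed from the longest common subsequence) rather than also allowing substitutions, which is a different, also common, definition.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def jaccard_distance(s, t):
    """One minus the ratio of intersection size to union size."""
    s, t = set(s), set(t)
    if not s and not t:
        return 0.0
    return 1.0 - len(s & t) / len(s | t)

def cosine_distance(x, y):
    """Angle between two nonzero vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def edit_distance(x, y):
    """Insertions plus deletions needed to turn x into y: |x| + |y| - 2 * LCS."""
    lcs = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, 1):
        for j, b in enumerate(y, 1):
            if a == b:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return len(x) + len(y) - 2 * lcs[-1][-1]

def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(x, y))

print(euclidean_distance((0, 0), (3, 4)))         # 5.0
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))     # 0.5
print(edit_distance("abcde", "acfdeg"))           # 3
print(hamming_distance("10101", "11110"))         # 3
```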
The module Real-Time Big Data Models covers the following subtopics. PageRank Overview, Efficient Computation of PageRank: PageRank Iteration Using MapReduce, Use of Combiners to Consolidate the Result Vector. A Model for Recommendation Systems, Content-Based Recommendations, Collaborative Filtering. Social Networks as Graphs, Clustering of Social-Network Graphs, Direct Discovery of Communities in a Social Graph.
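The PageRank iteration mentioned above can be sketched as a simple in-memory loop. This is a single-machine illustration, not the MapReduce formulation from the syllabus. The function name, the damping value `beta=0.85`, the fixed iteration count and the tiny four-page graph are all assumptions; the sketch also assumes every link target appears as a key in the `links` dictionary.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Iterate rank = beta * (link contributions) + (1 - beta) / n for each page."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page starts each round with the teleport share.
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = beta * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end: spread its rank evenly over all pages.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web graph: page -> pages it links to
web = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In the MapReduce version, the inner loop over links corresponds to the map step (emit a share for each target) and the accumulation into `new_rank` corresponds to the reduce step, with combiners summing shares locally before they are sent over the network.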
Suggested reference books for the subject Big Data Analytics by Mumbai University are as follows:
- Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press.
- Alex Holmes, Hadoop in Practice, Manning Press, Dreamtech Press.
- Dan McCreary and Ann Kelly, Making Sense of NoSQL: A Guide for Managers and the Rest of Us, Manning Press.
- Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, Wiley.
- Chuck Lam, Hadoop in Action, Dreamtech Press.
- Jared Dean, Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners, Wiley India Private Limited, 2014.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 3rd ed., 2010.
- Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd ed., 2010.
- Ronen Feldman and James Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2006.
- Vojislav Kecman, Learning and Soft Computing, MIT Press, 2010.
Join us to learn Big Data Analytics, a subject that is equally important for academic and real-world knowledge.
- Introduction to Big Data and Hadoop
- Hadoop HDFS and MapReduce
- Mining Data Streams
- Finding Similar Items and Clustering
- Real-Time Big Data Models
Feel free to have a look at the course description and demo videos. We look forward to seeing you learn with us.
Prepare For Your Placements: https://lastmomenttuitions.com/courses/placement-preparation/
YouTube Channel: https://www.youtube.com/channel/UCGFNZxMqKLsqWERX_N2f08Q
- Lectures 6
- Quizzes 0
- Duration 50 hours
- Skill level All levels
- Language English
- Students 0
- Certificate No
- Assessments Yes