Data Warehousing and Mining Notes
Data Warehousing and Mining is a semester 6 subject in the final year of computer engineering at Mumbai University. The prerequisites for this subject are basic database concepts and the concepts of algorithm design and analysis.
The module Introduction to Data Warehouse and Dimensional Modelling contains the following topics: Introduction to Strategic Information, Need for Strategic Information, Features of Data Warehouse, Data Warehouses versus Data Marts, Top-down versus Bottom-up approach, Data warehouse architecture, Metadata, E-R Modelling versus Dimensional Modelling, Information Package Diagram, STAR schema, STAR schema keys, Snowflake Schema, Fact Constellation Schema, Factless Fact tables, Update to the dimension tables, Aggregate fact tables.
The module ETL Process and OLAP contains the following topics: Major steps in ETL process, Data extraction: Techniques, Data transformation: Basic tasks, Major transformation types, Data Loading: Applying Data, OLTP vs OLAP, OLAP definition, Dimensional Analysis, Hypercubes, OLAP operations: Drill down, Roll up, Slice, Dice and Rotation, OLAP models: MOLAP, ROLAP.
The module Introduction to Data Mining, Data Exploration and Preprocessing contains the following topics: Data Mining Task Primitives, Architecture, Techniques, KDD process, Issues in Data Mining, Applications of Data Mining, Data Exploration: Types of Attributes, Statistical Description of Data, Data Visualization, Data Preprocessing: Cleaning, Integration, Reduction: Attribute subset selection, Histograms, Clustering and Sampling, Data Transformation & Data Discretization: Normalization, Binning, Concept hierarchy generation, Concept Description: Attribute-oriented Induction for Data Characterization.
The module Classification, Prediction and Clustering contains the following topics: Basic Concepts, Decision Tree using Information Gain, Induction: Attribute Selection Measures, Tree pruning, Bayesian Classification: Naive Bayes Classifier, Rule Based Classification: Using IF-THEN Rules for classification, Prediction: Simple linear regression, Multiple linear regression, Model Evaluation & Selection: Accuracy and Error measures, Holdout, Random Sampling, Cross Validation, Bootstrap, Clustering: Distance Measures, Partitioning Methods (k-Means, k-Medoids), Hierarchical Methods (Agglomerative, Divisive).
The module Mining Frequent Patterns and Association Rules contains the following topics: Market Basket Analysis, Frequent Itemsets, Closed Itemsets, and Association Rules, Frequent Pattern Mining, Efficient and Scalable Frequent Itemset Mining Methods: Apriori Algorithm, Association Rule Generation, Improving the Efficiency of Apriori, FP growth, Mining Frequent Itemsets using Vertical Data Format, Introduction to Mining Multilevel Association Rules and Multidimensional Association Rules.
The module Spatial and Web Mining contains the following topics: Spatial Data, Spatial vs. Classical Data Mining, Spatial Data Structures, Mining Spatial Association and Co-location Patterns, Spatial Clustering Techniques: CLARANS Extension, Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining, Applications of Web Mining.
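As a rough illustration of the OLAP operations named in the ETL Process and OLAP module (roll up, drill down, slice and dice), here is a minimal Python sketch over a tiny in-memory fact table. The fact rows, the dimension names and the helper functions are all hypothetical and exist only for this example; a real OLAP server would run these operations against a MOLAP cube or a ROLAP star schema.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales_amount)
FACTS = [
    (2023, "Q1", "West", "Laptop", 100),
    (2023, "Q1", "East", "Laptop", 80),
    (2023, "Q2", "West", "Phone", 60),
    (2023, "Q2", "East", "Phone", 40),
    (2024, "Q1", "West", "Laptop", 120),
    (2024, "Q1", "East", "Phone", 50),
]
# Position of each dimension inside a fact row (assumed layout).
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}
MEASURE = 4  # index of the sales_amount measure


def aggregate(rows, dims):
    """Sum the sales measure grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[MEASURE]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[DIMS[dim]] == value]


def dice_rows(rows, conditions):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMS[d]] in allowed for d, allowed in conditions.items())
    ]


# Drill down: view sales at the finer (year, quarter) level.
by_quarter = aggregate(FACTS, ["year", "quarter"])
# Roll up: climb the time hierarchy from quarter to year.
by_year = aggregate(FACTS, ["year"])
# Slice: only the West region.
west_only = slice_rows(FACTS, "region", "West")
# Dice: West region and year 2023 together.
west_2023 = dice_rows(FACTS, {"region": {"West"}, "year": {2023}})

print(by_quarter)
print(by_year)
print(len(west_only), len(west_2023))
```

Roll up and drill down are the same aggregation run at a coarser or finer level of the time hierarchy, which is why one `aggregate` helper serves both in this sketch.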
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used to create analytical reports for workers throughout the enterprise. The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing and additional operations to ensure data quality before it is used in the DW for reporting.
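The flow described above (data uploaded from operational systems, cleansed, then used for reporting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the field names and the `extract`, `cleanse` and `load` helpers are hypothetical, and the "warehouse" is a plain list rather than a real database.

```python
from datetime import date


def extract():
    """Extract: hypothetical rows pulled from an operational sales system."""
    return [
        {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
        {"customer": "BOB", "amount": "", "date": "2024-01-06"},  # missing amount
        {"customer": "alice", "amount": "30", "date": "2024-01-07"},
    ]


def cleanse(row):
    """Transform: return a cleaned copy, or None if a quality check fails."""
    name = row["customer"].strip().title()  # normalise spacing and case
    amount_text = row["amount"].strip()
    if not name or not amount_text:
        return None  # reject rows with missing key fields
    return {
        "customer": name,
        "amount": float(amount_text),
        "sale_date": date.fromisoformat(row["date"]),
    }


def load(warehouse, rows):
    """Load: append the cleaned rows to the (in-memory) warehouse table."""
    warehouse.extend(rows)


warehouse = []
cleaned = [c for c in (cleanse(r) for r in extract()) if c is not None]
load(warehouse, cleaned)

# A simple analytical report: total sales per customer.
totals = {}
for row in warehouse:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]

print(totals)
```

Cleansing merges " Alice " and "alice" into one customer and rejects the row with no amount, so the report is built only from consistent data.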
Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the “knowledge discovery in databases” process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
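To make "discovering patterns" concrete, here is a minimal Python sketch of frequent itemset mining in the style of the Apriori algorithm listed in the syllabus, followed by the confidence of one association rule. The market-basket transactions and the minimum support count of 3 are made up for the example, and the code favours readability over the efficiency improvements that the course covers.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=3)

# Confidence of the rule {diapers} -> {beer}:
# support(diapers and beer) / support(diapers)
confidence = frequent[frozenset({"diapers", "beer"})] / frequent[frozenset({"diapers"})]

for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("confidence(diapers -> beer) =", confidence)
```

The prune step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, so candidates with an infrequent subset are discarded before counting.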
Course Objectives of the subject Data Warehousing and Mining are as follows:
- To identify the scope and essentiality of Data Warehousing and Mining.
- To analyze data, choose relevant models and algorithms for respective applications.
- To study spatial and web data mining.
- To develop research interest towards advances in data mining.
Course Outcomes of Data Warehousing and Mining: on successful completion of the course, the learner will be able to:
- Understand Data Warehouse fundamentals and Data Mining principles.
- Design a data warehouse with dimensional modelling and apply OLAP operations.
- Identify appropriate data mining algorithms to solve real-world problems.
- Compare and evaluate different data mining techniques like classification, prediction, clustering and association rule mining.
- Describe complex data types with respect to spatial and web mining.
- Benefit the user experiences towards research and innovation.
Suggested Text Books for Data Warehousing and Mining by Mumbai University are as follows:
- Paulraj Ponniah, “Data Warehousing: Fundamentals for IT Professionals”, Wiley India.
- Han, Kamber, “Data Mining Concepts and Techniques”, 3rd Edition, Morgan Kaufmann.
- Reema Thareja, “Data Warehousing”, Oxford University Press.
- M.H. Dunham, “Data Mining Introductory and Advanced Topics”, Pearson Education.
Suggested Reference Books for Data Warehousing and Mining by Mumbai University are as follows:
- Ian H. Witten, Eibe Frank and Mark A. Hall, “Data Mining”, 3rd Edition, Morgan Kaufmann.
- Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, Pearson.
- R. Chattamvelli, “Data Mining Methods”, 2nd Edition, Narosa Publishing House.
- Lectures 2
- Quizzes 0
- Skill level All levels
- Language English
- Students 51
- Certificate No
- Assessments Yes