Get Latest Exam Updates, Free Study materials and Tips
Data warehousing is the process of constructing and using a data warehouse.A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidations.
Data Warehouse | Data Marts |
---|---|
1.A data warehouse is a large centralized repository of data that contains information from many sources within an organization. The collated data is used to guide business decisions through analysis, reporting, and data mining tools. | 1.A data mart is a subset of a data warehouse oriented to a specific business line. Data marts contain repositories of summarized data collected for analysis on a specific section or unit within an organization |
2.In size,a data warehouse is typically larger than 100 GB and often a terabyte or more. | 2.In size,a data mart is typically less than 100 GB. |
3.A data warehouse is typically enterprisewide and ranges across multiple areas. | 3.A data mart is limited to a single focus for one line of business. |
4.Data Warehouse designing process is complicated. | 4.The Data Mart process is easy to design. |
Tier-1: The bottom tier is a warehouse database server that is almost always a relational database system. Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources.These tools and utilities perform data extraction, cleaning, and transformation (e.g., to merge similar data from different sources into a unified format), as well as load and refresh functions to update the data warehouse . The data are extracted using application program interfaces known as gateways. A gateway is supported by the underlying DBMS and allows client programs to generate SQL code to be executed at a server.
Examples Of gateways include ODBC (Open Database Connection) and OLEDB (Open Linking and Embedding for Databases) by Microsoft and JDBC (Java Database Connection).
Tier-2: The middle tier is an OLAP server that is typically implemented using either a relational OLAP (ROLAP) model or a multidimensional OLAP.
OLAP model is an extended relational DBMS that maps operations on multidimensional data to standard relational operations.
A multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations.
Tier-3: The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
E-R modeling | Dimensional modeling |
---|---|
1.Suitable for Online Line Transaction Processing (OLTP) Application | 1.Suggested for Data Warehousing Applications |
2.It consists of entities and relationships | 2.It consists of facts and dimensions |
3.Data Redundancy is not desired | 3.Data Redundancy is desired |
In a data warehouse, a schema is used to define the way to organize the system with all the database entities (fact tables, dimension tables) and their logical association.
Here are the different types of Schemas in Data warehouse: i)Star Schema ,ii) SnowFlake Schema, iii)Fact Constellation Schema or Galaxy Schema.
Online Analytical Processing is a software technology that empowers analysts, managers and executives with fast, consistent, interactive access to a variety of possible views of information.
Here is the list of OLAP operations −
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up: It is performed by either reducing the dimension or stepping up operation.Roll-up performs aggregation on a data cube in any of the following ways −
By climbing up a concept hierarchy for a dimension
By dimension reduction.
Drill-down:It is performed by either increasing the dimension or stepping down operation. Drill-down is the reverse operation of roll-up. It is performed by either of the following ways −
By stepping down a concept hierarchy for a dimension
By introducing a new dimension.
Slice and dice:Slicing is the process of extracting the sub cube of related information and dicing is the analysis of that sub cube.
Pivot:The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data.
Not a member yet? Register now
Are you a member? Login now