Data Warehouse Modeling



index
Disabled back button Next Section
printable version

Section 0: Module Objectives or Competencies
Course Objective or Competency Module Objectives or Competency
The student will be able to list and explain basic data warehouse concepts. The student will be able to explain the basic concepts of dimensional modeling, including dimension tables and fact tables.
The student will be able to explain the most commonly used data warehouse modeling strategies with respect to how they utilize ER modeling, relational modeling, and dimensional modeling.
Section 1: Data Warehouse Modeling Approaches

Both ER modeling and relational modeling can be used in the development of data warehouses.

The following are three of the most common data warehouse and data mart modeling approaches:

Section 2: Normalized Data Warehouse

Envisions a data warehouse as an integrated analytical database modeled by using traditional database modeling techniques like ER modeling and relational modeling, resulting in a normalized relational database schema.

The following figure illustrates a normalized data warehouse.

The idea behind this method is to have a central data warehouse modeled as an ER model that is subsequently mapped into a normalized relational database model.

Section 3: Dimensionally Modeled Data Warehouse

Views a data warehouse as a dimensional model that integrates analytically useful information from the operational data sources.

The following figure illustrates a normalized data warehouse.

A dimensionally modeled data warehouse can be used as a source for dependent data marts and other views, subsets, and/or extracts.

Section 4: Independent Data Marts

In this method, stand–alone data marts are created by various groups within the organization, independent of other stand–alone data marts in the organization, and as a consequence, multiple ETL systems are created and maintained.

Section 5: Dimensional Modeling

Dimensional Modeling: Introduction (begin at the 4:10 mark)




Dimensional modeling, a modeling technique tailored specifically for subject-oriented analytical data warehouse and data mart design purposes, is the preferred technique for presenting analytic data.




Before exploring dimensional modeling, we need to understand facts and dimensions. These are critical data warehousing concepts.

Facts and Dimensions

Facts

A fact is a quantitative piece of information.

Dimensions

A dimension is a set of data attributes pertaining to something of interest to a business.

Fact Tables and Dimension Tables

A fact table holds the data to be analyzed, and a dimension table stores data about the ways in which the data in the fact table can be analyzed. Hence, fact tables have more records than dimension tables. fact table

In addition to using regular relational concepts (primary keys, foreign keys, integrity constraints, etc.) dimensional modeling distinguishes two types of tables:




Star schema




Dimensional Model (Star Schemas and OLAP Cubes)

You can conceive of a dimensional database as a database cube of three or four dimensions where users can access a slice of the database along any of its dimensions.

Both stars and cubes have a common logical design with recognizable dimensions; however, the physical implementation differs (see figure below).

A star schema can be physically implemented in the form of a cube.



OLAP Cubes

Section 6: Initial Example: Dimensional Model Based on A Single Source

The following figures show the ER diagram and the resulting relational schema for a sales department database.



The following figure show data records for the sales department database.

Based on the operational database shown in the above figures, the dimensional modeling technique can be used to design an analytical database whose subject of analysis is sales.

In the star schema, the chosen subject of analysis is represented by a fact table.

Designing the star schema involves considering which dimensions to use with the fact table that represents the chosen subject.


In the example above, a dimensional model was developed to analyze sales per product, customer, store, and date.


The dimensionally modeled database shown above makes it possible to easily create analytical queries, such as:

Such queries in dimensionally modeled databases can be issued by using SQL or by using Online Analytical Processing/Business Intelligence (OLAP/BI) tools.

The following query uses the dimensional model to address Query A. This dimensional version will be referred to as Query A1.


Using using the original nondimensional database makes queries more complex. Here is the nondimensional version (Query A2), which not only had to join more tables, but also requires user-defined functions to extract the year, quarter, and day of the week from the date.

Compare the two queries to observe the simplicity advantage for the analyst issuing a query on the dimensional model.


The convenience of analysis with the dimensionally modeled data is even more evident in cases when data necessary for analysis resides in multiple sources, as discussed in Section 8.

Section 7: Characteristics of Dimensions and Facts

Surrogate key

Often, dimension tables in a star schema will include a simple, non-composite system-generated key, also called a surrogate key.

Types of Fact Tables

Types of Keys Relevant to Dimensional Modeling

Section 8: Expanded Example: Dimensional Model Based on Multiple Sources

More commonly, data warehouses draw data from multiple sources.

The preceding dimensional modeling example depicts a scenario where an analytical database is based on a single source.

The following expansion of the initial example adds two additional data sources:

The following figure shows an expanded dimensional model that is now based on three different data sources:



The expanded model makes it possible to ask even more refined analytical questions that consider a larger variety of analysis factors. For example, consider the following question.

To obtain the answer to the Query B from the operational sources, a user would have to issue several queries spanning all three operational sources and then combine the results of those queries to receive the final answer.

Obtaining the answer to Query B using the dimensionally modeled database shown above is much more straightforward.

A query using dimensionally modeled data such as Query B above, could be created in an even simpler fashion by utilizing OLAP/BI tools.

Section 9: Additional Possible Fact Attributes

As mentioned earlier, a fact table contains

In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not measures, as illustrated by the following figure.




Two of the most typical additional attributes that can appear in the fact table are

These two additional attributes will be discussed next.


Transaction Identifier in the Fact Table

First, the transaction identifier concept is demonstrated.

In the SALES fact table for the dimensional model shown above, the primary key is comprised of the TID and ProductKey columns, because each record in the fact table refers to a particular line item of a particular sale transaction.




Transaction Time in the Fact Table

In order to examine the transaction time concept, let us assume that the business analysts analyzing subject sales would like to incorporate the time of day into their analysis.

Section 10: Multiple Fact Tables in a Dimensional Model

A dimensional model can contain more than one fact table, a situation that occurs when multiple subjects of analysis share dimensions.

A dimensional model with multiple fact tables is referred to as a constellation (galaxy) of stars.


In the following scenario, in addition to the tracking of sales by the sales department, the quality control department tracks occurrences of defective products in its stores.

The goal is to analyze the defects in the same manner as sales, so a dimensional model will be created in order to analyze the occurrences of defective products found in stores.

Section 11: Detailed Versus Aggregated Fact Tables

Fact tables in a dimensional model can contain either detailed data or aggregated data.

The following example illustrates the difference between the two types of data in fact tables.


Detailed Fact Table


Aggregated Fact Table

The example above is one of the many ways in which the data from the source in the above figure showing data records for the Sales Department Database could have been aggregated.





Another example of a dimensional model with an aggregated sales fact table (SalesPerDCS – per day, customer, and store) is shown in the following figure.




Detailed Versus Aggregated Fact Table

To recap the concept of detailed and aggregated fact tables, we take another look at other fact tables in dimensional models in this chapter and distinguish which ones are detailed and which ones are aggregated.

Section 12: Granularity of the Fact Table

The granularity of the table describes what is depicted by one row in the table.

Here are some examples:

Granularity of the fact tables

Line-Item Versus Transaction-Level Detailed Fact Table

Detailed fact tables can represent different types of information, depending on what is depicted by the underlying data sources.

The two most common types of detailed fact tables are line-item fact tables and transaction-level fact tables.

Section 13: Slowly Changing Dimensions and Timestamps

A typical dimension in a star schema contains either

A dimension that contains attributes whose values can change is often referred to as a slowly changing dimension.

Slow Changing Dimensions

There are several different approaches to dealing with slowly changing dimensions, with the most common referred to as Type 1, Type 2, and Type 3.

Section 14: Comparing ER Modeling and Dimensional Modeling as Data Warehouse Design Techniques

ER modeling

ER modeling is a technique for facilitating the requirements collection process, where all requirements are collected and visualized as the ER model constructs: entities, attributes, and relationships.

Dimensional modeling

Dimensional modeling can be used both for visualizing requirements for a data warehouse (i.e., conceptual modeling), and for creating a data warehouse model that will be implemented (i.e., logical modeling).

Making a Choice

Both ER modeling and dimensional modeling are viable alternatives for modeling data warehouses/data marts, and can be used within the same project.

Using one technique for visualizing requirements does not preclude using the other technique for creating the model that will actually be implemented.

Section 15: Resources

Granularity is the relative size, scale, or level of detail that characterizes an object or activity.

When granularity is applied to information technology, it refers to the level of detail (fineness) in a model, requirements, data, code modularity, or even a decision making process.

Greater (or finer) granularity corresponds to an increase in the level of detail (fineness of data).

Business intelligence (BI) is a process for gathering usable knowledge about the external business environment and turning it into the intelligence required for tactical or strategic decisions.

Order of magnitude is the quantity of powers of 10 that there are in a number, or the number of powers of 0.1 in a negative number. Order of magnitude is usually written as 10 to the nth power. The n represents the order of magnitude. If you raise a number by one order of magnitude, you are basically multiplying that number by 10. If you decrease a number by one order of magnitude, you are basically multiplying that number by 0.1.