Data Warehouse Modeling



index
Disabled back button Next Section
printable version

Section 0: Module Objectives or Competencies
Course Objective or Competency Module Objectives or Competency
The student will be able to list and explain basic data warehouse concepts. The student will be able to explain the basic concepts of dimensional modeling, including dimension tables and fact tables.
The student will be able to explain the most commonly used data warehouse modeling strategies with respect to how they utilize ER modeling, relational modeling, and dimensional modeling.
Section 1: Overview

Both ER modeling and relational modeling can be used in the development of data warehouses.

In addition to these two methods, a modeling technique known as dimensional modeling is used.

Dimensional Modeling: Introduction




In addition to using regular relational concepts (primary keys, foreign keys, integrity constraints, etc.) dimensional modeling distinguishes two types of tables:

Facts and Dimensions




Star schema

Section 2: Initial Example: Dimensional Model Based on A Single Source

The following figures show the ER diagram and the resulting relational schema for a sales department database.



The following figure show data records for the sales department database.

Based on the operational database shown in the above figures, the dimensional modeling technique can be used to design an analytical database whose subject of analysis is sales.

In the star schema, the chosen subject of analysis is represented by a fact table.

Designing the star schema involves considering which dimensions to use with the fact table that represents the chosen subject.


In the example above, a dimensional model was developed to analyze sales per product, customer, store, and date.


The dimensionally modeled database shown above makes it possible to easily create analytical queries, such as:

Such queries in dimensionally modeled databases can be issued by using SQL or by using Online Analytical Processing/Business Intelligence (OLAP/BI) tools.

First consider the following SQL versions of Query A using the dimensional database from the previous section.


Now consider the following SQL versions of the above query using the original nondimensional database. Functions are required to extract the year, quarter, and day of the week from the date.

Compare the two queries to observe the simplicity advantage for the analyst issuing a query on the dimensional model.


The convenience of analysis with the dimensionally modeled data is even more evident in cases when data necessary for analysis resides in multiple sources, as discussed in Section 4.

Section 3: Characteristics of Dimensions and Facts

Surrogate key

Typically, in a star schema all dimension tables are given a simple, non-composite system-generated key, also called a surrogate key.

Types of Fact Tables

Types of Keys Relevant to Dimensional Modeling

Section 4: Expanded Example: Dimensional Model Based on Multiple Sources

More commonly, data warehouses draw data from multiple sources.

The preceding dimensional modeling example depicts a scenario where an analytical database is based on a single source.

The following expansion of the initial example adds two additional data sources:

The following figure shows an expanded dimensional model that is now based on three different data sources:



The expanded model makes it possible to ask even more refined analytical questions that consider a larger variety of analysis factors. For example, consider the following question.

To obtain the answer to the Query B from the operational sources, a user would have to issue several queries spanning all three operational sources and then combine the results of those queries to receive the final answer.

Obtaining the answer to Query B using the dimensionally modeled database shown above is much more straightforward.

A query using dimensionally modeled data such as Query B above, could be created in an even simpler fashion by utilizing OLAP/BI tools.

Section 5: Additional Possible Fact Attributes

As mentioned earlier, a fact table contains

In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not measures, as illustrated by the following figure.




Two of the most typical additional attributes that can appear in the fact table are

These two additional attributes will be discussed next.


Transaction Identifier in the Fact Table

First, the transaction identifier concept is demonstrated.

In the SALES fact table for the dimensional model shown above, the primary key is comprised of the TID and ProductKey columns, because each record in the fact table refers to a particular line item of a particular sale transaction.




Transaction Time in the Fact Table

In order to examine the transaction time concept, let us assume that the business analysts analyzing subject sales would like to incorporate the time of day into their analysis.

Section 6: Multiple Fact Tables in a Dimensional Model

A dimensional model can contain more than one fact table, a situation that occurs when multiple subjects of analysis share dimensions.

A dimensional model with multiple fact tables is referred to as a constellation (galaxy) of stars.


In the following scenario, in addition to the tracking of sales by the sales department, the quality control department tracks occurrences of defective products in its stores.

The goal is to analyze the defects in the same manner as sales, so a dimensional model will be created in order to analyze the occurrences of defective products found in stores.

Section 7: Detailed Versus Aggregated Fact Tables

Fact tables in a dimensional model can contain either detailed data or aggregated data.

The following example illustrates the difference between the two types of data in fact tables.


Detailed Fact Table


Aggregated Fact Table

The example above is one of the many ways in which the data from the source in the above figure showing data records for the Sales Department Database could have been aggregated.





Another example of a dimensional model with an aggregated sales fact table (SalesPerDCS – per day, customer, and store) is shown in the following figure.




Detailed Versus Aggregated Fact Table

To recap the concept of detailed and aggregated fact tables, we take another look at other fact tables in dimensional models in this chapter and distinguish which ones are detailed and which ones are aggregated.

Section 8: Granularity of the Fact Table

The granularity of the table describes what is depicted by one row in the table.

Here are some examples:

Granularity of the fact tables

Line-Item Versus Transaction-Level Detailed Fact Table

Detailed fact tables can represent different types of information, depending on what is depicted by the underlying data sources.

The two most common types of detailed fact tables are line-item fact tables and transaction-level fact tables.

Section 9: Slowly Changing Dimensions and Timestamps

A typical dimension in a star schema contains either

A dimension that contains attributes whose values can change is often referred to as a slowly changing dimension.

Slow Changing Dimensions

There are several different approaches to dealing with slowly changing dimensions, with the most common referred to as Type 1, Type 2, and Type 3.

Section 10: Data Warehouse Modeling Approaches

The following are three of the most common data warehouse and data mart modeling approaches:

Section 11: Normalized Data Warehouse

Envisions a data warehouse as an integrated analytical database modeled by using the traditional database modeling techniques of ER modeling and relational modeling, resulting in a normalized relational database schema.

The following figure illustrates a normalized data warehouse.

The idea behind this method is to have a central data warehouse modeled as an ER model that is subsequently mapped into a normalized relational database model.

Section 12: Dimensionally Modeled Data Warehouse

Views a data warehouse as a dimensional model that integrates analytically useful information from the operational data sources.

The following figure illustrates a normalized data warehouse.

A dimensionally modeled data warehouse can be used as a source for dependent data marts and other views, subsets, and/or extracts.

Section 13: Independent Data Marts

In this method, stand—alone data marts are created by various groups within the organization, independent of other stand-alone data marts in the organization, and as a consequence, multiple ETL systems are created and maintained.

Section 14: Comparing ER Modeling and Dimensional Modeling as Data Warehouse Design Techniques

ER modeling

ER modeling is a technique for facilitating the requirements collection process, where all requirements are collected and visualized as the ER model constructs: entities, attributes, and relationships.

Dimensional modeling

Dimensional modeling can be used both for visualizing requirements for a data warehouse (i.e., conceptual modeling), and for creating a data warehouse model that will be implemented (i.e., logical modeling).

Making a Choice

Both ER modeling and dimensional modeling are viable alternatives for modeling data warehouses/data marts, and can be used within the same project.

Using one technique for visualizing requirements does not preclude using the other technique for creating the model that will actually be implemented.

Section 15: References
X
X

Granularity

Granularity is the relative size, scale, or level of detail that characterizes an object or activity.

When granularity is applied to information technology, it refers to the level of detail (fineness) in a model, requirements, data, code modularity, or even a decision making process.

Greater (or finer) granularity corresponds to an incease in the level of detail (fineness of data).

X
X
X
X
X