Normalization



index
Disabled back button Next Section
printable version

Section 0: Module Objectives or Competencies
Course Objective or Competency Module Objectives or Competency
The student will be able to recognize database anomalies and explain the significant disadvantages associated with the presence of such anomalies. The student will be able to detect poor design features that can lead to insertion, deletion, and modification anomalies and how they degrade a database.
The student will be able to detect repeating groups and understand their deleterious effect on query execution.
The student will be able to recognize partial dependencies and understand their harmful effect on query execution.
The student will be able to recognize transitive dependencies and understand how they can degrade query execution.
The student will be able to apply the normalization approach to remove database anomalies to improve database functionality. The student will be able to apply the process for eliminating repeating groups in order to convert each relation to First Normal Form.
The student will be able to apply the process for eliminating partial dependencies in order to convert each relation to Second Normal Form.
The student will be able to apply the process for eliminating transitive dependencies in order to convert each relation to Third Normal Form.
The student will be introduced to higher level normal forms like Boyce-Codd Normal Form, Fourth Normal Form, Fifth Normal Form, and Domain/Key Normal Form.
Section 1: Overview

Easy explanation of Normalization






Definition:

Objective:

Justification:

Section 2: Anomalies

Definition:

Types:

There are three kinds of anomalies:




Consider the following relational scheme and the corresponding relation:

STUDENT_COURSE (Bengal_ID, S_Name, Course#, Course_Title, Instr_Name, Instr_Office, Grade)

Anomaly Example.


Insertion Anomaly

Deletion Anomaly

Modification Anomaly

Section 3: Repeating Groups

Definition:




Suppose there is a relation called STUDENT with the following schema:

STUDENT (Bengal_ID, S_Name, S_Major, (S_Minor), S_Credits, S_Class)

Consider the following entries:

Repeating Groups Example.


Summary: Repeating groups result in unpredictable results from the relational algebra commands, and thus must be eliminated.

Not all anomalies can be avoided, but a good design greatly reduces them.

Solution: Split the relational scheme up using the normalization process to avoid anomalies.

STUDENT (Bengal_ID, S_Name, S_Major, S_Credits, S_Class)
STUDENT_MINOR (Bengal_ID, S_Minor)

The normalization process simplifies relations by breaking them up into smaller relations.

To avoid anomalies, each relation should contain information about a single object.

Section 4: Normalization Details (Partiality and Transitivity)

Simpler relations can be derived by a series of mathematical steps known as normalization.

Normalization uses the notion of dependency. The idea is to only include things (attributes) that are related to, or dependent on, the primary key, i.e., group attributes that are functionally dependent.

RECALL: an attribute B of relation R is functionally dependent on attribute A if, at every instant of time, each A-value in R is associated with one and only one B-value.




Partial functional dependency occurs if a non-key attribute is dependent on only part of a composite key.

example:

STUDENT_COURSE (S#, C#, S_NAME, S_ADDR, C_TITLE, GRADE)

Partial dependencies cannot be tolerated because a table that contains such dependencies is subject to data redundancies and, therefore, to update anomalies. The data redundancies are caused by the fact that every row entry requires a duplication of data.

Solution:

STUDENT (S#, S_NAME, S_ADDR)
STUDENT_COURSE (S#, C#, GRADE)
COURSE (C#, C_TITLE)




Transitive dependency: attribute C is transitively dependent on attribute A if there is an attribute B such that A arrow.jpg B and B arrow.jpg C, giving A arrow.jpg C. (Do you remember the Transitive Property of Equality?)

Also called mutual dependency.


example:

STUDENT_MAJOR (S#, MAJOR_DEPT, DEPT_HEAD)

Assume

S# arrow.jpg MAJOR_DEPT

MAJOR_DEPT arrow.jpg DEPT_HEAD

so...

S# arrow.jpg DEPT_HEAD

DEPT_HEAD is dependent on S# directly, or transitively through

S# arrow.jpg MAJOR_DEPT arrow.jpg DEPT_HEAD

Partial and transitive dependencies cause anomalies and must be eliminated.

Relations can be broken down to eliminate transitivity and partiality.

Elimination of Transitivity:

STUDENT_MAJOR (S#, MAJOR_DEPT)
DEPT (MAJOR_DEPT, DEPT_HEAD)

Section 5: Normal Forms

Stages

Normalization can be used to simplify relations.

  1. Stage 1: unnormalized relations with repeating groups
  2. Stage 2: first normal form relations
  3. Stage 3: second normal form relations
  4. Stage 4: third normal form relations

Normal Forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)




The normalization process follows the steps below:

normalization process.

Recall the original problem from Section 2:

STUDENT_COURSE (S#, S_NAME, COURSE#, C_TITLE, INSTR_NAME, INSTR_OFFICE, GRADE)

Section 6: Normalization Technique

Converting to 1NF (remove repeating groups):

Converting to 2NF (remove partial dependencies):

Converting to 3NF (remove transitive dependencies):

Section 7: Normalization Example 1

Problem: Repeating groups (assume multiple organizations are handled)

ACCOUNTS_RECEIVABLE (ORG_ID, BILL_ADDR, DATE_ACCT_OPENED,
(INV#, INV_DATE, INV_AMT),
(CHK#, PMT_DATE, PMT_AMT)
OUTSTANDING_BALANCE, #_ORDERS_TO_DATE)




Step 1: Remove repeating groups.

ACCOUNTS_RECEIVABLE (ORG_ID, BILL_ADDR, DATE_ACCT_OPENED,
OUTSTANDING_BALANCE, #_ORDERS_TO_DATE)
INVOICE (INV#, INV_DATE, INV_AMT, ORG_ID)
PAYMENT (CHK#, PMT_DATE, PMT_AMT, ORG_ID)



Decision??

ACCOUNTS_RECEIVABLE (ORG_ID, BILL_ADDR, DATE_ACCT_OPENED,
OUTSTANDING_BALANCE, #_ORDERS_TO_DATE)
INVOICE (INV#, INV_DATE, INV_AMT, ORG_ID)
PAYMENT (CHK#, ORG_ID, PMT_DATE, PMT_AMT)




Step 2: Remove partial dependencies.

ACCOUNTS_RECEIVABLE – OK
INVOICE – OK
PAYMENT – OK




Step 3: Remove transitive dependencies.

ACCOUNTS_RECEIVABLE – OK
INVOICE – OK
PAYMENT – OK

Section 8: Normalization Example 2

Problem: Transitive (mutual) dependencies

BOOK (ISBN, TITLE, AUTHOR, ORG_AFFILIATION, PUBLISHER, PRICE)

Step 1: Remove repeating groups.

BOOK – OK




Step 2: Remove partial dependencies.

BOOK – OK




Step 3: Remove transitive dependencies.

ORG_AFFILIATION depends on AUTHOR, which depends on ISBN.

BOOK (ISBN, TITLE, AUTHOR, PUBLISHER, PRICE)
AUTHOR (AUTHOR, ORG_AFFILIATION)

Section 9: Normalization Example 3

Problem: Repeating groups and partial dependencies.

Consider this problem from the perspective of a hardware store – the store assigns part numbers.

SUPPLIER (S#, S_NAME, S_ADDR, (PART#, P_NAME, P_WT))

Step 1: Remove repeating groups.

SUPPLIER (S#, S_NAME, S_ADDR)
S_PART (S#, PART#, P_NAME, P_WT) * intentional mistake

In the solution above we arbitrarily made S# part of the primary key for the new relation without pondering whether it makes sense. Let's see if it still works out.




Step 2: Remove partial dependencies.

PART info should not depend on S#.

SUPPLIER (S#, S_NAME, S_ADDR)
S_PART (S#, PART#)
PART (PART#, P_NAME, P_WT)




Step 3: Remove transitive dependencies.

SUPPLIER – OK
S_PART – OK
PART – OK

Notice that in the example above S_PART is represented as a composite entity.

Section 10: Normalization Example 4

Problem: Normalizing tables rather than schemas.




Problems:

  1. The project number PROJ_NUM is intended to be a primary key, or at least part of a primary key, but it contains null values.
  2. The table contains data redundancy.
  3. The table entries invite data inconsistencies. For instance, there are many different ways to abbreviate electrical engineer.
  4. The redundancies result in the following anomalies.
    1. update anomalies – modifying JOB_CLASS requires potentially many alterations.
    2. addition anomalies – to add a new project, at least one employee must be assigned to it.
    3. deletion anomalies – if employee 103 quits and is deleted, additional vital data is lost as well.



Step 1: Remove repeating groups.




Step 2: Remove partial dependencies.




Step 3: Remove transitive dependencies.




NOTE: Although the four tables are in 3NF, there is a potential problem:

Section 11: Higher Normal Forms

Third normal form is generally adequate for most situations.

There are higher level normal forms, such as

Heuristic

A commonsense rule (or set of rules) intended to increase the probability of solving some problem.