Relational DBMSs

Section 0: Module Objectives or Competencies

Course Objective or Competency	Module Objectives or Competency
The student will be able to list and explain the terminology, concepts, and basis of relational databases.	The student will be able to list and explain database terminology like entities, attributes, and tables or entity sets.
	The student will be able to explain the functional dependencies and how the concept is used to determine how a primary key is selected to uniquely identify all other attribute values in any given row.
	The student will be able to design foreign key to primary key linkages between two database tables.
	The student will be able to demonstrate the use of entity integrity and referential integrity to ensure that the data in a database is accurate, valid, reliable, and consistent.
	The student will be able to explain how relationship classifications factor into database design.
	The student will be able to explain what a system catalog is and how it is vital to a relational database.

Section 1: Overview

Database Fundamentals

The design of relational databases focuses on entities and their attributes.

An entity is a person, place, thing, or event for which data is collected and stored.
For example, a university system has a student entity.

An attribute represents a characteristic of an entity.

For example, the student entity might have attributes student number, name, GPA, etc.

Tables

Recall that a relational database is perceived by the user as a collection of tables in which data are stored.

Each table consists of a series of row/column intersections.

Table Details

A table contains a group of related entities.

Columns store attributes.
Rows store individual entities. Rows are generally referred to as tuples.
Each row/column intersection contains the value of one of the attributes for a particular entity.
- For example, in the table above the table cell at the intersection of the sixth column and the third row contains the value for the phone number associated with agent 503 (John Okon).

Since a table contains a group of related entities it is sometimes called an entity set, or a relation.

Remember that each row, or tuple, represents a single entity. In the table above each tuple represents a single agent out of the entire group of agents.

Each row (or tuple) must have a primary key, i.e., an attribute that uniquely identifies any given entity (row).
In the table above each agent has a unique agent code.
There will be more on keys later.

Each column, or attribute, must have a unique name.

The column's set of permissible values is known as its domain. The domain for GRADE is [A, B, C, D, F].

The order of the rows and columns is unimportant.

Section 2: Dependency and Keys

Functional Dependence

Functional dependence means that the value in column A determines the value in column B (the value in column B depends on some value in column A).

In plain English, if you know the value of A you can determine the value of B.

Here is an example:

studentID determines the student last name (lastName), i.e., lastName is functionally dependent on studentID.
- studentID → lastName
studentID is not functionally dependent on lastName, because multiple students may be named "Smith."
Like lastName, all other attributes are dependent on studentID.
The social security number (ssn) also functionally determines the other attributes.

Fully Functionally Dependent

If the attribute (B) is functionally dependent on a composite key (A) but not on any subset of that composite key, the attribute (B) is fully functionally dependent on (A).

In the following example the attributes are fully functionally dependent on the composite key:
lastName, firstName, initial, phone → hrs, class, major,...
In the following example the attributes are NOT fully functionally dependent on the composite key:
studentID, lastName, firstName, initial, phone → hrs, class, major,...

Keys

There are several types of keys.

When designing a table, or entity set, the designer must identify a primary key that will uniquely identify each entity.

The designer begins by identifying viable superkeys, then from those selecting candidate keys, and from those selecting a primary key.

Superkey

A superkey is an attribute (or combination of attributes) that uniquely identifies each entity in a table.

studentID
studentID, lastName
studentID, lastName, firstName, initial

Candidate Key

A candidate key is a minimal superkey, i.e., it does not contain a subset of attributes that is itself a superkey.

studentID – okay
ssn – okay
studentID, lastName – NOT okay

Primary Key

A primary key is a candidate key selected to uniquely identify all other attribute values in any given row.

studentID
Why not choose ssn? Recall business rules...

Composite Key

It may require more than a single attribute to define functional dependence, i.e., a key may be composed of more than one attribute.

Such multi-attribute keys are referred to as composite keys.
courseID, studentID → courseGrade

Secondary Key

A secondary key is an attribute (or combination of attributes) used strictly for data retrieval purposes.

lastName, phone

Foreign Key

A foreign key is the primary key of one table that has been placed into another table to create a common attribute by which they can be linked..

More on this in the next section on linking tables.

Section 3: Linking Tables

The ability to link, or JOIN, tables is the power behind relational databases.

Tables, or relations, are related to each other by sharing a common entity characteristic.

Linking Tables

Links are created by having two tables share a common field.

The primary key of one table appears as the link (foreign key) in another table.
In the tables above, AGENT_CODE is the primary key in the AGENT table, and a foreign key in the CUSTOMER table. The AGENT table has no foreign key.

Controlled Redundancy

Controlled redundancy refers to the way in which tables in the database share common attributes that enable the tables to be linked together.

One of the advantages of a well-designed database is that it avoids redundancy, which exists when fields are duplicated in multiple locations.
Controlled redundancy means that redundancy is present, but it is controlled because it is limited to foreign key fields only.
It is necessary to link tables.

In the table above, the CUSTOMER table may contain an agent's AGENT_CODE that corresponds to an entry in the AGENT table.

This common link enables the customer to be matched to his agent without duplicating information.

The relational model uses controlled redundancy to avoid the widespread redundancies found in file systems.

Section 4: Integrity Rules

Database integrity ensures that data entered into the database is accurate, valid, reliable, and consistent.

Integrity rules are the rules that must be enforced to ensure the accuracy and accessibility of a relational database.

These rules govern which operations can be performed on the data and on the structure of the database.

Integrity rules are imperative to a good database design. Most RDBMS have these rules automatically, but it is safer to apply the rules in the design phase.

Entity Integrity

Requirements – no null entries in a primary key; all entries are unique.

Purpose – guarantees that each entity will have a unique identity.

Example – No invoice can have a duplicate number, nor can it be null. (All invoices are uniquely identified by their invoice number.)

Consequences of violating entity integrity: If multiple tuples have duplicate or unspecified primary key values, it might not be possible to locate data for a specific entity, leading to data conflict between tuples, and joins to another table via identical values in their common attribute would result in ambiguous linkages to possibly unrelated tuples.

Entity integrity guarantees that every row of a table is accessible, whether data is being retrieved or modified.

Referential Integrity

Requirements – foreign key must have either a null entry or an entry that matches the primary key value in a table to which it is related.

Purpose – makes it impossible for an attribute to have an invalid entry, although it is possible for the attribute NOT to have a corresponding value.

The enforcement of referential integrity makes it impossible to delete a row in one table whose primary key has mandatory matching foreign key values in another table.

Example – A customer might not yet have an assigned agent (AGENT_CODE), but it will be impossible to have an invalid agent (AGENT_CODE).

Consequences of violating referential integrity: If a tuple in table 2 has a foreign key for which there is no corresponding primary key in related table 1, the relation to which a foreign key refers does not exist and joining the tables together would cause tuples in table 2 with the unmatched foreign key to be left out of the results.

Referential integrity ensures that a value in one table references an existing value in another table..

Illustration

Illustration of Database Integrity

Entity Integrity

The CUSTOMER table's primary key is CUS_CODE, with unique entries and no null entries.
The AGENT table's primary key is AGENT_CODE, and this primary key column is also free of null entries.

Referential Integrity

The CUSTOMER table contains a foreign key AGENT_CODE, which links entries in the CUSTOMER table to the AGENT table.
The CUS_CODE row identified by the (primary key) number 10013 contains a null entry in its AGENT_CODE foreign key because Paul F. Olowski does not (currently) have an agent assigned to him.
The remaining CUSTOMER table's AGENT_CODE entries all match the AGENT_CODE entries in the AGENT table.

Section 5: Relationships

A relationship describes an association among entities.

Relationship classifications include the following:

One-to-one (1:1) – one entity can be related to only one other entity, and vice-versa.
- example: DEAN to SCHOOL
- The existence of a 1:1 relationship often means that the entity components were defined improperly, and in fact may belong in the same table.
- A 1:1 relationship is sometimes necessary to represent a supertype.
- A 1:1 relationship may also be needed to represent a recursive relationship.

One-to-Many (1:M) – one entity (or tuple) in a table can be related to many other entities in another table. A 1:M relationship is the relational norm or ideal.
- example: PROFESSOR to STUDENT
- example: PAINTER and PAINTING
- Each painting is painted by only one painter, but each painter may have painted many paintings.
- There may be only one row in the PAINTER table for any given row in the PAINTING table, but there may be many rows in the PAINTING table for any given row in the PAINTER table.
- The 1:M relationship is easily implemented in the relational model "as long as the primary key of the one is included as the foreign key in the table of the many.

Many-to-Many (M:N) – In a M:N relationship, one or more rows in a table can be related to 0, 1 or many rows in another table.
- example: STUDENT to CLASS
- Each CLASS may have many STUDENTs, and each STUDENT may take many CLASSes.
- There may be many rows in the CLASS table for any given row in the STUDENT table, and there may also be many rows in the STUDENT table for any given row in the CLASS table.
- Here is an incorrect approach to implementing a M:N relationship:
  - The tables contain duplicated data, wasting space and opening the door to anomalies.
  - The system will be very inefficient.
- A many-to-many relationship (M:N) can best be implemented by breaking it up to produce a set of 1:M relationships.
- The solution is a composite entity, which is an entity whose primary key is composed of a combination of the primary keys of the entities that must be linked.
- The ENROLL table can be created to link the tables CLASS and STUDENT.
- A linking table is the embodiment of a composite entity.
- The linking table produces the desired 1:M relationships.
- STUDENT and CLASS tables now contain one row per entity.
- Only the linking table contains redundancies.
- The linking table (composite entity) must contain at least the primary keys of the two entities being linked, although additional attributes may be assigned as needed.

Section 6: System Catalog

A data dictionary provides a detailed accounting of all tables found within the database.

It contains all the attribute names and characteristics for each of the tables in the system.
In other words, it contains metadata, or data about data.

System Catalog

The system catalog is a very detailed system data dictionary that describes all objects within the database, including:

data about table names
when and by whom the tables were created
the number of columns in each table
the data type of each column
index file names
authorized users
access privileges
etc.

Notes:

Current RDBMSs generally provide only a system catalog, from which the data dictionary may be derived.
The system catalog is a system-created database whose tables store user/designer-created database characteristics and contents.
The terms system catalog and data dictionary are often used interchangeably.
The system catalog automatically produces database documentation.
Such documentation makes it possible for the RDBMS to check for and eliminate homonyms (same attribute name for different attributes: C_NAME for both customer name and company name) and synonyms (use of different names to describe the same attribute: C_NAME and CUS_NAME for customer).

A Sample Data Dictionary

Read more about the importance of system catalogs.

Section 7: Database Software

Best Relational Databases Software in 2020

The latest version of the MySQL DBMS is MySQL 8.0.30.
The latest version of the MariaDB DBMS is MariaDB 10.6.2.