Section 0: Module Objectives or Competencies
Course Objective or Competency | Module Objectives or Competency |
---|---|
The student will be able to explain the need for and relevance of databases. | The student will be able to justify why it is critical to learn database concepts. |
The student will be able to list and explain the fundamental concepts of a relational database system. | The student will be able to explain what a DBMS is, as well as the advantages of databases. |
The student will be able to explain why database design is an essential skill. | |
The student will be able to list and explain various database models. | |
The student will be able to explain, at a very general level, the theoretical underpinnings of relational databases. | |
The student will be able to list some of the features of an RDBMS. |
Section 1: Why we should care about Database
Software Industry Needs
The software industry is a data-driven environment, and many IT companies are dependent on data. Here are a few examples:
Databases are incredibly prevalent – they underlie technology used by most people every day if not every hour.
-
Databases reside behind a huge fraction of websites.
- Dynamic and e-commerce websites use databases to store the content that is loaded into a web page when the page is requested by the user. Social networks such as Facebook rely heavily on backend databases to store and form relationships from the information its users enter.
-
Databases are a crucial component of telecommunications systems, banking systems, video games, and just about any other software system or electronic device that maintains some amount of persistent information.
- Libraries rely on a database to store information about their books, members and borrowings.
- Almost all schools will use an Information Management System to store information about students, teachers, classes, grades and attendance etc. These require a database to function.
- Supermarkets need a database to store information about products and sales. Loyalty card schemes require a very large database of their own.
Database systems provide a number of other properties that make them exceptionally useful and convenient:
- efficiency
- scalability
- concurrency control
- data abstractions
- high-level query languages
Curriculum Requirements
Databases are so ubiquitous and important that CS and Information Systems graduates frequently cite their database class as the one most useful to them in their industry or graduate-school careers.
Database programming classes are typically part of a computer science, information technology, information systems, and informatics degree programs.
- The Curriculum Guidelines for Undergraduate Degree Programs in Information Systems is built around a Core Information Systems Body of Knowledge organized into a set of 16 Knowledge Areas (KAs), corresponding to topical areas of study in Information Systems.
- Data and Information Management
- Database Management Systems
- Data and Information Modeling at Conceptual and Logical Levels
- Physical Database Implementation
- Data Retrieval and Manipulation with Database Languages
- Data Management and Transaction Processing
- Distributed Databases
- Data Integrity and Quality
- Data and Database Administration
- The Curriculum Guidelines for Undergraduate Degree Programs in Computer Science is built around a Body of Knowledge organized into a set of 18 Knowledge Areas (KAs), corresponding to topical areas of study in computing.
- IM-Information Management
- IM/Database Systems
- IM/Relational Databases
- IM/Distributed Databases
- IM/Physical Database Design
- The Curriculum Guidelines for Baccalaureate Degree Programs in Information Technology is built around a set of 14 Domains, which collectively represent the scope of IT.
- ITE-IMA Information Management
- ITE-IMA-02 Data-information concepts
- ITE-IMA-03 Data modeling
- ITE-IMA-04 Database query languages
- ITE-IMA-05 Data organization architecture
- ITE-IMA-06 Special-purpose databases
- ITE-IMA-07 Managing the database environment
Job Market
Many jobs require that a developer have some database experience.
- Not having that experience means those jobs are eliminated from your list of potential opportunities.
- According to some job sites, a backend developer costs 35% more than a front-end developer. A backend developer almost certainly will require database knowledge, and definitely will need some type of data persistence skillset (NoSQL).
- Developers who have skills with advanced datastores, such as Hadoop, make even more money.
- Jobs in the emerging Data Science/Data Analytics sector require advanced database expertise.
Section 2: DBMS Overview
Database Management System (DBMS)
A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database.
Examples include Oracle, DB2, SQL Server, MySQL, PostgreSQL, and Ingres.
Advantages of Databases
Unlike file systems, databases offer the following advantages:
- Structural independence – a modification in the structure of the data does not impact programs that utilize that data.
- Data independence – changes in data characteristics, such as changing a field from integer to real, do not require changes in the programs that utilize the data; data is kept independent of the programs that use it.
- Reduced data redundancy – duplication of data in multiple locations is greatly reduced.
- Data consistency – data is reliable and uniform across applications. Further, any data written to the database must be valid according to all defined rules.
- No data anomalies – data is consistent and changes to any data item need be made in one place only. (Anomalies are problems that can occur in poorly designed databases such as an inconsistency between one part of the database and another.)
- Data integrity – ensures the accuracy and consistency of data stored in a database over its entire life-cycle; data is recorded exactly as intended and does not experience unintentional changes
Databases promote and enforce data integrity, which is a fundamental component of information security.
Database design
Database design is the design of the database structure that will be used to manage data.
Effective database design is critical because the data structures created within the database and the extent of the relationships among them are often the determining factor in the effectiveness of the DBMS.
- Even a good DBMS will perform poorly with a badly designed database.
-
A poorly designed database encourages redundant data, i.e.,
unnecessarily duplicated data, which often makes it difficult to
trace errors.
- Redundant data refers to the condition in which the same data are kept in different locations for the same entity (a person, place, thing, or event for which data is collected and stored.)
- example: Customer telephone numbers are stored in a customer file, a sales agent file, and an invoice file. If a number is changed then the possibility exists that at least one occurrence of the number will be overlooked, resulting in incorrect information in the database.
Database design is simplified through models.
- A model is a simplified abstraction of real-world events or conditions that enable people to explore the characteristics of entities and the relationships that may be created between such entities.
- A database model is a collection of logical constructs used to represent the data structure and the data relationships found within the database.
-
Database models are grouped into two categories, conceptual and
implementation.
-
Conceptual models focus on the logical nature of data
representation, i.e., what is represented in the
database rather than how it is represented.
- The Entity-Relationship Model is one example of a conceptual model.
-
Implementation models place the emphasis on how the data
are to be represented in the database or on how the data
structures are implemented to represent what is modeled.
- Hierarchical database models, network database models, and relational database models are examples of implementation models.
-
Conceptual models focus on the logical nature of data
representation, i.e., what is represented in the
database rather than how it is represented.
Section 3: Database Models
A database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized and manipulated in a database system.
- It thereby defines the infrastructure offered by a particular database system.
Each new database model capitalizes on the shortcomings of previous models.

Database Model Evolution
Common Characteristics
- Conceptual simplicity without compromising the semantic completeness of the database.
- Represent the real world as closely as possible.
- Representation of real-world transformations (behavior) must be in compliance with consistency and integrity characteristics of any data model.
Hierarchical
- Based on a hierarchy or tree structure.
- Assumes that many smaller parts come together as components of still larger components.

Hierarchical Database Model
Network
- Created to represent complex data relationships more effectively.

Network Database Model
Relational
-
Based on entities and the relationships between
them via a common entity characteristic.
- A relationship results from using controlled redundancy to share a common entity characteristic among multiple tables.
- This is the most popular database model.

Relational Database Model
Object-oriented
- Models both data and their relationships in a single structure known as an object; i.e., based on OO concepts.

Object-oriented Database Model
Object/relational
- Incorporates features borrowed from object-oriented database systems into relational systems.

Object/relational Database Model
NoSQL
- NoSQL means Not Only SQL, implying that when designing a software solution or product, there is more than one storage mechanism that could be used based on the needs.
-
NoSQL databases
can broadly be categorized in four types:
- Key-Value databases - Some of the popular key-value databases are Riak, Redis (often referred to as Data Structure server), Memcached and its flavors, Berkeley DB, upscaledb (especially suited for embedded use), Amazon DynamoDB (not open-source), Project Voldemort and Couchbase.
- Document databases - Some of the popular document databases we have seen are MongoDB, CouchDB , Terrastore, OrientDB, and RavenDB.
- Column family stores - Cassandra is one of the popular column-family databases; there are others, such as HBase, Hypertable, and Amazon DynamoDB.
- Graph Databases - There are many graph databases available, such as Neo4J, Infinite Graph, OrientDB, or FlockDB.

NoSQL Key-Value Database Model
Section 4: Relational Database Model
- The relational database is perceived by the user as a collection of tables in which data are stored.
- Each table consists of a series of row/column intersections.
- Tables, or relations, are related to each other by sharing a common entity characteristic.

Relational Model
- A CUSTOMER table may contain a sales representative's number that corresponds to an entry in the SALESREP table. This common link enables the customer to be matched to his sales rep without duplicating information.
-
The relational model uses controlled
redundancy to avoid the widespread redundancies found in file
systems.
- Controlled redundancy is a technique that allows relationships to be created by sharing a common entity characteristic among multiple tables.
- A relational table stores a collection of related entities.
-
Advantages:
- It provides complete data and structural independence because it is a purely logical structure. Actual physical data storage characteristics are not a consideration in design.
- SQL is a 4GL (4th generation language) capable of translating user requests into the technical details required to retrieve the requested data.
-
Disadvantages:
- Requires substantial operating system and hardware overhead.
- Slower than other database systems. (Improper design can slow the system down and produce the data anomalies found in file systems.)
Section 5: Database Features
A Database Management System (DBMS) offers the following features:
- Data Dictionary – stores a variety of data characteristics such as length and type (metadata). This helps to eliminate structural and data dependencies.
- Data Storage Management – creates the complex structures required for data storage, helping the user to "disregard" physical data characteristics.
- Data Transformation – transforms the entered data so that it conforms to the data structures; i.e., it transforms the logical requests into commands that physically locate and retrieve the requested data.
- Security – enforces security and privacy within the database.
- Multi-user Access Control – allows multiple user access to the data.
- Backup and Recovery – provides backup and data recovery procedures to ensure data safety and integrity.
- Data Integrity – promotes and enforces integrity rules to eliminate data integrity problems, thus minimizing data redundancy and maximizing data consistency.
- Database Access Languages and Application Programming Interfaces – provide data access via a query language (i.e., a nonprocedural language such as SQL that lets the user specify what must be done without specifying how it is to be done.)
- Database Communications Interfaces – allow the database to accept end-user requests within a network environment.