Data Dictionary

Section 0: Module Objectives or Competencies

Course Objective or Competency	Module Objectives or Competency
The student will be able to explain what is included in data dictionaries, and why they are relevant to organizations.	The student will be able to explain the purpose or function of data dictionaries.
	The student will be able to explain how metadata is critical to an organization.
	The student will be able to list and explain the items that are included in a data dictionary.

Section 1: Overview

Definition

a repository where data and processes are defined
contains metadata
- data about data

At various times you will hear that data dictionaries are unimportant, or they are obsolete. Wrong!

From Health Language Blog: What is a Data Dictionary and What Role Does it Play in Semantic Interoperability?:

A data dictionary is a set of information describing what type of data is collected within a database, its format, structure, and how the data is used. In many respects, a data dictionary can be thought of as the rules in which all the data within your system need to abide by. If all of your systems are producing data that follow the same rules - you achieve semantic interoperability.

The dictionary can provide a list of names, definitions and data elements to be captured in the system and includes metadata—or additional information—about each of those elements. Metadata is a way to organize data at its most basic level and helps in distilling large amounts of data for specific purposes. The use of metadata will become increasingly important as large volumes of information become available from the increased use of HIE [Health information exchange] systems like EHRs. So much new information would have little value if it couldn't be processed and analyzed dependably.

Data dictionaries should be created with federal standards to support HIE with Meaningful Use in mind. As HIE use increases, AHIMA warns that healthcare organizations will need to properly identify data elements for appropriate reporting and transmission.

A successful data dictionary can improve the reliability and dependability of an organization's data, reduce redundancy, improve documentation and control, and make it easier to analyze data and use it to make evidence-based care decisions like those common in accountable care organizations.

Justification

systems analysts must be aware of and catalog all of the different terms that refer to the same data item
helps to avoid duplication of effort
allows better communication between organizational departments sharing a database
makes maintenance more straightforward
provides a standard for data elements
makes it possible to check for and eliminate homonyms and synonyms
- homonym
  - same name for different items
  - Example: C_NAME for both customer name and company name
- synonym
  - use of different names to describe the same item
  - Example: C_NAME and CUS_NAME for customer

Uses

can be used to validate the data flow diagram for completeness and accuracy
provides a starting point for developing screens and reports
helps to determine the contents of data stored in files
helps to develop the logic for data flow diagram processes

Video Explanation

Here's a video for another perspective.

Video: What is a data dictionary?

Health Care?

In recent years, the American Recovery and Reinvestment Act, numerous health IT initiatives, and the growth of health information exchange (HIE) have increased the healthcare industry's focus on data management and data use. Currently many organizations store data in multiple health information systems that are disparate—meaning the data within each system stand alone and are not interoperable.

Accurate and reliable data are integral to the many health IT initiatives currently under way. According to the International Organization for Standardization (ISO):

The increased use of data processing and electronic data interchange heavily relies on accurate, reliable, controllable, and verifiable data recorded in databases. One of the prerequisites for a correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and descriptive characteristics (e.g., representation) of that data. To guarantee this shared view, a number of basic attributes has to be defined.1

A data dictionary is one tool organizations can use to help ensure data accuracy.

From American Health Information Management Association's Managing a Data Dictionary.

Section 2: Process of Developing the System Dictionary

often automated by CASE (Computer Assisted Software Engineering) tool
begin by examining and describing the contents of the data flows, data stores, and processes
each data store and data flow should be defined and then expanded to include the details of the elements it contains
the logic of each process should be described using the data flowing in or out of the process
omissions and other design errors should be noted and resolved
may be created after the data flow diagram has been completed OR
may (should) be constructed as the DFD is being developed

Section 3: Data Dictionary Categories

The data dictionary acts as a repository for the definitions of data flows, data stores, data structures, and data elements. We're already familiar with data flows and data stores, so...

Data element

basic data item which, when further decomposed, has little informational value
elemental items of information, which when considered alone, are useless
examples: date, phone number, SSN
Better examples: year (in date), area code (in phone number), area number or group number (in SSN)

Data structure

group of one or more related data elements and/or data structures
examples: customer_address

Section 4: Defining Data Flows

Data flows for all input and output should be described first, followed by the internal data flows and the flows to and from data stores.

ID: optional identification number
unique descriptive name: the text that appears on the diagram and referenced in all descriptions using the data flow
general description of data flow
source of data flow: could be external entity, process, data store
destination of the data flow: could be external entity, process, data store
type of data flow: tells if flow is
- a record entering or leaving a file
- a record contained in a report
- a record contained in a form
- a record contained in a screen
- an internal flow (a flow between processes)
name of data structure or data elements from which flow elements are derived
volume per unit of time: records per day or other units of time

Example of Data Flow

Section 5: Defining Data Structures

This provides a view of the elements that make up the data structure, and is often described using algebraic notation. Data structures are necessary whenever multiple data elements must be grouped together to adequately convey a single piece of information.

Structure ID: optional identification number
structure name: the text that appears on the diagram and referenced in all descriptions using the data structure (attach "-S" to indicate structure)
content: the data elements or structural records that make up the data structure
used in: indicates which data flows or data stores use the data structure
comments or notations

Example of Data Structure

Section 6: Defining Data Elements

Each data element should be defined only once in the data dictionary.

Element ID: optional identification number
element name: unique and descriptive; derived from user environment
aliases: synonyms or other names for the element, such as names used for the same element within different systems such as CUSTOMER NUMBER and CLIENT NUMBER
general description of data element
base or derived element: base elements are entered into the system, derived elements are the result of calculations
element length: stored length
element type: numeric, date, alphabetic, alphanumeric
input and output format: edit mask indicating how data is presented
validation criteria: range of values, possible values, code list
default values: value that the element assumes if no other is supplied
used by: data structures and data stores in which the element is used

Example of Data Element

Section 7: Defining Data Stores

Data stores are created for each different data entity being stored. A data store is required whenever data elements and data structures are grouped together to form a structural record that must be retained for some period of time. A data store is created for each unique structural record.

ID: optional identification number
data store name: should be unique and descriptive
alias: synonyms or other names for the data store
general description of data store
file type: manual or computerized
file format: flat file or database file
size: maximum or average number of records
file name: name of actual file, if known
content: data elements and data structures that make up the data store
data flows in and out

Example of Data Store

Section 8: Benefits

The ideal data dictionary is automated, interactive, and evolutionary.
As the analyst learns about the organization's systems, data are added to the data dictionary.
Automated data dictionaries allow dramatic improvements in the upkeep of documentation.
If begun early, a data dictionary can save many hours of time in the analysis and design phases (especially in large projects).
The data dictionary is one common source in the organization for answering questions and settling disputes about any aspect of data definition.
A current data dictionary can serve as an excellent reference for maintenance efforts on unfamiliar systems.
Automated data dictionaries can serve as references for both people and programs.