Section 0: Module Objectives or Competencies
Course Objective or Competency | Module Objectives or Competency |
---|---|
The student will be able to explain what is included in data dictionaries, and why they are relevant to organizations. | The student will be able to explain the purpose or function of data dictionaries. |
The student will be able to explain how metadata is critical to an organization. | |
The student will be able to list and explain the items that are included in a data dictionary. |
Section 1: Overview
Definition
- a repository where data and processes are defined
- contains metadata
- data about data
At various times you will hear that data dictionaries are unimportant, or they are obsolete. Wrong!
A data dictionary is a set of information describing what type of data is collected within a database, its format, structure, and how the data is used. In many respects, a data dictionary can be thought of as the rules in which all the data within your system need to abide by. If all of your systems are producing data that follow the same rules - you achieve semantic interoperability.
The dictionary can provide a list of names, definitions and data elements to be captured in the system and includes metadata—or additional information—about each of those elements. Metadata is a way to organize data at its most basic level and helps in distilling large amounts of data for specific purposes. The use of metadata will become increasingly important as large volumes of information become available from the increased use of HIE [Health information exchange] systems like EHRs. So much new information would have little value if it couldn't be processed and analyzed dependably.
Data dictionaries should be created with federal standards to support HIE with Meaningful Use in mind. As HIE use increases, AHIMA warns that healthcare organizations will need to properly identify data elements for appropriate reporting and transmission.
A successful data dictionary can improve the reliability and dependability of an organization's data, reduce redundancy, improve documentation and control, and make it easier to analyze data and use it to make evidence-based care decisions like those common in accountable care organizations.
Justification
- systems analysts must be aware of and catalog all of the different terms that refer to the same data item
- helps to avoid duplication of effort
- allows better communication between organizational departments sharing a database
- makes maintenance more straightforward
- provides a standard for data elements
- makes it possible to check for and eliminate homonyms and
synonyms
- homonym
- same name for different items
- Example: C_NAME for both customer name and company name
- synonym
- use of different names to describe the same item
- Example: C_NAME and CUS_NAME for customer
- homonym
Uses
- can be used to validate the data flow diagram for completeness and accuracy
- provides a starting point for developing screens and reports
- helps to determine the contents of data stored in files
- helps to develop the logic for data flow diagram processes
Video Explanation
Here's a video for another perspective.
Health Care?
In recent years, the American Recovery and Reinvestment Act, numerous health IT initiatives, and the growth of health information exchange (HIE) have increased the healthcare industry's focus on data management and data use. Currently many organizations store data in multiple health information systems that are disparate—meaning the data within each system stand alone and are not interoperable.
Accurate and reliable data are integral to the many health IT initiatives currently under way. According to the International Organization for Standardization (ISO):
The increased use of data processing and electronic data interchange heavily relies on accurate, reliable, controllable, and verifiable data recorded in databases. One of the prerequisites for a correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and descriptive characteristics (e.g., representation) of that data. To guarantee this shared view, a number of basic attributes has to be defined.1
A data dictionary is one tool organizations can use to help ensure data accuracy.
From American Health Information Management Association's Managing a Data Dictionary.
Section 2: Process of Developing the System Dictionary
- often automated by CASE (Computer Assisted Software Engineering) tool
- begin by examining and describing the contents of the data flows, data stores, and processes
- each data store and data flow should be defined and then expanded to include the details of the elements it contains
- the logic of each process should be described using the data flowing in or out of the process
- omissions and other design errors should be noted and resolved
- may be created after the data flow diagram has been completed OR
- may (should) be constructed as the DFD is being developed
Section 3: Data Dictionary Categories
The data dictionary acts as a repository for the definitions of data flows, data stores, data structures, and data elements. We're already familiar with data flows and data stores, so...
Data element
- basic data item which, when further decomposed, has little informational value
- elemental items of information, which when considered alone, are useless
- examples: date, phone number, SSN
- Better examples: year (in date), area code (in phone number), area number or group number (in SSN)
Data structure
- group of one or more related data elements and/or data structures
- examples: customer_address
Section 4: Defining Data Flows
Data flows for all input and output should be described first, followed by the internal data flows and the flows to and from data stores.
- ID: optional identification number
- unique descriptive name: the text that appears on the diagram and referenced in all descriptions using the data flow
- general description of data flow
- source of data flow: could be external entity, process, data store
- destination of the data flow: could be external entity, process, data store
- type of data flow: tells if flow is
- a record entering or leaving a file
- a record contained in a report
- a record contained in a form
- a record contained in a screen
- an internal flow (a flow between processes)
- name of data structure or data elements from which flow elements are derived
- volume per unit of time: records per day or other units of time

Example of Data Flow
Section 5: Defining Data Structures
This provides a view of the elements that make up the data structure, and is often described using algebraic notation. Data structures are necessary whenever multiple data elements must be grouped together to adequately convey a single piece of information.
- Structure ID: optional identification number
- structure name: the text that appears on the diagram and referenced in all descriptions using the data structure (attach "-S" to indicate structure)
- content: the data elements or structural records that make up the data structure
- used in: indicates which data flows or data stores use the data structure
- comments or notations

Example of Data Structure
Section 6: Defining Data Elements
Each data element should be defined only once in the data dictionary.
- Element ID: optional identification number
- element name: unique and descriptive; derived from user environment
- aliases: synonyms or other names for the element, such as names used for the same element within different systems such as CUSTOMER NUMBER and CLIENT NUMBER
- general description of data element
- base or derived element: base elements are entered into the system, derived elements are the result of calculations
- element length: stored length
- element type: numeric, date, alphabetic, alphanumeric
- input and output format: edit mask indicating how data is presented
- validation criteria: range of values, possible values, code list
- default values: value that the element assumes if no other is supplied
- used by: data structures and data stores in which the element is used

Example of Data Element
Section 7: Defining Data Stores
Data stores are created for each different data entity being stored. A data store is required whenever data elements and data structures are grouped together to form a structural record that must be retained for some period of time. A data store is created for each unique structural record.
- ID: optional identification number
- data store name: should be unique and descriptive
- alias: synonyms or other names for the data store
- general description of data store
- file type: manual or computerized
- file format: flat file or database file
- size: maximum or average number of records
- file name: name of actual file, if known
- content: data elements and data structures that make up the data store
- data flows in and out

Example of Data Store
Section 8: Benefits
- The ideal data dictionary is automated, interactive, and evolutionary.
- As the analyst learns about the organization's systems, data are added to the data dictionary.
- Automated data dictionaries allow dramatic improvements in the upkeep of documentation.
- If begun early, a data dictionary can save many hours of time in the analysis and design phases (especially in large projects).
- The data dictionary is one common source in the organization for answering questions and settling disputes about any aspect of data definition.
- A current data dictionary can serve as an excellent reference for maintenance efforts on unfamiliar systems.
- Automated data dictionaries can serve as references for both people and programs.