Related Specializations and Techniques







Section 0: Module Objectives or Competencies
The student will be introduced to related specializations and techniques in the areas of data analytics and databases. The student will be able to explain specializations in the field of data analytics, such as data integration, data migration, and data visualization.
The student will be able to explain database-related specializations, such as database testing, database deployment, and database performance tuning.


Section 1: Overview

There are additional topics related to the content of this course with which students should become familiar.

Those related to data analytics will be addressed first, followed by those associated with databases.



Section 2: The Realm of Data Analytics

Data Science

Data science involves the application of statistics, machine learning, and analytical approaches to solve critical business problems.


Data Engineering

Data engineers are software engineers who design, build, and integrate data from various sources and manage big data, applying the knowledge and skills needed to prepare the data infrastructure that data scientists analyze.

Then, with the goal of optimizing the performance of their company’s big data ecosystem, they write complex queries to ensure that the data is easily accessible.


Data Analytics Infrastructure

An optimal data analytics infrastructure makes it possible to analyze vast amounts of data in parallel across multiple clusters.


Data Analytics Administration

The primary responsibility of a data analytics administrator is the configuration, security, and maintenance of the enterprise data analytics/business intelligence platform, which includes administering the underlying infrastructure as well as configuring, managing, and maintaining the platform itself.


Hadoop Administrator

A Hadoop administrator administers and manages Hadoop clusters and all other resources in the entire Hadoop ecosystem.

The typical responsibilities of a Hadoop admin include:

- deploying and maintaining a Hadoop cluster;
- adding and removing nodes using cluster monitoring tools such as Ganglia, Nagios, or Cloudera Manager;
- configuring NameNode high availability and keeping track of all running Hadoop jobs;
- implementing, managing, and administering the overall Hadoop infrastructure;
- taking care of the day-to-day running of Hadoop clusters; and
- working closely with the database, network, BI, and application teams to make sure that all big data applications are highly available and performing as expected.


Data Ingestion

Data ingestion gathers data and brings it into a data processing system where it can be stored, analyzed, and accessed.


Data Integration

Data integration is the process of combining data residing in different sources to provide a unified view; that is, it is the combination of technical and business processes used to turn data from disparate sources into meaningful and valuable information.
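As a minimal sketch of this idea, the snippet below combines records about the same customers from two hypothetical source systems (a CRM and a billing system) into one unified view keyed on customer id. The source names and fields are invented for illustration.

```python
# Two hypothetical source systems holding different facts about the
# same customers, keyed by customer id.
crm = {101: {"name": "Ada Lovelace", "email": "ada@example.com"},
       102: {"name": "Alan Turing", "email": "alan@example.com"}}

billing = {101: {"balance": 120.50},
           103: {"balance": 75.00}}

def integrate(crm, billing):
    """Build a unified view: a full outer join on customer id."""
    unified = {}
    for cid in crm.keys() | billing.keys():
        record = {"customer_id": cid}
        record.update(crm.get(cid, {}))      # fields from the CRM, if present
        record.update(billing.get(cid, {}))  # fields from billing, if present
        unified[cid] = record
    return unified

view = integrate(crm, billing)
```

Customer 101 appears in both sources, so the unified record carries name, email, and balance together; customers known to only one system still appear, just with fewer fields.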


Data Migration

Data migration consists of transferring data from a source system to a destination system without disrupting operations.
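A minimal sketch of a migration, using two in-memory SQLite databases to stand in for the source and destination systems (the table and data are hypothetical). Rows are read from the source and copied to the destination without modifying the source.

```python
import sqlite3

# Source system with existing data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
src.commit()

# Destination system with a matching (already deployed) schema.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Migrate in batches so a large table need not fit in memory at once.
cur = src.execute("SELECT id, name FROM users ORDER BY id")
while batch := cur.fetchmany(100):
    dst.executemany("INSERT INTO users VALUES (?, ?)", batch)
dst.commit()

migrated = dst.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Batching is the design point worth noting: real migrations move far more rows than memory allows, so they stream data in chunks rather than loading everything at once.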


Data Uploading

Uploading data involves transmitting data from one computer system to another, typically to make it available on the receiving system during a system upgrade, integration with another system, or migration from another system.


ETL

ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources into structured, organized datasets. An ETL process, also known as a data pipeline, serves as the foundation of many business intelligence solutions.
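The three steps can be sketched end to end in a few lines. In this hypothetical pipeline the raw source is a small CSV string and the target is an in-memory SQLite table; the function names mirror extract, transform, and load.

```python
import sqlite3

# Hypothetical raw source data, as it might arrive from an export.
raw_csv = "name,amount\nada,100\nalan,250\n"

def extract(text):
    """Extract: parse the raw source into rows of strings."""
    lines = text.strip().splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]

def transform(rows):
    """Transform: normalize names and convert amounts to integers."""
    return [(row["name"].title(), int(row["amount"])) for row in rows]

def load(rows, conn):
    """Load: write the structured rows into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The composition `load(transform(extract(...)))` is the pipeline: each stage consumes the previous stage's output, which is why ETL jobs are easy to picture as data flowing left to right.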


Data Extraction

Data extraction is the process of obtaining data from a source for further data processing, storage, or analysis elsewhere.


Data Transformation

Data transformation is the process of translating data from one format to another.

The goal of data transformation is to prevent data loss or corruption by maintaining the integrity of the data and embedded structures.
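A minimal sketch of a format translation, converting hypothetical records from CSV to JSON. The round-trip check at the end illustrates the integrity goal: every field and value survives the conversion.

```python
import csv
import io
import json

# Hypothetical source data in CSV format.
csv_text = "id,city\n1,Boston\n2,Denver\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))  # parse the source format
json_text = json.dumps(rows)                        # emit the target format

# Round-trip check: the records are intact after translation.
restored = json.loads(json_text)
```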


Data Loading

Data loading refers to the "load" component of ETL.


Data Cleansing

Data cleansing is the process of ensuring that data is correct, consistent, and usable by identifying errors or corruptions in the data, correcting or deleting them, and, where needed, processing them manually to prevent the errors from recurring.
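A minimal sketch of cleansing applied to a hypothetical contact list: whitespace is trimmed, case is normalized, duplicates are removed, and records with a missing or malformed email are deleted.

```python
# Hypothetical raw records, with the kinds of defects cleansing targets.
raw = [
    {"name": "  ada  ", "email": "ADA@EXAMPLE.COM"},
    {"name": "Ada", "email": "ada@example.com"},   # duplicate after cleaning
    {"name": "Grace", "email": ""},                # missing email: dropped
    {"name": "alan", "email": "alan@example.com"},
]

def cleanse(records):
    seen, clean = set(), []
    for rec in records:
        name = rec["name"].strip().title()
        email = rec["email"].strip().lower()
        if "@" not in email:          # crude validity check: reject bad rows
            continue
        if email in seen:             # de-duplicate on the cleaned email
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean

cleaned = cleanse(raw)
```

Note that normalization has to happen before de-duplication: "ADA@EXAMPLE.COM" and "ada@example.com" only collide once both are lowercased.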


Data Validation

Data validation is a method for checking the accuracy and quality of data, typically performed prior to importing and processing.
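A minimal sketch of pre-import validation for a hypothetical order record: each record must have a positive integer quantity and a known status, and any problems are reported rather than silently loaded. The field names and allowed statuses are invented for illustration.

```python
VALID_STATUSES = {"open", "shipped", "closed"}

def validate(record):
    """Return a list of problems found; an empty list means the record passes."""
    errors = []
    if not isinstance(record.get("qty"), int) or record["qty"] <= 0:
        errors.append("qty must be a positive integer")
    if record.get("status") not in VALID_STATUSES:
        errors.append("unknown status")
    return errors

good = validate({"qty": 3, "status": "open"})
bad = validate({"qty": -1, "status": "lost"})
```

Returning the full list of problems, instead of failing on the first one, lets the import report everything wrong with a record in a single pass.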


Data Integrity

Data integrity refers to the maintenance, and the assurance, of the accuracy and consistency of data over its entire life cycle, protecting data against improper maintenance, modification, or alteration as well as ensuring data authenticity.
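One common integrity technique, sketched minimally below, is storing a checksum alongside the data so that any later modification can be detected. The record contents here are hypothetical.

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest of the data, used as an integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()

original = b"account=101,balance=120.50"
stored_digest = checksum(original)   # saved when the data is first written

# Later, before trusting the data, recompute the digest and compare.
intact = checksum(b"account=101,balance=120.50") == stored_digest
tampered = checksum(b"account=101,balance=999.99") == stored_digest
```

A matching digest shows the bytes are unchanged; a mismatch flags improper modification before the corrupted value is used.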


Data Profiling

Data profiling involves examining, analyzing and reviewing data to gather statistics surrounding the quality and hygiene of the dataset.
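A minimal sketch of profiling one hypothetical column: the scan gathers completeness statistics (how many values are missing) alongside basic distribution statistics.

```python
import statistics

# Hypothetical column of values; None marks a missing entry.
ages = [34, 41, None, 29, 41, None, 56]

present = [v for v in ages if v is not None]
profile = {
    "count": len(ages),             # total rows scanned
    "missing": ages.count(None),    # completeness / hygiene measure
    "distinct": len(set(present)),  # cardinality
    "min": min(present),
    "max": max(present),
    "mean": statistics.mean(present),
}
```

A real profiler computes this kind of summary for every column, so analysts can judge a dataset's quality before building anything on top of it.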


Data Analytics Deployment

Data analytics deployment ensures that the final solution is ready to be used within the operational environment and that end users have all the required tools to act upon the analytical insights.


Data Visualization

Data visualization is the presentation of data in a pictorial or graphical format to help people understand the significance of the data.
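As a minimal, library-free sketch of the idea, the snippet below renders hypothetical quarterly sales as a text bar chart so relative magnitudes are visible at a glance; in practice a plotting library such as matplotlib would produce a real graphic.

```python
# Hypothetical data: sales per quarter.
sales = {"Q1": 12, "Q2": 30, "Q3": 21, "Q4": 9}

def bar_chart(data, width=30):
    """Render each value as a bar scaled to the largest value."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label} | {bar} {value}")
    return "\n".join(lines)

chart = bar_chart(sales)
print(chart)
```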



Section 3: The Realm of Database

Database Testing

Database testing is the process of validating that the metadata (structure) and the data stored in the database meet the requirements and design; it is a critical component of quality assurance.
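A minimal sketch against a hypothetical SQLite schema, showing both halves of the definition: one check validates the metadata (the table's structure) and the other validates that the stored data obeys a business rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, "
             "name TEXT NOT NULL, salary REAL CHECK (salary > 0))")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 90000.0)")

# Structural test: the table has exactly the columns the design requires.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
schema_ok = columns == ["id", "name", "salary"]

# Data test: the CHECK constraint rejects an invalid salary.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Alan', -1.0)")
    constraint_ok = False
except sqlite3.IntegrityError:
    constraint_ok = True
```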


Database Deployment

Database deployment includes all of the steps, processes, and activities that are required to make a database or update available to its intended users.


Database Monitoring

Database monitoring involves tracking database performance and resources in order to create and maintain high performance and a highly available application infrastructure.


Database Performance Tuning

Database performance tuning encompasses steps to optimize performance with the goal of maximizing the use of system resources for greater efficiency.
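One classic tuning step, sketched below on a hypothetical SQLite table, is adding an index so that a frequent lookup no longer scans the whole table; SQLite's EXPLAIN QUERY PLAN reveals whether the optimizer actually uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"cust{i % 100}", float(i)) for i in range(1000)])

query = "SELECT total FROM orders WHERE customer = 'cust7'"

# Before tuning: the optimizer must scan every row.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

# Tuning step: index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")

# After tuning: the optimizer searches the index instead.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
```

Comparing the two plans (a full-table SCAN versus a SEARCH USING INDEX) is exactly the kind of before/after measurement performance tuning is built on.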



Section 4: Closing

The most enchanting aspect of a career in Information Systems or Computer Science is the opportunity to continue learning throughout your life, as no field changes more quickly or inexorably.

The most daunting aspect of a career in Information Systems or Computer Science is the responsibility and the requirement to learn and adapt as these changes take place.