
Overview of Data Engineering

Data

· Structured - e.g. relational databases

· Semi-structured - e.g. emails, XML and JSON files

· Unstructured - e.g. photos, videos, social media content

Sources of data

Relational Database, Non-relational databases, APIs, Web Services, Data Streams, Social Platforms,
Sensor Devices

Data Repositories

· Transactional - OLTP (Online Transaction Processing)

· Analytical - OLAP (Online Analytical Processing)

Data formats

1. Delimited Text Files: A delimited text file is a type of file format where the data is organized in
rows and columns, with each column separated by a delimiter, such as a comma, tab, or
semicolon. Delimited text files are commonly used to store and exchange data between different
software applications.

2. Spreadsheet, XLSX: A spreadsheet is a type of electronic document that is used to organize and
manipulate data in a tabular format. XLSX is a file format used by Microsoft Excel, which is a
popular spreadsheet application. The XLSX format allows for the storage of a large amount of
data, along with formatting and formula information.

3. Extensible Markup Language (XML): XML is a markup language used for storing and
exchanging data in a structured format. It allows for the definition of custom tags and attributes,
making it highly flexible and adaptable. XML is commonly used for data exchange between
different software applications and systems.

4. PDF: PDF stands for Portable Document Format. It is a file format developed by Adobe that is
used for presenting and exchanging documents independent of software, hardware, and operating
systems. PDF files can contain text, images, and interactive elements, and they can be easily
viewed and printed.

5. JavaScript Object Notation (JSON): JSON is a lightweight data interchange format that is easy
for humans to read and write, and for machines to parse and generate. It is based on a subset of
the JavaScript programming language, and is commonly used for exchanging data between web
applications and APIs. JSON data is organized in a key-value format, similar to a dictionary.
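
As a rough illustration of how the text-based formats above differ, the sketch below parses one made-up record stored as delimited text, XML, and JSON, using only the Python standard library. The field names (`id`, `name`, `city`) and the sample values are assumptions for this example and do not come from a real dataset.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same record in three formats.
CSV_TEXT = "id,name,city\n1,Ada,London\n"
XML_TEXT = (
    "<customers><customer id='1'>"
    "<name>Ada</name><city>London</city>"
    "</customer></customers>"
)
JSON_TEXT = '{"id": 1, "name": "Ada", "city": "London"}'


def parse_csv(text):
    """Return each row of comma-delimited text as a dict keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_xml(text):
    """Return one dict per <customer> element (one attribute, two child tags)."""
    root = ET.fromstring(text)
    return [
        {
            "id": node.get("id"),
            "name": node.findtext("name"),
            "city": node.findtext("city"),
        }
        for node in root.findall("customer")
    ]


def parse_json(text):
    """Return the JSON object as a Python dict."""
    return json.loads(text)


csv_rows = parse_csv(CSV_TEXT)
xml_rows = parse_xml(XML_TEXT)
json_row = parse_json(JSON_TEXT)
print(csv_rows[0]["name"], xml_rows[0]["name"], json_row["name"])
```

One difference worth noticing: the delimited text and XML parsers return every value as a string, while JSON keeps `id` as a number, because JSON carries basic type information in the format itself.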

Data Sources

6. XML: A markup language used for storing and exchanging data in a structured format.

7. APIs and Web Services: Protocols, routines, and tools used for building software applications that
enable communication and data exchange between different systems.

8. Web Scraping: The process of extracting data from websites using automated tools.

9. Data Streams and Feeds: Real-time or near real-time data sources that provide a continuous flow
of data.
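
To make the web scraping entry above more concrete, here is a minimal sketch that extracts links from a page using Python's built-in `html.parser` module. The HTML snippet and the `LinkCollector` class are hypothetical; a real scraper would first download the page over HTTP and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
HTML_TEXT = """
<html><body>
  <h1>Products</h1>
  <a href="/items/1">Keyboard</a>
  <a href="/items/2">Mouse</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only keep text that appears inside an <a> tag with an href.
        if self._href is not None and data.strip():
            self.links.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


collector = LinkCollector()
collector.feed(HTML_TEXT)
print(collector.links)
```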

Languages for Data Professionals

10. Query Languages: A type of programming language used for retrieving and manipulating data
from databases. Examples include SQL (Structured Query Language) and LINQ (Language
Integrated Query).

11. Programming Languages: Languages used for creating software applications and computer
programs. Examples include Java, Python, C++, and JavaScript.

12. Shell Scripting: A type of programming language used for automating tasks on a command-line
interface or shell. Shell scripts are used for tasks such as file management, system administration,
and software deployment. Examples include Bash, PowerShell, and Zsh.
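
The categories above are often combined in practice. The sketch below, which is only one possible arrangement, uses a programming language (Python) to run a query language (SQL) against an in-memory SQLite database. The `orders` table and its rows are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ada", 20.0), ("Ada", 5.0), ("Grace", 12.5)],
)

# SQL retrieves and aggregates the data; Python only submits the query
# and receives the result rows.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
print(totals)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.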

Metadata and Metadata Management

Metadata is data that describes other data. It provides information about the characteristics of data such as
its structure, format, location, and content. Examples of metadata include file names, file sizes, creation
dates, author names, and keywords.

For instance, consider a digital photograph. The metadata of the photo might include information about
the camera model, exposure settings, location, date and time taken, and file format. This information can
be used to identify the content of the photo, the camera used to capture it, and other relevant details.
Metadata is commonly used to organize, search, and retrieve data.
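
As a small sketch of the idea, the following Python snippet reads a few pieces of file metadata (name, size, modification time, extension) without opening the file's contents. The helper name `describe_file` and the sample file `notes.txt` are assumptions made for this example.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return basic metadata about a file without reading its contents."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        "extension": os.path.splitext(path)[1],
    }


# Create a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "notes.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    metadata = describe_file(sample_path)

print(metadata)
```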

Technical Metadata - Technical metadata defines the data structures in data repositories or
platforms, such as table names, column names, and data types, primarily from a technical perspective.
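
One way to see technical metadata directly is to ask a database to describe its own tables. The sketch below uses SQLite's `PRAGMA table_info` from Python. The `customers` table is hypothetical, and other database systems expose the same kind of information through different mechanisms.

```python
import sqlite3

# Hypothetical table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    {
        "name": row[1],
        "type": row[2],
        "not_null": bool(row[3]),
        "primary_key": bool(row[5]),
    }
    for row in conn.execute("PRAGMA table_info(customers)")
]
conn.close()

for column in columns:
    print(column)
```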

Process Metadata - Process metadata describes the processes that operate behind business systems
such as data warehouses, accounting systems, or customer relationship management tools.

Business Metadata - Users who want to explore and analyze data within and outside the enterprise
are typically interested in data discovery. They need to be able to find data that is meaningful and
valuable to them and to know where that data can be accessed. These business-minded users are thus
interested in business metadata, which is information about the data described in readily
interpretable ways, such as:

· how the data is acquired

· what the data is measuring or describing

· the connection between the data and other data sources
