What is Data Ingestion?

Big Data Architecture – where does data ingestion fit?


Data Ingestion:
https://www.xenonstack.com/blog/ingestion-processing-big-data-iot-stream/

https://www.researchgate.net/figure/Steps-of-Data-Ingestion_fig3_325885888

Data ingestion challenges


Slow. Back when ETL tools were created, it was easy to write scripts or manually create
mappings to cleanse, extract, and load data. But data has grown much larger, more
complex, and more diverse, and the old methods of data ingestion simply are not fast
enough to keep up with the volume and variety of modern data sources.

Complex. With the explosion of new, rich data sources such as smartphones, smart
meters, sensors, and other connected devices, companies often find it difficult to
extract value from that data. This is largely due to the complexity of cleansing data,
such as detecting and removing errors and schema mismatches in incoming records.

Expensive. A number of different factors combine to make data ingestion expensive.
The infrastructure needed to support the different data sources and proprietary tools
can be very expensive to maintain over time, and maintaining a staff of experts to
support the ingestion pipeline is not cheap. Not only that, but real money is lost when
business decisions can't be made quickly.

Insecure. Security is always an issue when moving data. Data is often staged at various
steps during ingestion, which makes it difficult to meet compliance standards
throughout the process.
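The cleansing problem described under "Complex" above can be made concrete with a small sketch of schema-mismatch detection at ingestion time. The schema, field names, and sample values below are illustrative assumptions and do not come from the linked sources:

```python
# Sketch of schema-mismatch detection during ingestion.
# EXPECTED_SCHEMA and all field names are illustrative assumptions.

EXPECTED_SCHEMA = {"device_id": str, "timestamp": str, "reading": float}

def find_mismatches(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems found in one incoming record."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"type mismatch on {field}: expected "
                f"{expected_type.__name__}, got {type(record[field]).__name__}"
            )
    return problems

def ingest(records):
    """Route clean records onward; quarantine records with schema issues."""
    clean, rejected = [], []
    for rec in records:
        issues = find_mismatches(rec)
        if issues:
            rejected.append({"record": rec, "issues": issues})
        else:
            clean.append(rec)
    return clean, rejected
```

A real pipeline would typically pull the expected schema from a schema registry and write rejected records to a dead-letter store for later repair rather than dropping them.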
What are the tools available for Data Ingestion and how to choose?

Tools:
https://www.predictiveanalyticstoday.com/data-ingestion-tools/

How to choose:
https://www.intersysconsulting.com/blog/selecting-open-source-big-data-lake-tool/

Ask the Right Questions…

Before making the move to a Hadoop data lake, it's important to know about the tools that are available
to help with the process. But to select the best tool for the data ingestion process, it's also important
to first answer a few key questions about your environment and your needs:

- What kind of data will you be dealing with (internal/external, structured/unstructured, operational, etc.)?
- Who is going to be the key stakeholder of the data?
- What is your existing data management architecture?
- Who is going to be the steward of the data?

All of the above are questions that should be answered before beginning the data ingestion process.
Sqoop
https://medium.freecodecamp.org/an-in-depth-introduction-to-sqoop-architecture-ad4ae0532583
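The article above covers Sqoop's architecture in depth; as a quick illustration, a typical Sqoop import copies a relational table into HDFS using parallel map tasks. The JDBC URL, credentials, table name, and target directory below are placeholders, not values from the article:

```shell
# Illustrative Sqoop import: copy the "orders" table from MySQL into HDFS,
# splitting the work across 4 parallel map tasks. All values are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/shop \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```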

Flume
https://data-flair.training/blogs/flume-architecture/

https://www.simplilearn.com/apache-flume-and-hbase-tutorial

http://knowdimension.com/en/data/flume-introduction-how-it-works-sources-channels-and-sinks/
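The Flume links above describe agents built from sources, channels, and sinks. A minimal agent wiring those three together might look like the following properties file; the agent, source, channel, and sink names are placeholders, not taken from the tutorials:

```properties
# Illustrative single-agent Flume configuration:
# one netcat source -> one in-memory channel -> one logger sink.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```

Such an agent would be started with `flume-ng agent --conf-file agent1.conf --name agent1`.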
