Ingest Salesforce Data Into Amazon S3 Data Lake


07/06/2024, 22:49 Ingest Salesforce Data Into Amazon S3 Data Lake | by Dash Desai | Medium


Dash Desai
3 min read · Nov 5, 2020


In this post, you will learn how to ingest Salesforce data using the Bulk API (optimized
for processing large data sets) and store it in an Amazon Simple Storage Service (Amazon
S3) data lake using StreamSets Data Collector, a fast data ingestion engine. The
primary AWS service used in the data pipeline is Amazon S3, which provides
cost-effective storage and archival to underpin the data lake.

Consider the use case where a data engineer is tasked with archiving all Salesforce
contacts, along with some of their account information, in Amazon S3. To
demonstrate one approach to connecting Salesforce and AWS, I have created a data
pipeline designed to facilitate a seamless, secure, and real-time flow of data
between Salesforce and Amazon S3.

Pipeline Overview And Implementation


Let’s dive into the data pipeline implementation.

Salesforce origin


You can configure the Salesforce origin to read existing data using the Bulk or
SOAP API and provide the SOQL query, offset field, and optional initial offset to
use. When using the Bulk API, you can enable PK Chunking to efficiently
process very large volumes of data.

The Salesforce origin is also capable of performing a full or incremental read at
specified intervals.

The origin can also be configured to subscribe to notifications to process
PushTopic, platform, or change data capture events.

In our case, the origin is configured to ingest existing contact information
using Salesforce Object Query Language (SOQL) in Bulk API mode.

SOQL used to retrieve contacts:

SELECT Id, AccountId, FirstName, LastName, LeadSource, Email FROM Contact WHERE Id > '${OFFSET}' ORDER BY Id
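To make the role of ${OFFSET} in the query concrete, here is a minimal Python sketch of an offset-driven read loop. It is only an illustration of the idea, not how Data Collector is implemented: the in-memory CONTACTS list, the function names, the batch size, and the all-zeros initial offset are assumptions made for this example.

```python
from string import Template

# Hypothetical in-memory stand-in for the Salesforce Contact object.
# A real pipeline would send the rendered query to Salesforce instead.
CONTACTS = [
    {"Id": "003C", "LastName": "Okafor"},
    {"Id": "003A", "LastName": "Ng"},
    {"Id": "003B", "LastName": "Diaz"},
]

# Same query shape as in the post; ${OFFSET} is filled in before each read.
QUERY = Template(
    "SELECT Id, AccountId, FirstName, LastName, LeadSource, Email "
    "FROM Contact WHERE Id > '${OFFSET}' ORDER BY Id"
)


def render_query(offset):
    """Substitute the current offset into the query text."""
    return QUERY.substitute(OFFSET=offset)


def fetch_batch(offset, batch_size):
    """Simulate the query: rows with Id greater than the offset, ordered by Id."""
    rows = sorted(
        (c for c in CONTACTS if c["Id"] > offset), key=lambda c: c["Id"]
    )
    return rows[:batch_size]


def read_all(initial_offset="000000000000000", batch_size=2):
    """Yield every contact once, advancing the offset after each batch."""
    offset = initial_offset
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return
        yield from batch
        # The last Id read becomes the offset for the next query.
        offset = batch[-1]["Id"]


if __name__ == "__main__":
    print(render_query("003B"))
    print([c["Id"] for c in read_all()])
```

Because the query orders by Id and filters on Id greater than the offset, each record is read once, and a later run that starts from the stored offset only picks up records with higher Ids.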

For details on additional configuration, refer to the documentation.

Salesforce Lookup processor

This processor is configured to perform a lookup against Salesforce to retrieve
additional information and enrich data before storing it in Amazon S3.


In particular, based on the AccountId associated with each contact, it retrieves
AnnualRevenue, AccountSource, and Rating for that account.
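Conceptually, the lookup is a join on AccountId. The Python sketch below shows that enrichment step against a hypothetical in-memory table of accounts; the ACCOUNTS data and the enrich_contact helper are made up for illustration, and the real processor queries Salesforce rather than a dictionary.

```python
# Hypothetical account data keyed by account Id; the values are made up.
ACCOUNTS = {
    "001X": {"AnnualRevenue": 5000000.0, "AccountSource": "Web", "Rating": "Hot"},
}

# Account fields the post says are added to each contact record.
LOOKUP_FIELDS = ("AnnualRevenue", "AccountSource", "Rating")


def enrich_contact(contact, accounts):
    """Return a copy of the contact with the account fields added.

    Fields are set to None when the contact has no matching account.
    """
    account = accounts.get(contact.get("AccountId"), {})
    enriched = dict(contact)  # leave the input record untouched
    for field in LOOKUP_FIELDS:
        enriched[field] = account.get(field)
    return enriched


if __name__ == "__main__":
    contact = {"Id": "003A", "AccountId": "001X", "LastName": "Ng"}
    print(enrich_contact(contact, ACCOUNTS))
```

Setting missing values to None instead of dropping the record is a choice made for this sketch; how the actual processor treats missing lookups depends on its configuration.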

For details on additional configuration, refer to the documentation.

Field Masker processor

This processor is configured to mask PII (contact’s email address) before storing
the data in Amazon S3.
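As a rough illustration of what masking does to a record, the Python sketch below replaces every character of the Email field with a mask character. The post does not say which mask type the pipeline uses, so the variable-length style shown here, along with the helper names, is an assumption.

```python
def mask_variable_length(value, mask_char="x"):
    """Replace every character with the mask character, keeping the length."""
    return mask_char * len(value)


def mask_fields(record, fields):
    """Return a copy of the record with the named fields masked."""
    masked = dict(record)
    for name in fields:
        if masked.get(name) is not None:
            masked[name] = mask_variable_length(str(masked[name]))
    return masked


if __name__ == "__main__":
    contact = {"LastName": "Ng", "Email": "a@b.co"}
    print(mask_fields(contact, ["Email"]))
```

A mask that keeps the original length still reveals how long the value was. A fixed-length mask hides that as well, at the cost of losing a small hint that can help with debugging.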

For details on additional configuration, refer to the documentation.

Schema Generator processor

This processor is configured to automatically generate an Avro schema based on
the structure of the contact records.

This enables writing the data in compressed Avro format for cost-effective
storage in Amazon S3.
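To give a feel for what generating a schema from the structure of a record means, here is a small Python sketch that infers an Avro record schema from a single dictionary. The type mapping, the treatment of null values, and the record name "Contact" are assumptions made for this example; the schema that the Schema Generator processor actually produces may differ.

```python
import json

# Assumed mapping from Python types to Avro primitive types.
AVRO_TYPES = {str: "string", bool: "boolean", int: "long", float: "double"}


def infer_avro_schema(record, name="Contact"):
    """Build an Avro record schema (as a dict) from one sample record.

    Raises KeyError for value types that are not in AVRO_TYPES.
    """
    fields = []
    for field_name, value in record.items():
        if value is None:
            # Assumption: a null sample value is treated as a nullable string.
            avro_type = ["null", "string"]
        else:
            avro_type = AVRO_TYPES[type(value)]
        fields.append({"name": field_name, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    sample = {"Id": "003A", "AnnualRevenue": 5000000.0, "Email": None}
    print(json.dumps(infer_avro_schema(sample), indent=2))
```

Inferring from a single record is the simplest possible approach. A field that is null in the sample but numeric elsewhere would get the wrong type, which is one reason to review a generated schema before relying on it.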

For details on additional configuration, refer to the documentation.


Amazon S3 destination

The Amazon S3 destination is configured to store the contacts data in compressed
Avro format.

It is also configured to use AWS server-side encryption (SSE) to protect and
secure contacts data written to Amazon S3.
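For readers who want to see what server-side encryption looks like at the S3 API level, the Python sketch below builds the arguments for an S3 PutObject request with SSE enabled. This is not how the destination is implemented; the helper names, bucket, and key are hypothetical, and because the post does not say which SSE option the pipeline uses, the sketch defaults to S3-managed keys ("AES256") and switches to "aws:kms" only when a KMS key ID is supplied.

```python
def build_put_object_args(bucket, key, body, kms_key_id=None):
    """Build keyword arguments for an S3 PutObject call with SSE enabled."""
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        # SSE-KMS: encrypt with a key managed in AWS KMS.
        args["ServerSideEncryption"] = "aws:kms"
        args["SSEKMSKeyId"] = kms_key_id
    else:
        # SSE-S3: encrypt with keys managed by Amazon S3.
        args["ServerSideEncryption"] = "AES256"
    return args


def upload(args):
    """Send the request with boto3 (third-party; needs AWS credentials)."""
    import boto3  # imported here so the helper above runs without boto3

    return boto3.client("s3").put_object(**args)


if __name__ == "__main__":
    # Hypothetical bucket and key names, for illustration only.
    print(build_put_object_args("my-data-lake", "contacts/part-0.avro", b"..."))
```

Keeping the argument builder separate from the upload call makes the encryption settings easy to inspect without touching AWS.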

For details on additional configuration, refer to the documentation.

Pipeline Run


After the pipeline runs successfully, you should see output similar to the example
shown below. Notice the highlighted AWS encryption and data format of the object
stored in Amazon S3.

And the contents of the S3 object stored in Avro format should look something like
this.

In this post, you learned the value companies can realize by leveraging and
integrating data between AWS and Salesforce using StreamSets Data Collector.
Closer integration between AWS and Salesforce opens up plenty of opportunities for
enterprises to develop new and unique ways of accessing, analyzing, and storing
their data.

Originally published at https://streamsets.com on November 5, 2020.


