Integrating Data Lakes With Salesforce - Lake Hydration and Visualization With Tableau


07/06/2024, 22:49 Integrating Data Lakes With Salesforce: Lake Hydration and Visualization with Tableau | by Salesforce Architects


Integrating Data Lakes With Salesforce: Lake Hydration and Visualization with Tableau
Salesforce Architects
Published in Salesforce Architects
12 min read · May 27, 2021


Introduction
Data is among the most valuable assets businesses have, but gaining insights from
that data can be difficult. Part one of this two-part series covered setting up an AWS
account and establishing an Amazon Simple Storage Service (S3) bucket to store
data from Nonprofit Cloud in a sample data lake integration use case.

https://medium.com/salesforce-architects/integrating-data-lakes-with-salesforce-lake-hydration-and-visualization-with-tableau-c359842fd27a 1/31

This post covers how to sync data from Salesforce to S3 and visualize it using
Tableau.

Step 3: Create a Salesforce Connection in Amazon AppFlow and hydrate the data lake in S3 with Salesforce data
You can use Amazon AppFlow to pull data from Salesforce into S3. AppFlow is a fully
managed, declarative service (billed per flow run and per volume of data processed)
that supports bidirectional secure data flows, asynchronous and near real-time
event-based patterns, and filtering and transformation of source data. This example
focuses on a single-direction, outbound sync (Salesforce to S3) based on a manual
trigger.

3.1. Set up Amazon AppFlow


From the AWS Management Console, search for and select the AppFlow Service.

On the AppFlow dashboard, click Create Flow and then give your flow a name.
Consider choosing the name of the Salesforce object you are syncing (for example,
Contacts or Accounts). Each Salesforce object you sync will have a separate flow. The
flow name will also be the name of the folder AppFlow creates in your S3 bucket.
Optionally, give the flow a description so it is easy to recognize in a long list of
flows. Leave all other settings at their defaults. When finished, click “Next”.


As you configure the flow, select Salesforce as the data source, select Amazon S3 as
the destination, enter the bucket you created in the previous step, and set the data
format preference as Parquet (under additional settings). Leave all other settings at
their defaults.
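The same source and destination choices can also be expressed as arguments to AppFlow's CreateFlow API. The following is a minimal sketch, not the method used in this walkthrough: it only builds the dictionaries, and the connection name `salesforce-sandbox` and bucket name `my-data-lake-bucket` are hypothetical placeholders.

```python
def salesforce_source(connection_name: str, sf_object: str) -> dict:
    """Source side of the flow: one Salesforce object read through a saved connection."""
    return {
        "connectorType": "Salesforce",
        "connectorProfileName": connection_name,
        "sourceConnectorProperties": {"Salesforce": {"object": sf_object}},
    }


def s3_parquet_destination(bucket: str, prefix: str = "") -> list:
    """Destination side of the flow: write the records to S3 as Parquet files."""
    s3_properties = {
        "bucketName": bucket,
        "s3OutputFormatConfig": {"fileType": "PARQUET"},
    }
    if prefix:
        s3_properties["bucketPrefix"] = prefix
    return [
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {"S3": s3_properties},
        }
    ]


# Hypothetical names, used only for illustration.
source = salesforce_source("salesforce-sandbox", "Contact")
destination = s3_parquet_destination("my-data-lake-bucket")
print(destination[0]["destinationConnectorProperties"]["S3"]["s3OutputFormatConfig"])
```

With boto3 installed and AWS credentials configured, these values would be passed as `sourceFlowConfig` and `destinationFlowConfigList` to `boto3.client("appflow").create_flow(...)`, alongside `flowName`, `triggerConfig`, and `tasks`.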

Click Choose Salesforce Connection to create a new connection and then select the
Salesforce environment type. It is best practice to test all integrations in a Sandbox
environment before connecting to Production. Give the connection an easily
identifiable name as you will be able to reuse this connection for additional flows on
other Salesforce objects. For the Flow Trigger, you can select run on demand, or on a
schedule. When finished, click Continue.


Use the credentials for the dedicated API user you created in part one (Step 1.2,
Create a Salesforce Dedicated API User). If you did not create an API user, log in as
a user with the necessary permissions as described in part one.

Click Allow to give AppFlow access to Salesforce data. This will create a Connected
App and policies in Salesforce to enable data to flow between the AWS and
Salesforce clouds. You can review and adjust the Connected App settings within
Salesforce.


Now that you have successfully connected, you can finish configuring your flow.
Remember that you will create a flow for each Salesforce object you want to sync to
S3. This example creates a flow for the Contact object.

Scroll down the configuration page and select a Flow Trigger. When starting out it is
best to run your flows on demand. Once you have tested the flow and are
comfortable with the results, you can consider setting up a schedule. When finished,
click “Next”.
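For readers scripting the flow instead of using the console, the trigger choice corresponds to the `triggerConfig` argument of AppFlow's CreateFlow API. This is a small sketch under that assumption; the schedule string `rate(5minutes)` follows the example format given in the AppFlow API reference and is not a recommendation for how often to sync.

```python
from typing import Optional


def trigger_config(schedule_expression: Optional[str] = None) -> dict:
    """Return an on-demand trigger, or a scheduled one when an expression is given."""
    if schedule_expression is None:
        # Run only when someone clicks Run Flow (or calls StartFlow).
        return {"triggerType": "OnDemand"}
    return {
        "triggerType": "Scheduled",
        "triggerProperties": {
            "Scheduled": {"scheduleExpression": schedule_expression}
        },
    }


on_demand = trigger_config()
scheduled = trigger_config("rate(5minutes)")
print(on_demand)
print(scheduled)
```

Starting with the on-demand form and switching to the scheduled form later mirrors the advice above: test first, then automate.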


Now you can use the drop down to search for and select fields from the Salesforce
object. You can also select all the fields from the Salesforce object.

Click the “Map Fields Directly” button to copy over the field labels and API names
automatically. Optionally, you can transform the source data before writing to S3 with
formulas and other data modification operations. You can also choose to use
validations to specify an action when unexpected data and formats are found. When
finished, click “Next”.
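In API terms, mapping fields directly roughly corresponds to a list of AppFlow tasks: one projection task that names the fields to read, followed by one map task per field that writes it to a destination field of the same name. The sketch below assumes the task shapes used in AWS's published AppFlow examples; the field list is an illustrative subset of standard Contact fields, and a real flow may also need `taskProperties` such as data types.

```python
def direct_mapping_tasks(fields: list) -> list:
    """Project the chosen source fields, then map each to a same-named destination field."""
    projection = {
        "taskType": "Filter",
        "connectorOperator": {"Salesforce": "PROJECTION"},
        "sourceFields": list(fields),
    }
    mappings = [
        {
            "taskType": "Map",
            "connectorOperator": {"Salesforce": "NO_OP"},
            "sourceFields": [name],
            "destinationField": name,
        }
        for name in fields
    ]
    return [projection] + mappings


# Illustrative subset of Contact fields; choose the ones your reports need.
tasks = direct_mapping_tasks(["Id", "FirstName", "LastName", "Email"])
for task in tasks:
    print(task["taskType"], task["sourceFields"])
```

The resulting list would be passed as the `tasks` argument of `create_flow`.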


You can choose to filter your data. For example, you might create a flow in which only
contacts with a mailing address in California are selected. When finished, click “Next”.

On the final screen, review your selections and go back to make any final updates.
When finished, click Create Flow. After the flow is created, you will be redirected to
the flow’s landing page and a success banner will appear at the top.


To run this flow on demand, click Run Flow. A status bar will appear at the top alerting
you to progress. Successful results will appear at the top of the page in green.
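Clicking Run Flow has a programmatic counterpart in AppFlow's StartFlow API. The sketch below assumes a boto3-style client is passed in; `FakeAppFlow` is a hypothetical stand-in so the example runs without AWS credentials, and the flow name `Contacts` is only an example.

```python
def run_flow(appflow_client, flow_name: str) -> str:
    """Start an on-demand run and return the execution ID reported by AppFlow."""
    response = appflow_client.start_flow(flowName=flow_name)
    return response["executionId"]


class FakeAppFlow:
    """Stand-in for boto3.client("appflow"), used here so the sketch runs offline."""

    def start_flow(self, flowName):
        return {"flowStatus": "Active", "executionId": "example-execution-id"}


execution_id = run_flow(FakeAppFlow(), "Contacts")
print(execution_id)
```

In a real account, replace `FakeAppFlow()` with `boto3.client("appflow")`. The returned execution ID can then be looked up in the flow's run history.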

Congratulations, you have now set up an Amazon S3 Data Lake, connected it to your
Salesforce Nonprofit Cloud, and hydrated the data lake with data from the Salesforce
Contact object. You can repeat the instructions in Step 3 to create flows and
schedules for additional Salesforce objects. You can also create additional buckets
and connect your lake and other sources using AppFlow (for supported applications)
or another extract-transform-load (ETL) tool like MuleSoft or Talend.

Step 4: Visualize the Salesforce data in S3 using Tableau


One of the benefits of maintaining a data lake is the ability to eliminate data silos and
quickly produce data visualizations and dashboards by combining data sources from
across your enterprise. Data in your S3 data lake can be consumed by Tableau using
Amazon Athena as the connector. The first time you move data from S3 into Tableau,
expect about an hour of setup and configuration. Future data transfers will not
require these setup steps.

4.1. Create a database in AWS Glue


From the AWS Management Console, search for and select the Glue service.


On the AWS Glue Console, on the left side menu under Data catalog > Databases,
click Add Database. Type the database name and click Create to create a database in
Glue.

4.2. Set up the AWS Glue Data Crawler


On the AWS Glue Console, click Crawlers on the left and then click Add Crawler. The
Glue Data Crawler will scan through the Salesforce Contact object data in S3 and
automatically create a database schema in the Glue data catalog.


Enter a meaningful name for the data crawler and click Next.

For the Crawler Source Type select Data Stores, and for the Repeat option select
Crawl All Folders.


Click the folder icon to the right of Include Path and select the S3 bucket path that
contains the Parquet file with the Salesforce Contact object’s data.

When asked if you want to add another data store, select No and click Next.


On the Choose An IAM Role screen, select Create An IAM Role and enter a name for
the role. AWS Glue will automatically create an IAM Role that will have the requisite
permissions to access Salesforce data in the S3 bucket you specified in the previous
step.

Set the schedule to run the crawler on demand. If you intend to make frequent
updates to your Salesforce Schema that will impact the data being synced to S3, then
you may consider running the crawler on a more regular schedule.


Choose the database created in the earlier steps to configure the crawler’s output.
This step will ensure the crawler creates a data catalog table in the database you
created earlier.

On the final screen, validate all the information entered and click Finish to create the
crawler.
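The crawler settings chosen in the wizard correspond closely to the arguments of AWS Glue's CreateCrawler API. The following is a minimal sketch that only assembles the request; the crawler, role, database, and bucket names are hypothetical placeholders, and leaving out the schedule is assumed to give an on-demand crawler as in the steps above.

```python
from typing import Optional


def crawler_request(name: str, role: str, database: str, s3_path: str,
                    schedule: Optional[str] = None) -> dict:
    """Assemble keyword arguments for Glue's create_crawler call."""
    request = {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        # One S3 data store, matching the Include Path chosen in the wizard.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Equivalent of the Crawl All Folders option.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
    }
    if schedule:
        # Glue cron syntax, for example "cron(0 2 * * ? *)" for 02:00 UTC daily.
        request["Schedule"] = schedule
    return request


# Hypothetical names, used only for illustration.
request = crawler_request(
    name="salesforce-contact-crawler",
    role="AWSGlueServiceRole-salesforce",
    database="salesforce-schema",
    s3_path="s3://my-data-lake-bucket/Contact/",
)
print(request)
```

With boto3, the request would be sent as `boto3.client("glue").create_crawler(**request)`.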


After the crawler is created you should see it on the screen. Select your new crawler
and click Run Crawler.

Upon a successful run, a status message will appear at the top of the page in a green
box.
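Running the crawler and waiting for the green status message can be sketched in code as a start call followed by polling. This example assumes a boto3-style Glue client; `FakeGlue` is a hypothetical stand-in that reports RUNNING twice and then READY, so the sketch runs without AWS access.

```python
import time


def run_crawler_and_wait(glue_client, name: str,
                         poll_seconds: float = 15.0, max_polls: int = 40) -> str:
    """Start a crawler, poll until it is idle again, and return the last crawl status."""
    glue_client.start_crawler(Name=name)
    for _ in range(max_polls):
        crawler = glue_client.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":
            # READY means the run has finished; LastCrawl records how it ended.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {name} did not finish after {max_polls} polls")


class FakeGlue:
    """Stand-in for boto3.client("glue"), used here so the sketch runs offline."""

    def __init__(self):
        self._states = ["RUNNING", "RUNNING", "READY"]
        self.started = None

    def start_crawler(self, Name):
        self.started = Name

    def get_crawler(self, Name):
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return {"Crawler": {"Name": Name, "State": state,
                            "LastCrawl": {"Status": "SUCCEEDED"}}}


fake_glue = FakeGlue()
result = run_crawler_and_wait(fake_glue, "salesforce-contact-crawler", poll_seconds=0)
print(result)
```

A status of `SUCCEEDED` corresponds to the green message in the console; the crawler name here is a placeholder.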

Now navigate to the Athena service from the AWS Management Console. Select
AWSDataCatalog as the Data Source. Select Salesforce-Schema (the name you used
in the Glue step) as the database. Click the three blue vertical dots (located in the
top-right of the Tables section), and pick Preview table. You can now confirm that
your S3 data source is connected to Athena by reviewing the tables and fields in the
left column.
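The same check can be made outside the console by submitting a small preview query through Athena's StartQueryExecution API. The sketch below only builds the request; the table name `contact` and the results location are hypothetical, and the database name follows the Salesforce-Schema example in lowercase.

```python
def preview_query_request(database: str, table: str,
                          output_location: str, limit: int = 10) -> dict:
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Catalog": "AwsDataCatalog", "Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, used only for illustration.
preview = preview_query_request(
    database="salesforce-schema",
    table="contact",
    output_location="s3://my-data-lake-bucket/athena-results/",
)
print(preview["QueryString"])
```

With boto3, the request would be submitted as `boto3.client("athena").start_query_execution(**preview)`, and the rows fetched afterwards with `get_query_results`.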


4.3. Create an IAM User to set up the connection between Athena and Tableau
From the AWS Console, search for the IAM service and click IAM.

From the IAM dashboard, click Users on the left window and then click Add User.


Enter a user name and select Programmatic Access. Click on Next: Permissions.


Click Attach Existing Policies Directly. Then use the Filter Policies search bar to find
and attach the following policies:

AmazonAthenaFullAccess

AmazonS3FullAccess

Click Next: Tags and then Create User.

After creating the user, you should see a Success message. Download the CSV file that
contains the access key and secret key information and store it in a safe location.
You will need this information to set up the connection in Tableau.
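For teams that script their AWS setup, the console steps above have direct IAM API equivalents. This is a sketch under the assumption of a boto3-style IAM client; `FakeIam` is a hypothetical stand-in that returns AWS's documentation example keys, and the user name `tableau-athena` is a placeholder.

```python
MANAGED_POLICIES = ("AmazonAthenaFullAccess", "AmazonS3FullAccess")


def managed_policy_arn(policy_name: str) -> str:
    """ARN of an AWS managed policy."""
    return f"arn:aws:iam::aws:policy/{policy_name}"


def create_tableau_user(iam_client, user_name: str) -> dict:
    """Create the user, attach both managed policies, and return a new access key pair."""
    iam_client.create_user(UserName=user_name)
    for policy in MANAGED_POLICIES:
        iam_client.attach_user_policy(UserName=user_name,
                                      PolicyArn=managed_policy_arn(policy))
    key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    return {"access_key_id": key["AccessKeyId"],
            "secret_access_key": key["SecretAccessKey"]}


class FakeIam:
    """Stand-in for boto3.client("iam"), used here so the sketch runs offline."""

    def __init__(self):
        self.attached = []

    def create_user(self, UserName):
        return {"User": {"UserName": UserName}}

    def attach_user_policy(self, UserName, PolicyArn):
        self.attached.append(PolicyArn)

    def create_access_key(self, UserName):
        return {"AccessKey": {"UserName": UserName,
                              "AccessKeyId": "AKIAIOSFODNN7EXAMPLE",
                              "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"}}


fake_iam = FakeIam()
credentials = create_tableau_user(fake_iam, "tableau-athena")
print(credentials["access_key_id"])
```

As with the CSV download, the secret key is only available at creation time, so store the returned values somewhere safe.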


4.4. Install a JDBC Driver to connect Amazon Athena to Tableau


Because Amazon Athena connects to Tableau via a JDBC driver, you will need to:

1. Install the latest version of 64-bit Java

2. Get the driver (download here)

3. Place the driver:

   Windows: Save the Amazon Athena JDBC jar in the C:\Program Files\Tableau\Drivers folder.

   macOS: Save the Amazon Athena JDBC jar in the ~/Library/Tableau/Drivers folder.

4. Restart Tableau

Once these steps are completed, you can add a new Amazon Athena connection and
begin configuring it.
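The driver placement step can be sketched as a small helper that picks the Tableau drivers folder for the current operating system and copies the jar there. The folder paths come from the steps above; the function names are hypothetical, and copying into C:\Program Files typically requires administrator rights.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional


def tableau_driver_dir(system: Optional[str] = None) -> Path:
    """Return the Tableau drivers folder for Windows or macOS."""
    system = system or platform.system()
    if system == "Windows":
        return Path(r"C:\Program Files\Tableau\Drivers")
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return Path.home() / "Library" / "Tableau" / "Drivers"
    raise ValueError(f"No Tableau driver folder known for {system!r}")


def install_driver(jar_path: str, system: Optional[str] = None) -> Path:
    """Copy the downloaded Athena JDBC jar into the Tableau drivers folder."""
    target_dir = tableau_driver_dir(system)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(jar_path, target_dir / Path(jar_path).name))


print(tableau_driver_dir("Darwin"))
```

Calling `install_driver` with the path of the downloaded jar performs the copy; restart Tableau afterwards so it picks up the driver.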

4.5. Visualize data with Tableau


To complete the final set of steps you will gather some information from AWS. As you
do, collect the information in a scratch document as you will need to reference it
shortly.

In Step 4.3 (Create an IAM User to set up the connection between Athena and
Tableau), you downloaded the IAM user’s security credentials in a CSV file. The file
contains the access key and secret key needed to set up the connection between
Athena and Tableau. Open the CSV file in a text editor and keep the keys handy.


Next, identify the AWS region in which you are running Athena. You can do this by
clicking the region dropdown menu in the top right. Use the following syntax,
replacing “us-east-1” with the region your organization is running in:
athena.us-east-1.amazonaws.com. Copy this text string to your scratch document for
use later on in the setup process.
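The server string follows a fixed pattern, so it can be derived from the region name. A minimal sketch, with a hypothetical helper name:

```python
def athena_server(region: str) -> str:
    """Build the Athena server name Tableau expects for a given AWS region."""
    return f"athena.{region.strip().lower()}.amazonaws.com"


print(athena_server("us-east-1"))
```

For example, an organization running Athena in us-west-2 would use `athena_server("us-west-2")` for the Server field.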

Lastly, find the Athena Query Results location via Settings in the Athena console. If
this value is blank, you will need to select a bucket path using the folder icon. You can
choose the same bucket you created in Step 2.1. Setup Amazon S3.

With these values ready you are now prepared to open Tableau Desktop and make
the connection to Athena.


Use the values you just collected to fill in the login screen. Note that you will use the
Athena Query Results path for the S3 Staging Directory.

Your connection is now complete and you can see available tables for use in the S3
bucket(s) defined.


Drag a table to the right to begin visualizing it.

Conclusion
This two-part series walked through setting up a data lake in S3, hydrating the data
lake with Salesforce data using AppFlow, and then using Glue and Athena to connect
that data to Tableau so you can create visualizations and dashboards.

The flow showcased in this series is the foundation for a data lake. From this starting
point, it’s possible to read Salesforce data exactly as it was on any given day, based
on a schedule aligned with your organization’s needs. Over time your organization can
use the concepts covered in this tutorial to ingest data from additional sources
including spreadsheets, on-premises servers, and SaaS applications to build out a
comprehensive data management strategy. Before getting started, seek out the help
of your database administrator as many of these steps will require an administrator
profile and access.

With the power of Salesforce and AWS, any nonprofit organization, small business, or
enterprise can start their data lake journey and harness the power of data across
their organization with minimal technical resources.

Additional Resources
Trailhead: Amazon AppFlow

Exposing Tableau Dashboards in Salesforce

Building Secure and Private Data Flows Between AWS and Salesforce Using
Amazon AppFlow

Building AWS Data Lake visualizations with Amazon Athena and Tableau

About the Authors

Tim Weeks is an Architect on the Nonprofit Cloud Industry Solutions team at
Salesforce.org. As a certified Salesforce Architect, he leverages his deep nonprofit
industry experience to advise on application architecture, data management
strategies, analytics, and more as they relate to best practice adoption of Salesforce
technology in the nonprofit sector. Tim is an active member in his local community
with a passion for animal welfare, the arts, and workforce development. You can
connect with Tim on LinkedIn.

Akshay Saxena is a Senior Solutions Architect with Amazon Web Services (AWS)
supporting nonprofit organizations. He enjoys helping customers solve their
technology problems by leveraging the power of AWS. His areas of interest are data
lakes, media & entertainment, and cloud-based contact center solutions.

Alex Dinnouti is a Technical Program Manager on the Amazon Web Services (AWS)
for nonprofits team. Before joining AWS, he was the head of information technology
solutions at Conservation International. He has a master’s degree in software
engineering and his retirement dream is working with nonprofit mission impact
open-source projects.
