
eBay

The key to most companies' digital transformation efforts is to harness insights from various types of
data at the right time. eBay is managing approximately 1 billion live listings and 164 million active buyers
daily. Of these, eBay receives 10 million new listings via mobile every week. "Data is eBay's most
important asset," says Seshu Adunuthula, Senior Director of Analytics Infrastructure at eBay.

A big technical challenge for eBay and every data-intensive business is to deploy a system that can
rapidly analyze and act on data as it arrives in the organization's systems (called streaming
data). There are many rapidly evolving methods to support streaming data analysis. eBay is currently
working with several tools including Apache Spark, Storm, Kafka, and Hortonworks HDF. The data
services layer of its strategy provides functions that enable a company to access and query data. It
allows the company's data analysts to search information tags that have been associated with the data
(called metadata) and makes it consumable to as many people as possible with the right level of security
and permissions (called data governance). eBay also uses Presto, an interactive query engine on
Hadoop. The company has been at the forefront of using big data solutions and actively contributes its
knowledge back to the open source community.
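
To make the idea of analyzing data "as it arrives" concrete, here is a minimal sketch of one common streaming pattern, a tumbling-window count. It is plain Python and does not use Spark, Storm, or Kafka; the function name, the event format, and the sample events are assumptions made for illustration, not a description of eBay's pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count event types per fixed-size time window.

    Assumes events are (timestamp_seconds, event_type) pairs that arrive
    in timestamp order. A window's counts are emitted as soon as the first
    event of a later window arrives, so results are available while the
    stream is still being consumed.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, event_type in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
            current_start = window_start
        counts[event_type] += 1
    if current_start is not None:
        # Flush the last, possibly partial, window when the stream ends.
        yield current_start, dict(counts)


# Hypothetical sample stream of (seconds, event type) pairs.
sample_events = [
    (0, "listing"), (3, "purchase"), (9, "listing"),
    (10, "listing"), (14, "purchase"), (21, "listing"),
]
windows = list(tumbling_window_counts(sample_events, window_seconds=10))
```

Production streaming engines add what this sketch leaves out, such as handling late or out-of-order events, distributing the work across machines, and recovering from failures.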

eBay's current big data approach is only one of many possible combinations and solutions accessible to
businesses looking to analyze enormous amounts of data. Of course, the choice of big data solutions is
influenced by the goals that you want to achieve as a company. Some kinds of data may need to be
evaluated in real time, while others can be saved for later study.

eBay is using big data and machine learning to address use cases such as merchandising and A/B testing
for new features. The company models personalization on five quarters of structured (e.g. one billion
listings, purchases, etc.) and unstructured (behavioral activity synopsis, word clouds, badges, etc.) data.
eBay is also creating predictive machine learning models for fraud detection, account takeover, and
buyer/seller risk prediction.

SQL-on-Hadoop Engine

eBay's new SQL-on-Hadoop solution offers high availability, security and reliability. A transparent data
cache layer with well-defined cache life cycle management was introduced, which enabled automatic
caching of the most-accessed datasets in the database. Most of eBay's data tables have a bucket layout
and are more suitable for "sort-merge joins". "MergeSort" or "Re-bucketing" allows data pruning on
columns not involved in buckets or partitions for faster scanning.
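
As an illustration of why a bucketed, sorted layout suits sort-merge joins, the sketch below joins two lists of rows that are already sorted by their join key. Because both sides are sorted, a single forward pass over each is enough, and no hash table of the whole input is needed. The function name and the sample rows are hypothetical; a real engine would apply the same idea to each pair of matching buckets.

```python
from typing import Any, List, Tuple

Row = Tuple[int, Any]  # (join key, payload)


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, Any, Any]]:
    """Inner-join two row lists that are ALREADY sorted by key.

    Advances one cursor per side; on equal keys, pairs every left row
    with every right row that shares that key.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair each left row with this key against the whole right run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_payload in right[j:j_end]:
                    result.append((left_key, left[i][1], right_payload))
                i += 1
            j = j_end
    return result


# Hypothetical data: listings and purchases, both sorted by seller id.
listings = [(1, "camera"), (2, "lens"), (2, "tripod"), (4, "bag")]
purchases = [(2, "buyer_a"), (3, "buyer_b"), (4, "buyer_c")]
joined = sort_merge_join(listings, purchases)
```

If the inputs were not already sorted, each side would have to be sorted first, which is the cost a pre-sorted bucket layout avoids.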

Bloom filter indexes are independent from the data files, so they can be applied and removed as needed.
Traditional commercial databases with ACID properties provide CRUD (Create, Read, Update, Delete)
operations. Had the new engine not provided Update and Delete operations, thousands of analysts and
engineers would have had to learn heavy Hadoop ETL technology.
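
The following sketch shows, under simplifying assumptions, how a Bloom filter index kept separately from the data files can let a query skip files that cannot contain a value. The class, the function, and the file names are hypothetical and are not eBay's implementation. A Bloom filter can report false positives but never false negatives, so skipping a file on a negative answer is always safe.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """A small Bloom filter backed by a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def files_to_scan(all_files: List[str], index: Dict[str, BloomFilter],
                  value: str) -> List[str]:
    """Return the files a query for `value` still has to read.

    A file without an index entry is always scanned, so dropping the
    index only costs speed and never changes query results.
    """
    return [name for name in all_files
            if name not in index or index[name].might_contain(value)]


# Hypothetical data files and the seller ids stored in each.
data_files = {"part-0": ["seller_1", "seller_42"],
              "part-1": ["seller_7", "seller_9"]}

# Build the index as a separate structure; the data files are untouched.
bloom_index: Dict[str, BloomFilter] = {}
for file_name, values in data_files.items():
    bloom = BloomFilter()
    for v in values:
        bloom.add(v)
    bloom_index[file_name] = bloom

candidates = files_to_scan(list(data_files), bloom_index, "seller_42")
```

Because the index lives outside the data files, removing it (here, emptying the dictionary) simply makes the query fall back to scanning every file.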

As part of the migration from a vendor to an on-premises data warehouse, eBay created a new platform
with built-in SQL authoring capability. The tool leverages Apache Livy for connectivity to the underlying
Hadoop data platform and two-way transfer of data. It also provides a centralized toolkit to support the
development life cycle for engineers. The platform has become a powerful solution providing the
following capabilities: data exploration, interactive query analytics, metadata management, advanced
analytics and more.
