Big Data Finance T7 2 CHOI NEOMA Ch6 Part Two 2024
Topic 7B
Hyung-Eun Choi
6.5 Big Data Curation State of the Art
Master Data Management (MDM):
MDM tools can be used to remove duplicates and standardize
data syntax, and to serve as an authoritative source of master data.
MDM focuses on ensuring that an organization does not use
multiple and inconsistent versions of the same master data in
different parts of its systems.
Processes in MDM include source identification, data
transformation, normalization, rule administration, error
detection and correction, data consolidation, data storage,
classification, taxonomy services, schema mapping, and
semantic enrichment.
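A minimal sketch of three of the processes listed above (normalization, duplicate detection, and data consolidation), assuming customer records held as plain dictionaries. The field names, the choice of email as the match key, and the "first non-empty value wins" survivorship rule are illustrative assumptions, not features of any particular MDM product.

```python
from collections import defaultdict


def standardize(record):
    """Normalize data syntax: tidy whitespace and case, keep only phone digits."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
    }


def consolidate(records):
    """Merge records that share a standardized email into one master record."""
    groups = defaultdict(list)
    for rec in records:
        clean = standardize(rec)
        # Assumption: the standardized email identifies the same customer.
        groups[clean["email"]].append(clean)

    master = []
    for email, duplicates in groups.items():
        merged = {"email": email}
        for field in ("name", "phone"):
            # Assumed survivorship rule: keep the first non-empty value.
            merged[field] = next((d[field] for d in duplicates if d[field]), "")
        master.append(merged)
    return master


crm = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com", "phone": "+1 (555) 010-0000"},
    {"name": "Ada Lovelace", "email": "ada@example.com ", "phone": ""},
    {"name": "alan turing", "email": "alan@example.com", "phone": "555 010 0001"},
]
master = consolidate(crm)
for row in master:
    print(row)
```

The two spellings of the first customer collapse into a single master record, so downstream systems no longer see inconsistent versions of the same entity. Real MDM tools typically use fuzzier matching and configurable rules in place of the exact-key match used here.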
The three main objectives of MDM:
Synchronizing master data across multiple instances of an
enterprise application
Coordinating master data management during an application
migration
Compliance and performance management reporting across
multiple analytic systems
6.5.1 Data Curation Platforms
An Example of Sheer Curation: Feedly
Leo is a news-feed AI research assistant by Feedly.
Feedly has been teaching Leo how to read and analyze
information from media sources and social networking (SNS) platforms.
Leo allows you to prioritize topics, trends, and keywords of your
choice; deduplicate repetitive news; mute irrelevant information;
and summarize articles, among other things.
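The prioritize, deduplicate, and mute steps described above can be sketched as a small filter over article titles. This is a toy illustration only: it does not use Feedly's API and does not reflect how Leo is implemented. The function names, the keyword sets, and the exact-match-on-normalized-title rule for duplicates are all assumptions.

```python
def normalize_title(title):
    """Lowercase the title and replace punctuation with spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def curate(articles, priority_keywords, muted_keywords):
    """Drop duplicate and muted articles, then rank by priority-keyword hits."""
    seen = set()
    scored = []
    for article in articles:
        key = normalize_title(article["title"])
        if key in seen:
            continue  # deduplicate: same normalized title already kept
        seen.add(key)
        words = set(key.split())
        if words & muted_keywords:
            continue  # mute: title contains an unwanted keyword
        scored.append((len(words & priority_keywords), article))
    # Stable sort: ties keep their original feed order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]


feed = [
    {"title": "Central bank raises rates"},
    {"title": "Central Bank Raises Rates!"},
    {"title": "Celebrity gossip roundup"},
    {"title": "Big data reshapes credit scoring"},
]
result = curate(feed, priority_keywords={"data", "credit"}, muted_keywords={"gossip"})
for article in result:
    print(article["title"])
```

In this sketch the repeated headline is dropped, the muted topic is filtered out, and the article matching the priority keywords moves to the top of the feed.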
Crowdsourcing
The notion of “wisdom of crowds” advocates that potentially
large groups of non-experts can solve complex problems usually
considered to be solvable only by experts.
Crowdsourcing has emerged as a powerful paradigm for
outsourcing work at scale with the help of people online.
The underlying assumption is that large-scale and cheap labour
can be acquired on the web.
The effectiveness of crowdsourcing has been demonstrated
through platforms such as Wikipedia, Amazon Mechanical Turk, and Kaggle.
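One common way to turn many non-expert answers into a single result is majority voting. The sketch below assumes each task has been labelled independently by several workers; the task identifiers and labels are made up for illustration, and none of the platforms named above is claimed to work exactly this way.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label and the share of workers who chose it."""
    counts = Counter(labels)
    # On a tie, the label that appeared first among the tied ones is returned.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def aggregate(answers):
    """Apply majority voting to every task in a {task_id: [labels]} mapping."""
    return {task: majority_vote(labels) for task, labels in answers.items()}


answers = {
    "img-1": ["cat", "cat", "dog", "cat", "cat"],
    "img-2": ["dog", "dog", "cat"],
}
results = aggregate(answers)
for task, (label, agreement) in results.items():
    print(task, label, round(agreement, 2))
```

The agreement share gives a rough confidence signal: tasks with low agreement can be sent to more workers or escalated to an expert.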
The high-level objectives of crowdsourcing:
Wikipedia follows a volunteer crowdsourcing approach where the
general public is asked to contribute to the encyclopaedia
creation project for the benefit of everyone.
Amazon Mechanical Turk provides a labour market where
crowdsourcing tasks are completed in exchange for payment.
Kaggle enables organizations to publish problems to be solved
through a competition between participants for a predefined
reward.
An Example of Crowdsourcing: Kaggle
Kaggle, a subsidiary of Google LLC, is an online community of
data scientists and machine learning practitioners.
Kaggle allows users to find and publish data sets, explore and
build models in a web-based data-science environment, work
with other data scientists and machine learning engineers, and
enter competitions to solve data science challenges.*
*source: Wikipedia
Discussion
Group Assignment Guidelines
General Guidelines: Submit “two” separate materials