Introduction - Big Data
According to Gartner, "By 2015, companies that have adopted big data and extreme
information management will begin to outperform unprepared competitors by 20% in
every available financial metric."
Analyzing Big Data is becoming important for identifying trends, and many companies
have started doing it to gain an edge over competitors. Scenarios that fit such
analysis include:
Political campaigns wanting real-time feedback on various actions, based on tweets and Facebook comments
Phone companies wanting to trend their billing data
Credit card companies trying to stop fraud before a sale happens
Web log analysis to identify trends
So what is Big Data?
Some people think of Big Data simply as data measured in terabytes or petabytes.
With so many views around, it makes sense to establish common ground and demystify
Big Data a bit.
The data that businesses deal with today no longer comes only from business
applications as structured data. Every activity in the ecosystem, from partners,
competitors, vendors, suppliers, investors, regulatory agencies, and customers,
generates data. In today's business environment, social channels such as Twitter,
Facebook, and LinkedIn have become key influencers.
The advent of these previously unimagined data sources is leading to equally
unimagined data volumes. This large, unanticipated data set is termed "Big Data".
To define it in simple terms, Big Data is typically a large volume of unstructured
(or semi-structured) and structured data created by various organized and
unorganized applications, activities, and channels such as email, Twitter, web logs,
Facebook, etc.
Also, this ultra-fast expansion, and the influence of unstructured data sources from
beyond the traditional lines of business inside the enterprise boundary, demands more
inclusive and rapid analysis (analysis and response in near real time).
Traditional data warehouse and BI approaches have been found inadequate to meet the
latency required for making business decisions within budgeted costs while dealing
with such data volumes.
As more and more companies devise unique strategies for dealing with Big Data, the
Big Data market is warming up. Open source solutions based on Apache Hadoop
MapReduce have been at the forefront of the Big Data space. Microsoft is also
positioning itself strongly as a platform of choice for Big Data solutions. See the
MS vs. Hadoop poll on this.
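To make the MapReduce idea a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps applied to the web log scenario mentioned earlier. It is plain Python that runs in a single process, not the Hadoop API, and the log layout ("ip method path status"), the sample lines, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample lines in a simplified "ip method path status" layout.
SAMPLE_LOG = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /products 200",
    "10.0.0.1 GET /home 200",
    "10.0.0.3 GET /checkout 500",
    "malformed line",
    "10.0.0.4 GET /home 200",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (path, 1) for each well-formed log line; skip anything else."""
    fields = line.split()
    if len(fields) == 4:
        yield fields[2], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def count_hits(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence and return hits per path."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    hits = count_hits(SAMPLE_LOG)
    # Print paths from most to least requested.
    for path, total in sorted(hits.items(), key=lambda kv: -kv[1]):
        print(path, total)
```

In a real cluster the map and reduce functions would run in parallel on many machines over partitions of the data, and the framework would handle the grouping step; the sketch only shows the shape of the computation.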
In the next post, we will look mainly at the technologies that Microsoft is
offering enterprises to solve the Big Data problem.
scenarios for data analytics, financial analysis, machine learning on Windows Azure.
It is currently in beta and expected to be available in early 2012.
Other existing products
SQL Server Parallel Data Warehouse (PDW) is an appliance meant for large-scale data warehousing and BI needs.
SQL Server 2008 or SQL Server 2012 (beta) provides a BI stack for standard BI and DW needs.
SQL Server Analysis Services (SSAS) is part of SQL Server as well as PDW, and supports cubes for Online Analytical Processing (OLAP).
SQL Server Integration Services (SSIS) is part of SQL Server as well as PDW, and provides data integration, cleansing, and similar capabilities.
SQL Server Reporting Services (SSRS) is part of SQL Server as well as PDW, and provides data reporting, slicing, dicing, etc.
SQL Server also provides data mining algorithms.
Excel DataScope is a reporting layer from Microsoft Research and is yet to be released.
These are some of the key technologies, already released or slated for release, in
the evolving MS Big Data stack.