
Research Paper

Data Science

- Sourabh Modi

A new science does not emerge out of nothing. Every science starts from interest and discussion and looks for a basic foundation; in general, the main foundation of science is mathematics. Data science comprises structured and systematic knowledge about data. However, many other sciences, ranging from statistics to computer science, are related to the data in question. This paper aims to reveal the obstacles and limitations that prevent other sciences from fully becoming data science. On that basis, the definition of data science needs to be elaborated, and data science is then confirmed as a new science that does not depend directly on several other sciences.

1. Introduction
Data science consists of two words that form a term referring to scientific activities around, or relating to, what is recognized as data [1], that is, from collection and processing to presenting the data as information that is useful for decision making or beneficial to stakeholders concerned with the data [2, 3]. As a science, its boundaries need to be stated, but the term alone is not enough to express the purpose and objective of its existence as a science, which is why various definitions of it have appeared [4, 5]. A science has an ontological basis and spreads taxonomically in various directions of development, while remaining within an interrelated scope [6, …]
Data science involves methods in all of its activities and scholarship [8], and it is logically mapped into a scientific integration [9, 10]. On the other hand, data in particular is an object of study that has long been part of other disciplines in different ways [11], whereas a science is theoretically born to deliver technology and other applications that improve the quality of human life [12]. Therefore, data science becomes a paradigm system involving the empirical, theoretical, computational, and big data paradigms.

2. Reviews on the track

Data science [13], as a scientific system, is an open system [14] that consists of interacting units. Thus, internal units interact with external units. Internal units focus on data, while external units complement them from the outside [15]. If data science is a science about data (or knowledge related to data as a whole), then data science as a system requires that all of its units be organized in a scientific structure and systematics [16], see Figure 1.

Figure 1. Data and anything for data science.

Data is a phenomenon of the moment, and it requires a dedicated field of study. Although data has long been a major part of statistics, which is based on mathematics, statistics has no further capabilities without involving its core concept, namely probability [17, 18], even though it is already well established in data analysis [19, 20]. Statistics, in both theory and application, consists of a combination of operations ranging from a sum (Σ) to an average (μ), that is, the centrality of the data, and some of their extensions [21]. Statistics cannot predict, test, or assess anything before its formula has been proven valid with respect to the contour of probability [22, 23]. Many scientists claim that data science, in the original sense of the term, is none other than statistics itself [24]. However, data as a phenomenon cannot be completely revealed by statistics. Statistics generally targets quantity or quality through parametric and non-parametric approaches, whereas data has its own systematic structure beyond distribution concepts [25, 26].
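The last point can be made concrete with a minimal Python sketch, assuming two hypothetical series that hold exactly the same values in opposite order. Their sum (Σ), mean (μ), and entire empirical distribution coincide, so any distribution-based summary treats them as identical, yet one rises and the other falls. The series and the helper names `summary` and `trend` are illustrative assumptions, not part of the paper.

```python
from statistics import mean

# Two hypothetical series holding exactly the same five values in opposite order.
rising = [1, 2, 3, 4, 5]
falling = [5, 4, 3, 2, 1]


def summary(xs):
    """Order-insensitive view of the data: sum (Σ), mean (μ), and the sorted values."""
    return sum(xs), mean(xs), sorted(xs)


def trend(xs):
    """Sign of the average step between consecutive values: +1 rising, -1 falling, 0 flat."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    avg_step = sum(steps) / len(steps)
    return (avg_step > 0) - (avg_step < 0)


print(summary(rising) == summary(falling))  # True: Σ, μ and the value distribution coincide
print(trend(rising), trend(falling))        # 1 -1: the ordering tells a different story
```

The ordering is one simple example of structure that lives in the data itself rather than in its distribution.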
Information, which is a legacy of or a derivation from data, sometimes a twin of data, and at times also a source of data, is the most important part of all human activities today [27, 28]. Here, methods have a role in solving problems that are linked with, or flow from and to, the data. Computationally, statistics grows in theory to present methods. However, the practical effect of these methods is only the implementation of statistical formulas in computation, on computers and other tools [29, 30]. Statistics in theory has no way to use these facilities effectively and efficiently [31]. In fact, computing resources such as memory and processor speed require a balance of performance between them.
The main requirement of the statistical contour, namely probability, is randomness [32]. To bypass testing, a minimal amount of data taken as a sample is, for statistical reasons, the initial assumption for data processing and benchmarking [33], as it is for datasets [34]. Instead of looking for ways to test the validity of samples drawn from generally accepted populations, some put forward the concept that statistics = data science [35]. This amounts to assuming data ↔ formula ↔ computation [36, 37], resting on mathematical abstractions and conclusions that have not been validly proven. It places data mining, as a shift in the understanding of data, in the position of unconsciously coupling the data-processing rules, so it is not uncommon for the same data to yield different conclusions [38].
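How conclusions depend on the sample can be illustrated with a minimal Python sketch, assuming a hypothetical population of the integers 1 to 1000 and small simple random samples. The helper `sample_mean` and the sample size of 10 are illustrative choices, not a procedure from the paper; each seed stands in for a different analyst drawing at random from the same population.

```python
import random
from statistics import mean

# Hypothetical population: the integers 1..1000, whose true mean is 500.5.
population = list(range(1, 1001))


def sample_mean(pop, size, seed):
    """Mean of a simple random sample drawn without replacement; the seed makes it reproducible."""
    rng = random.Random(seed)
    return mean(rng.sample(pop, size))


# Five independent draws of 10 values each from the same population.
estimates = [sample_mean(population, 10, seed) for seed in range(5)]

print(mean(population))  # 500.5
print(estimates)         # five estimates of the same quantity, which generally disagree
```

Randomness is what makes each estimate legitimate in probabilistic terms, yet the estimates still differ from one another, which is one concrete way the same underlying data can lead to different conclusions.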
Is data merely something recognized, or is there a process before the data [39]? Is it a presentation, or just an external dish? Clearly, there is a series of scientific activities before and after data takes form. The background of data, for example who collected it or where it originated, can make the information produced by processing that data invalid as a source of knowledge [40, 41]. Do we need to do forensics [42, 43]? The answer lies in placing statistics in its proper position within the scientific sequence [44]. Quite a few fields of science struggle with data, and statistics is one of them. However, the basis of all of them is mathematics, which was originally divided into four main areas: arithmetic, algebra, trigonometry, and geometry. On that basis, statistics tries to break free from the arithmetic trap, but the pitfalls of computing keep statistics growing around arithmetic involving numbers and operators [45]. Along with that, the demand for the meaning of data requires other fields such as optimization, matrices, distance and similarity, operations research, and others [46]. Indeed, fuzzy set theory and rough set theory are also results of the demands or interests of data [47].

3. An approach
The birth of a science begins with the emergence of a term among certain academics [48]. The existence of the science is awakened as the term is elaborated, giving rise to many published documents.
Related terms appear to compete with the term that might become the name of the science. Data science is a term, and definitions of it come with the documents that describe it. Currently, such descriptions arise in the information space as a result of discussion and exchange of opinions. Two different spaces become the focus of attention. The first is the information space of the search engine, where the information recorded about the term shows the interest of various stakeholders in it [49]. The term data science is recorded and revealed by year of discussion, showing the ups and downs of activity around data science. Semantically, of course, this information involves all connected internet users who record their interests through their responses about data science [50].
In addition, as a counterweight to that interest, information about documents related to the term data science is indexed by search engines in the form of counts and years [51]. These documents are reliable information because they come from scientists in various fields of science who, as stakeholders, are realizing the new science. A graph illustrates the trajectory of this scientific journey.

4. A discussion for establishment

To establish a new science, the related term continues to flow through every related scientific activity, see Figure 2. Definitions fill the discussion space and then define the science, even though a definition holds only as long as there are no objections.

4.1. Some related terms

The term science implies organizing knowledge systematically and structurally [52]. Systematics refers to an effort or study to build and organize knowledge in the form of explanations so as to produce limits, such as the definitions in a science, followed by theory in the form of theorems and proofs [16]. All of that is arranged logically and by reasoning, and the parts become the structure of the science [53]. On that basis, data science was originally directed toward computer science as its underlying science [54], but the unresolved trap of algorithmic complexity [55, 56], which was the focus of computer science, caused computer science to undergo scientific diffraction, so it cannot be a strong foundation for a new science.
Figure 2. Hit count of the term "Data Science" in 1960-2019 based on Google.

Conversely, computer science is not meant to replace data science. Following the failure of this affirmation of data science, the idea arose of using the term datalogy [57, 58]. Based on the nature and characteristics of data in its dimensions, data becomes part of all existing knowledge [59, 60]. Just as the suffix -logy emphasizes method in the word methodology, the term datalogy tends to make data a part of every existing science, in which data becomes the support of any study [61]. Therefore, datalogy increasingly fails to reinforce the existence of the science intended by the name data science.
Data science as a term expresses data and what surrounds it, from its existence to its meaning. However, the systematic study of data in statistics (its organization, properties, and analysis, or its role in inference) places restraints on both if the term data science replaces statistics or vice versa. The fact that talking about data today means dealing with large amounts of data, or recognizing big data [62], causes some statistical concepts to change [63]. Organizing data is not limited to numbers. Data characteristics abound as long as they relate to the meaning of life, and data attributes are not like the properties available in statistics [64]. Data analysis by statistics is constrained by the sample and encounters obstacles when dealing with the ambiguity that big data exhibits (whether it is a sample or a population) [65, 66]. In other words, statistics deals with convergence in computing [67], and with that, the term data science is peacefully adopted for describing a discipline typically involving some mixture of statistics and large-scale computing [68]. Therefore, the term data science as a phrase in this case is an affirmation of new tasks related to data [69].
In addition to the terms computer science and statistics, quite a few other terms, such as data mining, have been put forward as trial names for this science, obscuring the importance of data science [38]. In the view of data mining, drilling rigs will mine big data like oil in a single pool. Although big data is compartmentalized in existing systems, it actually resides in an information space that has no such structure. Thus, mining data according to a given method can reveal big data only part by part [70, 71, 72]. The term data science is not data mining in its overall sense, or vice versa.
4.2. Toward a definition
As a new science, data science is strengthened by the dissemination carried out by scientists and organizations through lectures and scientific meetings [73]. Discussion about data science has become a major issue in the scientific world with the appearance of journals that publish articles on research into, or reviews of, this science, covering the scope of studies that may be present [74]. Along with that, new definitions of data science and additional scopes of study are presented to explain it, see Figure 3.
Once again, the basic concepts of statistics, which never escape the arithmetic trap of the interval [0, 1] from theory to computation [75], are followed by computer science, which must be content to remain within the study and application of "Program = Data structure + Algorithm" up to "Genetic algorithms + Data structure = Evolution programs", with the pitfalls of their complexity [76].

Figure 4. Hit count of the term "Data Science" in 1960-2019 based on Google Scholar.

Statistics does not need to change into data science, while computer science must still be able to assert itself as a science. Statisticians should unravel the constraints of convergence to be able to handle data transformation, from classical to fuzzy or rough (rough sets). The birth of other fields that study data and computers from other angles has sorted out several derived fields of science. The focus of study in computer science differs from that of systems/information science, information technology, or computer systems, for example. Although some scientists state that "Data science is the child of statistics and computer science" [77], in theory data science is the science of data. Differences arise only as a result of organizing the scientific units needed in a scientific system to deal with the dimensions of the data. This system is based on the interaction and continuity of the scientific units. Even though all of them are based on discrete mathematics as the driving force of scientific energy, they have different implications when interpretation is based on scientific mission, vision, and external targets. Thus, any science that is born will rely on mathematics as its scientific foundation.
Data science consists of scientific units that are openly organized but have their own borders. A border is intended to limit the study in accordance with output and achievement targets, while remaining open to recognizing needed changes. There is an unequivocal goal: scientists want to produce something that people can judge [78]. The substance of data science comes from a variety of sciences, or involves multidisciplinary investigation, and is supported by the application of technology. The data model is the foundation of the investigation, but building the model requires recognizing the data as a whole. Data models propose a choice of method for accessing data, so that data analysis can be relied on to produce information. Collaborative statistics, optimization, and mining methods, including probabilistic inference, are attractive choices for bringing knowledge forth [44]. Given the various constraints of these methods, artificial intelligence is present in an integrated way. Thus, data science is related not only to units of data recognition or data models, statistics, optimization, data mining, and artificial intelligence, but also involves the support of available technology in the form of computing with all its devices (hardware systems, software systems, and algorithms) [79, 80], without making them the substance of the study [81]. As for data with all its characteristics,
data science is very closely related to all other important concepts about data itself, such as big data and decision making. This assertion also implies a commitment to the data. Data recording should involve good validation, and forensics about the origin of data becomes part of smart data collection, because the principle of technology use still applies, namely garbage in, garbage out (GIGO), whereby behavioral and economic data, in which a subjective system is suspected to exist, have different properties from other data.
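Validation at recording time, in the spirit of GIGO, can be sketched minimally in Python, assuming each record carries a value and a field naming its origin. The `Record` type, the `problems` and `screen` helpers, the source labels, and the plausible range are all hypothetical choices for illustration. The check only flags missing provenance and out-of-range values; it does not capture the subtler subjectivity of behavioral or economic data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    """A hypothetical measurement together with its recorded origin (provenance)."""
    value: float
    source: Optional[str] = None  # who or what produced the value; None if unknown


def problems(rec: Record, low: float, high: float) -> List[str]:
    """List the reasons a record fails validation; an empty list means it passes."""
    issues = []
    if not rec.source:
        issues.append("unknown origin")
    if not (low <= rec.value <= high):
        issues.append("value out of range")
    return issues


def screen(records: List[Record], low: float, high: float
           ) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted ones and rejected ones paired with their reasons."""
    accepted, rejected = [], []
    for rec in records:
        issues = problems(rec, low, high)
        if issues:
            rejected.append((rec, issues))
        else:
            accepted.append(rec)
    return accepted, rejected


batch = [
    Record(21.5, "sensor-A"),
    Record(19.0, None),         # no provenance recorded
    Record(999.0, "sensor-B"),  # implausible value
]
ok, bad = screen(batch, low=-50.0, high=60.0)
print(len(ok), [reasons for _, reasons in bad])
# 1 [['unknown origin'], ['value out of range']]
```

Rejecting records with reasons attached, rather than silently dropping them, keeps a trace that later forensics about the data's origin can use.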
4.3. Track record
The track record of the development of data science as a new science can be seen in the growth of information in the information space. Information related to data science was retrieved from the Google search engine starting from the first year the term appeared in the literature (Figure 2 and Figure 3), and confirmed through documentary evidence on Google Scholar, see Figure 4. Use of the term data science as the name of this new science has reached its culmination point, and scientists are examining its completeness under different headings, Figure 5. This is shown by the decreasing percentage of documents related to data science compared with information about research group sites or other information related to data science [83].
The debate about what data science is has ended, and data science is accepted as a new science that is entirely related to data, in contrast to statistics, computer science, data mining, and so on.

4.4. Definition
When dealing with data as a whole, as in Figure 1, expressing data science requires considering a series of relationships among data (δ), information (ι), and knowledge (κ), as stated in the following definitions [84]:
Data Science is "the extraction of knowledge from high-volume data, using skills in computing science, statistics and the specialist domain knowledge of experts."
Data Science is "concerned with the extraction of useful knowledge from large, complex data sets."¹
However, when something is extracted from its source, for example with Ω representing the source and γ representing an extraction function that involves artificial intelligence [85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], data science is

DS(δ, ι, κ) = γ(Ω) + μ(Σ)

where μ is a function involving the tools available through other knowledge Σ.

5. Conclusion
In particular, data science has been stimulated by various relevant experts. Various terms have been raised as suggestions and invitations, and the responses of the public and other scientists are reflected in the study documents presented at various scientific activities. Based on this search, data science is a new science, even though data has long been recognized in all scientific activities. Furthermore, given the importance of establishing data science, it is necessary to study the current chronological status of data science.

Acknowledgment: This paper is the result of a study visit to Europe by the Universitas Sumatera Utara Team of the Erasmus+ DS&AI project.


References
[1] L Manovich 2015 Data science and digital art history International Journal for Digital Art History 1.
[2] E K Nwabueze, P Ranch 2005 Methods for dynamically accessing, processing, and presenting data acquired from disparate data sources United States Patent No: US006959306B2.
[3] P Obrador 2006 Presenting a collection of media objects United States Patent No: US007149755B2.
[4] M K M Nasution 2007 SumutSiana Renungan, IPR: EC00201944654. DOI: 10.13140/RG.2.2.10127.59047.
[5] M K M Nasution 2018 SumutSiana IOP Conference Series: Materials Science and Engineering 309(1). DOI: 10.1088/1757-899X/309/1/012131.
