Applied Mathematics and Nonlinear Sciences, 9(1) (2024) 1-17

Applied Mathematics and Nonlinear Sciences


https://www.sciendo.com

Visualization communication mode and path optimization of data news in the context
of big data

Hezhen Zhang1,†
1. School of Journalism and Communication, Communication University of China, Nanjing, Nanjing
Jiangsu, 211100, China.

Submission Info

Communicated by Z. Sabir
Received September 3, 2022
Accepted February 15, 2023
Available online August 7, 2023

Abstract
With the development of big data technology, not only is the social economy being driven forward, but the
news media industry is also developing in the direction of integration and innovation, and promoting the
dissemination of news through data value factors is the focus of current research. This paper takes data news
as the research object and framework theory as the entry point, and mainly studies the data news production
dilemma and its optimization path.
Firstly, the data news information is classified by entity extraction, and the weights between the entity information are
calculated to establish the association. Secondly, the IE-Page Rank algorithm is proposed to get the IER value of each
information entity by iterative calculation, which is used to identify its importance and quantitatively get the importance
ranking of all information entities. Finally, the basic framework of data news visualization is constructed, and the
applicable visualization optimization dissemination path is given through a case. The research results show
that, compared with the traditional media news dissemination model, the improved data visualization
dissemination model increases efficiency by 32.3%, timeliness by 18.9%, and user satisfaction by 21.1%, and
effectively increases users’ reading volume and dissemination paths by 17.2%. The improved data news
visualization dissemination model proposed in this paper
improves the professionalization of data analysis, enhances the interactivity and visualization of data news works, and
provides guidance for disseminating data news.

Keywords: Big data; Data journalism; Information entities; IE-Page Rank; Visualization.
AMS 2020 codes: 62-07

†Corresponding author.
Email address: zhanghezhen2022@163.com ISSN 2444-8656
https://doi.org/10.2478/amns.2023.2.00140
© 2023 Hezhen Zhang, published by Sciendo.
This work is licensed under the Creative Commons Attribution alone 4.0 License.

1 Introduction

With the continuous and rapid development of China’s economy, public attention to the economy is
increasing daily, and the value behind the large amount of economic data urgently needs to be explored.
However, there are also many problems: the professionalism of news communication does not match the
audience’s contact habits, and the underlying principles are obscure and not easy to disseminate [1-3].
News media therefore need to produce economic news in a way that optimizes the dissemination effect,
which naturally becomes a prerequisite for successful news reporting. The visual and intuitive narrative
of data journalism meets this need of both the media and the audience, and data journalism has developed
by leaps and bounds [4-7].

The literature [8] argues, “In today’s information-rich world, the process of information processing
is particularly important.” Data-driven technology has therefore justified its existence in today’s
big data context. By examining reports in media such as The Guardian and The New York Times, the
literature [9] describes in detail the stages of the data news production process: topic planning, data
collection and analysis, content arrangement, and visual presentation. Drawing on the theory of the
“inverted pyramid”, the paper [10] further proposed a double pyramid structure of data journalism,
pointing out that after the editing, cleaning, contextualization, and synthesis of data, data journalism
can achieve six communication effects in the process of dissemination: visualization, narrative,
socialization, humanization, personalization, and application.

The coexistence of two elements, visualization and narrative, in the communication link plays a
crucial role in achieving better communication results. The literature [11] found that visualization
has become an effective strategy for information dissemination in the era of visual culture.
The literature [12] explored several rhetorical strategies in data visualization and proposed the
hypothesis of constructing a narrative framework for data journalism visualization in terms of visual
representation and textual annotation. The literature [13] delves into the challenges of news
visualization production through case studies and suggests that introducing information visualization
in media requires integrating journalistic and visual thinking. The literature [14] explores the
technical production process of news visualization and suggests three key factors that influence news
visualization: skills, mindset, and management style. This finding provides some ideas for this study
about the impact of data journalism on traditional journalism in terms of journalistic concepts.

In this paper, we first analyze the characteristics of data news and users’ information needs, examine
the components of data news (subject, time, place, event, reason, and occurrence process), and, in
combination with users’ information needs, determine the information entity elements to be extracted,
their extraction positions, and their weight allocation methods. Next, we study data news information
entity extraction and association establishment and then calculate the importance ranking of all
information entities with the IE-Page Rank algorithm. Based on multiple perspectives such as
association networks, time series, and geographical location, visualization analysis is carried out
using visualization means such as network diagrams, time axis, and geographical maps. Finally, after
extracting the information entities of each data news and establishing the association relationship
between them, the data news visualization impact factor is constructed, and the dissemination
optimization path is proposed.

2 Data News Visualization Modeling

2.1 News Information Visualization

Information visualization evolved from the development of scientific visualization, and compared
with scientific visualization, the data to be visualized is not the result of large data sets or
mathematical models but focuses more on abstract data sets, such as unstructured text sets. It aims to
study the visual representation of non-numerical large-scale information resources and helps people
analyze, mine, and understand information resources through the technical methods of graphic images.

2.1.1 News text information entity extraction

To reduce the complexity of the research, this paper focuses on four elements of data news: subject,
time, place, and event (keywords), which constitute the basic information entities from the perspective
of news elements. From the perspective of users’ information needs, this paper focuses on two further
elements, the traditional industries and the companies involved in news events, which constitute the
information entities of users’ information needs.

1) Subject class information entity;

Subject class information entities are the subjects involved in news events. They are subdivided into
organizations, such as the State Council, the Ministry of Agriculture, and the Ministry of Education,
and natural persons, such as Jay Chou and Jack Ma. In this paper, the subject class information
entities are formally represented as:

$$\mathrm{Participants} = \{\mathrm{Participant}_1, \mathrm{Participant}_2, \ldots, \mathrm{Participant}_i, \ldots, \mathrm{Participant}_{mp}\} \tag{1}$$

Where $\mathrm{Participants}$ denotes the set of subject class information entities in the data news,
$\mathrm{Participant}_i$ denotes the $i$-th subject class information entity in the set, and $mp$ denotes
the total number of subject class information entities in the set.

2) Time-based information entities;

A time-based information entity is the point in time at which a news event occurs, such as October
2016. People generally use the time system of century, era, year, month, week, and day to represent
time, and this system is also used for the granularity classification of time-based information entities
in this paper. Here, “month” is selected as the granularity criterion of time-based information
entities, and finer time words such as week, day, and morning are extracted and generalized to
“month”-level time concepts. The time information entities are represented formally as:

$$\mathrm{Dates} = \{\mathrm{Date}_1, \mathrm{Date}_2, \ldots, \mathrm{Date}_i, \ldots, \mathrm{Date}_{mt}\} \tag{2}$$

Where $\mathrm{Dates}$ denotes the set of time-based information entities in the news, $\mathrm{Date}_i$
denotes the $i$-th time-based information entity in the set, and $mt$ denotes the total number of
time-based information entities in the set.

3) Location-based information entities;


In this paper, the 34 provincial-level administrative regions are selected as the granularity criterion
of location-based information entities, and finer location terms such as township/town, county/district,
and city/autonomous region are generalized to this level. The formal representation of location-based
information entities is:

$$\mathrm{Locations} = \{\mathrm{Location}_1, \mathrm{Location}_2, \ldots, \mathrm{Location}_i, \ldots, \mathrm{Location}_{ml}\} \tag{3}$$

Where $\mathrm{Locations}$ denotes the set of location-based information entities in the news,
$\mathrm{Location}_i$ denotes the $i$-th location-based information entity in the set, and $ml$ denotes
the total number of location-based information entities in the set.

4) Keyword information entity;

The keyword information entity is the keyword of the event reported in the news. Since the events
involved in Web news are complicated and not suitable for granularity division, this paper extracts
the keywords of the news to represent the event element among the basic elements of the news report;
the keywords also represent the main content of the report. The keyword information entities are
represented formally as:

$$\mathrm{Keywords} = \{\mathrm{Keyword}_1, \mathrm{Keyword}_2, \ldots, \mathrm{Keyword}_i, \ldots, \mathrm{Keyword}_{mk}\} \tag{4}$$

Where $\mathrm{Keywords}$ denotes the set of keyword information entities in the news,
$\mathrm{Keyword}_i$ denotes the $i$-th keyword information entity in the set, and $mk$ denotes the
total number of keyword information entities in the set.
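
The four entity sets of Eqs. (1)-(4) and the month-level generalization of time words can be sketched as a small data structure. This is only an illustrative sketch: the class name `NewsEntities`, the helper `to_month`, and the sample values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class NewsEntities:
    """Information entities extracted from one news text (Eqs. 1-4).

    Hypothetical container used for illustration only.
    """
    participants: List[str] = field(default_factory=list)  # organizations, natural persons
    dates: List[str] = field(default_factory=list)         # month-level strings, e.g. "2016-10"
    locations: List[str] = field(default_factory=list)     # provincial-level regions
    keywords: List[str] = field(default_factory=list)      # event keywords


def to_month(d: date) -> str:
    """Generalize a day-level date to the month granularity."""
    return f"{d.year:04d}-{d.month:02d}"


example = NewsEntities(
    participants=["State Council", "Jack Ma"],
    dates=[to_month(date(2016, 10, 17))],
    locations=["Jiangsu"],
    keywords=["e-commerce"],
)
print(example.dates)  # ['2016-10']
```

Keeping each class in its own list mirrors the separate sets in Eqs. (1)-(4), so the set sizes $mp$, $mt$, $ml$, and $mk$ are simply the list lengths.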

2.1.2 Data news information entity association establishment

Information entities are extracted as follows. Word segmentation with part-of-speech tagging and
stop-word removal yield a meaningful set of tagged words; subject, time, location, and company
information entities are extracted according to the tagging results; keywords of the news text are
extracted to represent its subject content as keyword information entities; and deep learning is used
to classify the news text corpus by industry. The process of extracting information entities is shown
in Figure 1.

[Figure 1 flow: News text → New word discovery and thesaurus expansion → Word segmentation and
part-of-speech tagging → Stop-word removal → Extraction of subject-class, time-based, location-based,
industry-based, keyword, and company-like information entities → Information entity set]

Figure 1. Entity information extraction flow chart
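
The extraction flow can be illustrated with a toy, standard-library-only sketch. It is not the paper's implementation: a real system would use a word segmenter, a part-of-speech tagger, and a trained industry classifier, whereas here `STOP_WORDS`, `GAZETTEER`, and frequency-based keyword selection are hypothetical stand-ins chosen only to make the steps concrete.

```python
import re
from collections import Counter

# Hypothetical, tiny resources; real lexicons would be far larger.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "on", "for"}
GAZETTEER = {
    "subject": {"state council", "ministry of education"},
    "location": {"jiangsu", "beijing"},
}


def tokenize(text):
    """Lower-case word tokenization (stands in for segmentation and tagging)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_entities(text, top_k=2):
    """Return a dict mapping each entity class to the entities found in text."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    normalized = " ".join(tokens)
    # Naive substring lookup of known names; may over-match in real text.
    entities = {
        cls: sorted(entry for entry in entries if entry in normalized)
        for cls, entries in GAZETTEER.items()
    }
    # Keywords: the most frequent remaining tokens.
    entities["keyword"] = [w for w, _ in Counter(content).most_common(top_k)]
    return entities


sample = ("The State Council announced new education funding in Jiangsu. "
          "Funding for Jiangsu schools rises.")
print(extract_entities(sample))
```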

Multiple types of information entities extracted from the same news text are jointly focused on the
subject matter of the news, to express the main content and core information of the news report. These
information entities are related to each other and are given the same association weight, which is set
to 𝑤. The association and association weights between various information entities of the same news
text are shown in Figure 2.

[Figure 2: the entity classes of one news text (time-based, location-based, industry-based, and
corporate entities) are linked pairwise, and every link carries the same association weight w]

Figure 2. Association between various information entities of the same news text

Every pair of information entity types has been assigned the same association weight $w$. A given
type, however, may contain 0, 1, or more specific information entities. Leaving aside the case of 0
entities, the specific entities of two types can be associated one-to-one, one-to-many, or
many-to-many. The association between class A and class B information entities is therefore
abstracted as an $m$-to-$n$ association, where the positive integers $m$ and $n$ denote the number of
specific information entities in class A and class B, respectively.

Then the association weight of $IE_{A_i}$ to $IE_{B_j}$ can be expressed as:

$$T_{A_i B_j} = w \times \frac{w_{A_i}}{\sum_{k=1}^{m} w_{A_k}} \times \frac{1}{n} \tag{5}$$

Where $w$ is the common association weight among all types of information entities, $w_{A_i}$ is the
weight of the $i$-th class A information entity $IE_{A_i}$, and $\sum_{k=1}^{m} w_{A_k}$ is the sum of
the weights of all class A information entities extracted from a single news text. This can be
understood as follows: among the $m$ class A information entities, the $i$-th entity $IE_{A_i}$ holds
the share $w \times w_{A_i} / \sum_{k=1}^{m} w_{A_k}$ of the weight; this share is then distributed
equally among the $n$ class B information entities, so that $IE_{A_i}$ points to any one class B
information entity with $1/n$ of its share. The resulting many-to-many association relationship
between the two classes of information entities is shown in Figure 3.

[Figure 3: every class A information entity (IE_A1, IE_A2, ..., IE_Ai, ..., IE_Am) points to every
class B information entity (IE_B1, IE_B2, ...); each edge leaving IE_Ai carries the weight
w × (w_Ai / Σ_{k=1}^{m} w_Ak) × (1/n)]

Figure 3. Many-to-many association relationship between two types of information entities
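
As a worked illustration of Eq. (5), the following minimal sketch computes the weight that each class A entity sends to any one class B entity. The function name and the sample weights are hypothetical.

```python
def association_weights(w, weights_a, n_b):
    """Eq. (5): weight from each class A entity to any one class B entity.

    w         -- common association weight between two entity classes
    weights_a -- weights w_{A_1}, ..., w_{A_m} of the class A entities in one text
    n_b       -- number n of class B entities in the same text
    """
    if n_b <= 0 or not weights_a:
        raise ValueError("both classes must contain at least one entity")
    total = sum(weights_a)
    return [w * w_ai / total / n_b for w_ai in weights_a]


# Three class A entities with weights 2, 1, 1 and two class B entities.
t = association_weights(w=1.0, weights_a=[2.0, 1.0, 1.0], n_b=2)
print(t)  # [0.25, 0.125, 0.125]
```

Summing the weights over all $m \times n$ edges returns $w$, which is the sense in which the class-level weight is divided among the specific entities.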



In this paper, we adopt an association matrix to represent the association relationships between
information entities. Both the rows and the columns of the association matrix range over all the
information entities in the information entity set, and each matrix element is the association weight
between the corresponding pair of information entities. The order $N$ of the association matrix is the
total number of information entities in the set, and each row vector or column vector represents the
association between the corresponding information entity and all information entities. The association
matrix is first initialized to a zero matrix. Then the association weights between the information
entities in the news text set are calculated item by item and accumulated into the corresponding
elements of the matrix. The accumulated weights finally constitute the association matrix.
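
The accumulation procedure can be sketched as follows. The input layout (one dict per news text, mapping an entity class to its entities and their weights) and the function name are assumptions made for illustration, and entity names are assumed to be unique across classes.

```python
from itertools import permutations


def build_association_matrix(news_items, w=1.0):
    """Accumulate Eq. (5) weights over all news texts.

    news_items -- list of dicts: entity class -> {entity name: weight}
    Returns (names, T), where T[a][b] is the accumulated weight from a to b.
    """
    names = sorted({name for item in news_items
                    for ents in item.values() for name in ents})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    T = [[0.0] * n for _ in range(n)]  # start from a zero matrix
    for item in news_items:
        # Every ordered pair of distinct entity classes within one text.
        for cls_a, cls_b in permutations(item, 2):
            ents_a, ents_b = item[cls_a], item[cls_b]
            if not ents_a or not ents_b:
                continue  # a class with 0 entities contributes nothing
            total_a = sum(ents_a.values())
            for a, w_a in ents_a.items():
                share = w * w_a / total_a / len(ents_b)  # Eq. (5)
                for b in ents_b:
                    T[index[a]][index[b]] += share
    return names, T


news = [
    {"subject": {"State Council": 1.0},
     "location": {"Jiangsu": 1.0, "Beijing": 1.0}},
    {"subject": {"State Council": 1.0}, "location": {"Jiangsu": 1.0}},
]
names, T = build_association_matrix(news)
print(names)  # ['Beijing', 'Jiangsu', 'State Council']
```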

2.2 Information entity ranking algorithm

Based on the established information entity association network, and drawing on the main idea of
Google’s Page Rank algorithm for ranking web pages, this paper proposes the IE-Page Rank algorithm
(also written IE Rank below) to rank the extracted information entities by importance. It yields a
quantitative importance value for every information entity and identifies the key information entities
of each type, so that the importance of information entities in a large-scale Web news text set can be
understood as a whole and the importance metrics can later be visualized.

2.2.1 Page Rank algorithm

The core idea of the Page Rank algorithm is based on two main points:

1) If many other web pages link to a web page, it means that the web page is more important, i.e.,
the PR value of the web page will be relatively high;

2) If a web page with a high PR value links to a different web page, the PR value of the linked
web page will be increased accordingly.

For a web page, the other web pages with hyperlinks to it are called its incoming pages, and the
pages that it links to are called its outgoing pages. Suppose a network consists of four web pages
A, B, C, and D, and pages B, C, and D link to page A, i.e., they are the incoming pages of page A.
Then the PR value of page A can be expressed as [15]:

$$PR(A) = \frac{PR(B)}{L_{out}(B)} + \frac{PR(C)}{L_{out}(C)} + \frac{PR(D)}{L_{out}(D)} \tag{6}$$

Where $PR(B)$ is the PR value of page B, $out(B)$ is the set of pages that page B links to, and
$L_{out}(B)$ is the number of those pages; pages C and D are treated in the same way.

However, since a page will not get any votes if it is not pointed to by any page, its PR value will be
0, and the PR value it passes to other pages will also be 0. Therefore, it is necessary to set a minimum
PR value for each page. Here, we introduce a damping factor of 𝑑 to represent the probability that a
user chooses a hyperlink on the current webpage when surfing the Internet randomly, and 1 − 𝑑 to
represent the probability that a webpage is opened randomly without choosing a hyperlink on the
current webpage, and the damping factor $d$ is generally taken as 0.85. Thus, the PR value of
webpage A can be optimized as follows:

$$PR(A) = \frac{1-d}{N} + d\left(\frac{PR(B)}{L_{out}(B)} + \frac{PR(C)}{L_{out}(C)} + \frac{PR(D)}{L_{out}(D)} + \cdots\right) \tag{7}$$

where $d$ is the damping factor, $N$ denotes the total number of pages in the network, and B, C, D, ...
denote the pages that link to page A. The minimum PR value for each page is set to $\frac{1-d}{N}$, but
it should be noted that in the original version of the Page Rank algorithm, the minimum value set for
each page was $1-d$, not $\frac{1-d}{N}$.

Further, the PR value of any page $V_i$ can be calculated by expressing the formula as:

$$PR(V_i) = (1-d) + d \times \sum_{V_j \in In(V_i)} \frac{1}{L_{out}(V_j)} PR(V_j) \tag{8}$$

Where $PR(V_i)$ and $PR(V_j)$ denote the PR values of pages $V_i$ and $V_j$ respectively, $In(V_i)$
denotes the set of all incoming pages of page $V_i$, $out(V_j)$ denotes the set of all outgoing pages
of page $V_j$, and $L_{out}(V_j)$ denotes the number of outgoing pages of page $V_j$.

The Page Rank algorithm is calculated as follows:

1) Initialization: All pages in the World Wide Web are given the same initial PR value, i.e., all
pages are considered to have the same importance before starting the iterative calculation, and
the PR value of each page can be set to 1.

2) Iterative calculation: According to the calculation method of web page PR value described
above, the PR value of each web page is calculated one by one, which constitutes one iteration,
and then the PR value of all web pages is recalculated based on the new web page PR value,
and the web page PR value is continuously updated in the process of each iteration.

3) Obtaining stable results: as the iterations proceed, the PR values of the web pages tend to
   stabilize, i.e., converge. The iteration stops once stable PR values are obtained or the limit
   on the number of iterations is exceeded, and the final PR value of each web page gives the
   importance ranking of all web pages.
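
The three steps above can be sketched in a few lines. This minimal sketch iterates the form of Eq. (8) with an initial PR value of 1; the tolerance `tol`, the function name, and the four-page example network are assumptions, and every page is assumed to have at least one outgoing link (pages without outgoing links would need extra handling).

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate Eq. (8) until the PR values stabilize or max_iter is reached.

    out_links maps each page to the list of pages it links to; every link
    target must itself be a key of out_links.
    """
    pages = list(out_links)
    # Incoming pages In(V_i), derived from the outgoing links.
    in_links = {p: [] for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    pr = {p: 1.0 for p in pages}  # step 1: same initial PR value for all pages
    for _ in range(max_iter):     # step 2: iterative calculation
        new_pr = {
            p: (1 - d) + d * sum(pr[q] / len(out_links[q]) for q in in_links[p])
            for p in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:          # step 3: stop once the values are stable
            break
    return pr


# Pages B, C, and D link to A; A links back to B so that no page is a dead end.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A"]})
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```

In this example, pages C and D receive no votes and therefore keep the minimum value $1 - d$, while page A, which collects votes from three pages, ranks highest.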

2.2.2 IE-Page Rank algorithm

In the IE-Page Rank algorithm, the constructed information entity network is mapped to the World
Wide Web, the extracted information entities are mapped to web pages, and the association
relationships between information entities are mapped to hyperlinks between web pages. The IER values
of all information entities can then be obtained from the information entity network through
iterative calculation. In this paper, the IER value quantitatively represents the importance of each
information entity. Similar to the formula for calculating the PR value of a web page, the formula for
calculating the IER value of any information entity $IE_i$ is expressed as [16]:

$$IER(IE_i) = \frac{1-d}{N} + d \times \sum_{IE_j \in In(IE_i)} \frac{T_{IE_j IE_i}}{\sum_{k=1}^{N} T_{IE_j IE_k}} \, IER(IE_j) \tag{9}$$

Where $d$ is the damping factor and takes the value of 0.85, $N$ is the total number of information
entities that constitute the information entity network, $In(IE_i)$ denotes the set of information
entities that vote for the $i$-th information entity $IE_i$, $T_{IE_j IE_i}$ denotes the association
weight from $IE_j$ to $IE_i$, and $\sum_{k=1}^{N} T_{IE_j IE_k}$ denotes the sum of all association
weights cast by $IE_j$. The IE Rank algorithm flow is shown in Figure 4.

[Figure 4 flow: News text → Entity extraction → Information entity set → Association establishment →
Information entity association set → Information entity association network → Initialization →
Update information entity IER values → Stable, or limit on the number of iterations reached?
(NO: update again; YES: output the information entity IER values)]

Figure 4. Flow chart of IE Rank algorithm

1) Initialization

After the network of information entities is constructed, the initial IER values of the information
entities, their association relationships, and their weight assignment relationships need to be
expressed mathematically in the initialization stage. In the initial state, all information entities
can be assigned any positive IER value, and an $N$-dimensional vector represents the initial IER values
of all information entities. An $N \times N$ matrix represents the association relationships between
information entities; this association matrix is:

$$T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1N} \\ T_{21} & T_{22} & \cdots & T_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ T_{N1} & T_{N2} & \cdots & T_{NN} \end{bmatrix} \tag{10}$$

Where $T_{ij}$ denotes the association weight between the $i$-th and the $j$-th information entity,
$N$ is the total number of information entities, and each row vector or column vector of the
association matrix denotes the association weights between the corresponding information entity
and all information entities.

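In this paper, we use an $N \times N$ matrix to represent the weight assignment relationship between
information entities and call it the voting matrix:

$$V = \begin{bmatrix} V_{11} & V_{12} & \cdots & V_{1N} \\ V_{21} & V_{22} & \cdots & V_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ V_{N1} & V_{N2} & \cdots & V_{NN} \end{bmatrix} \tag{11}$$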

Where $V_{ij}$ denotes the weight of association between the $i$-th information entity and the $j$-th
information entity, i.e., the weight (number of votes) given by the $i$-th information entity to the
$j$-th information entity, $N$ is the total number of information entities, the $i$-th row vector of the
voting matrix denotes the weights given by the $i$-th information entity to all information entities,
and the $j$-th column vector denotes the votes received by the $j$-th information entity from all
information entities.

Let $D$ denote the diagonal matrix of order $N$ whose diagonal entries are the reciprocals of the sum
of the association weights between each information entity and all information entities, and let $e$
be the $N$-dimensional vector of all ones. Then matrices $V$ and $T$ are related by:

$$V = \frac{1-d}{N} \, e e^{\top} + d \, T D \tag{12}$$

2) Update information entity IER values

The voting matrix $V = (V_{ij})_{N \times N}$ has been obtained, and the IER value of the $i$-th
information entity is calculated as:

$$R_i = \sum_{j=1}^{N} V_{ij} R_j \tag{13}$$

Let $k \in \{0, 1, 2, \ldots, n\}$ denote the number of iterations, with $k = 0$ denoting the
initialization state. Then the IER values of all information entities are given by:

$$R^{(k+1)} = V R^{(k)} \tag{14}$$

The IER values of all information entities are updated once in each iteration.

3) Obtain the information entity IER value

As the above iterative process continues, the IER values of all information entities tend toward a
stable equilibrium, i.e., the IER value of each information entity converges to a limit value, and the
limit vector $R$ then gives the IER value of each information entity. In practice, the iterative
process is terminated when the change in the IER values between iterations becomes very small, or when
the number of iterations exceeds the limit (1000 in this paper), and the vector $R$ at that point is
used as the final IER value of each information entity.
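
The iteration of Eqs. (12)-(14) can be sketched as below. This is a sketch under stated assumptions rather than the paper's implementation: the input `T[j][i]` is taken to be the association weight from entity j to entity i (the orientation of Eq. (9)), each entity's outgoing weights are normalized by their sum, the initial vector is uniform, and the tolerance `tol` and the 3-entity matrix are hypothetical.

```python
def ie_rank(T, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration R <- V R with a damped, normalized voting matrix."""
    n = len(T)
    out_sum = [sum(row) for row in T]
    if any(s <= 0 for s in out_sum):
        raise ValueError("every entity needs at least one outgoing association")
    # V[i][j]: damped share of entity j's vote that goes to entity i.
    V = [[(1 - d) / n + d * T[j][i] / out_sum[j] for j in range(n)]
         for i in range(n)]
    R = [1.0 / n] * n  # any positive initial values would do
    for _ in range(max_iter):
        R_next = [sum(V[i][j] * R[j] for j in range(n)) for i in range(n)]
        change = max(abs(a - b) for a, b in zip(R_next, R))
        R = R_next
        if change < tol:  # values have stabilized
            break
    return R


# Hypothetical network: entities 0 and 1 vote only for entity 2, which splits
# its vote 1:3 between them.
T_example = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 1.5],
    [0.5, 1.5, 0.0],
]
R = ie_rank(T_example)
print([round(r, 4) for r in R])
```

With a uniform start that sums to 1, each column of `V` also sums to 1, so the IER values keep summing to 1 throughout the iteration and can be read as relative importance.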

3 Data news visualization and communication optimization analysis

The three dimensions of data news visualization analyzed above show different characteristics.
Further investigation shows that the dissemination effects of different data news items differ.
IE-Page Rank is used to rank the three dissemination-effect variables, and the dissemination effects
divide cleanly into two categories. One category has a good communication effect, with high reading,
“in-view,” and comment volumes, while the other has a relatively low communication effect. The
communication effect is then treated as an important variable and cross-analyzed with the news
framing variables. From the perspective of frame theory, and in conjunction with data news production
process theory, three aspects (data news production, content presentation, and visualization
performance) are examined to identify the deficiencies in data news dissemination strength and their
causes.

3.1 News framing and communication effect analysis

3.1.1 Variable analysis of data news dissemination effects

To examine the communication effect of data journalism from the frame theory perspective, and to
strengthen the scientific basis of the conclusions, the characteristics of the three levels of frames
were analyzed, and the information entities extracted from the news texts as described above (subject,
time, location, and keyword information entities) were used to reflect the level of communication
effectiveness. The communication effect is treated as an important variable and cross-analyzed with
the frame categories.

1) News production teams, data sources, and visualization diversity make a difference in
communication effectiveness

We compare whether the total number of people in the data news production team differs significantly
between the high-activity and low-activity user behavior groups. A t-test shows a significant
difference ($t = 9.118$, $p < 0.005$), as shown in Table 1: the total number of news production team
members is higher in the high-activity group than in the low-activity group, with the mean reported as
1.439 higher. This indicates that a better composition of the data news team can lead to better
dissemination of data news stories. The audience is most sensitive to the quality of news, and a
well-defined data journalism team greatly affects the depth of content and the friendliness of the
visual presentation, which helps to improve the quality of data journalism and obtain better
communication effects.

Table 1. Sample t-test of user group behavior

Variable                                       M       SD      t       df    p
Total number of the news production team 15
  High user behavior                           4.65    1.752   9.118   263   0.000
  Low user behavior                            3.107   1.362

2) The correlation between the choice of topic, reporting tendency, and communication effect

Given the research question and the measurement level of the variables (the independent variables are
“between-group variables” and the dependent variables are nominal or dichotomous), chi-square analysis
is the appropriate statistical method. The two subgroups of active and inactive user behavior were
compared with respect to the selection of news stories, as shown in Table 2. For “news coverage
topic”, χ² = 7.242 and P = 0.016 (p < 0.05), which indicates that the dissemination effect of data
news differs significantly across news coverage topics.

Table 2. Chi-square test of news story selection and active and inactive user behavior grouping
Variable                 User behavior grouping            χ²      P
News Story Selection     Active users    Inactive users    7.242   0.016
Politics                 0               3
Economy                  21              180
Social                   11              83
Environment              0               2
Recreation and Sports    19              112
Other                    1               41
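
For readers who want to see how a Pearson chi-square statistic is formed from such a contingency table, the following minimal sketch computes it from the counts in Table 2. The function name is hypothetical, and the result is not claimed to reproduce the reported value, since the paper does not state how sparse cells were handled in SPSS; the p-value is not computed here because it requires the chi-square distribution function.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_tot) - 1)
    return stat, dof


# Counts from Table 2: (active users, inactive users) per news topic.
table2 = [
    [0, 3],     # Politics
    [21, 180],  # Economy
    [11, 83],   # Social
    [0, 2],     # Environment
    [19, 112],  # Recreation and Sports
    [1, 41],    # Other
]
stat, dof = chi_square(table2)
print(dof)  # 5
```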

We compare whether news reporting tendency differs significantly between the two groups of active and
inactive user behavior, as shown in Table 3. For “news reporting tendency”, χ² = 9.307 and P = 0.021
(p < 0.05), which indicates that the dissemination effect of data news differs significantly across
news reporting tendencies.

Table 3. News coverage propensity and active and inactive user behavior grouping chi-square test
Variable                    User behavior grouping            χ²      P
News coverage tendencies    Active users    Inactive users    9.307   0.021
Positive                    33              39
Negative                    9               24
Neutral                     1               397

3.1.2 Comparative analysis of data news dissemination effects

Based on the SPSS statistical analysis described above, data journalism practice is summarized in
three aspects: production preparation, content performance, and visualization presentation, as shown
in Table 4. The comparison shows a significant difference between active users and low active users in
the total number of news production team members and the total number of data sources: in the mean
values, active users are higher than low active users. This indicates that improving quality in these
two areas can improve the effectiveness of data news dissemination. News coverage selection and news
coverage tendency are also correlated with the dissemination effect. Active users have a stronger
demand than low active users for economic topics and for neutral, objective coverage; in other words,
more professional economic news coverage and neutral, objective coverage can achieve a better
dissemination effect.

Table 4. Differences in framework metrics between high and low active user subgroups
                                      Average value                     Standard deviation
Variables                             Active user    Low active user    Active user    Low active user
                                      behavior       behavior           behavior       behavior
Length
Number of the news production team    3915.22        3181.49            730.21         1435.47
Total number of data sources          0.910          2.715              0.592          2.417
Type of News Coverage                 3.947          3.041              3.059          4.164
News Story Selection                  3.698          5.303              0.993          4.001
Tendency to report                    4.113          4.815              1.483          2.554
Industry                              0.794          2.146              2.179          0.671
Scope of Coverage                     4.279          4.442              2.675          1.402
Timeliness                            2.062          1.523              0.506          0.862
Visualization and diversity           2.748          4.208              4.440          1.217

3.2 The dilemma and causes of data news visualization

Through the information entity ranking algorithm, the dissemination effect is expressed in three
dimensions: in-watching, reading, and commenting. After processing with SPSS, the dissemination
effect of data news divides cleanly into two categories, better and worse. These two categories are
treated as variables and cross-analyzed with the frame characteristics.
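
The two-step procedure described above (split stories into better and worse dissemination, then
cross-analyze with a frame characteristic) can be sketched as follows. This is a minimal
illustration, not the SPSS procedure used in the study: the six records are hypothetical, the
composite z-score with a median cutoff is an assumed way of forming the two categories, and
`uses_interactive_chart` is a made-up frame characteristic.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical records: (in_watching, reads, comments, uses_interactive_chart)
stories = [
    (1200, 800, 35, True),
    (300, 150, 4, False),
    (950, 600, 20, True),
    (400, 220, 6, False),
    (1500, 900, 50, True),
    (350, 200, 3, True),
]


def z_scores(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Standardize each of the three dissemination dimensions separately.
columns = [z_scores([s[i] for s in stories]) for i in range(3)]

# Composite score per story: the average of its three z-scores.
composite = [sum(col[j] for col in columns) / 3 for j in range(len(stories))]

# Assumed rule: stories above the median composite score count as "better".
cutoff = median(composite)
labels = ["better" if c > cutoff else "worse" for c in composite]

# Cross-tabulate the dissemination category against the frame characteristic.
crosstab = Counter((label, s[3]) for label, s in zip(labels, stories))

for (label, interactive), count in sorted(crosstab.items()):
    print(f"{label:6s} interactive={interactive}: {count}")
```

In a real analysis the resulting contingency table would then be passed to a test of independence
to check whether the frame characteristic and the dissemination category are associated.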

3.2.1 Lack of first-hand raw data and disconnect between production concept and audience

The analysis reveals that the more data sources a story draws on, the larger the scale of data used,
and the richer the data diversity, the better the communication effect. The shortcomings in the
composition of data acquisition are shown in Figure 5.

Figure 5. Production team statistics and analysis of data news

Statistics and analysis of the production team of data journalism revealed two dilemma factors:

1) The production team lacks professional data personnel;

In the sample data, 281 news stories had dedicated staff producing the data visualization. The
growth rate was 28.7% for editors and supervisors and only 5.9% for other staff, while data
collection and processing staff appeared even less often. At most 419 data news stories had a
production team that included a dedicated person for data-related work. The result is a shortage of
professionals in data collection and difficulty in obtaining specialized first-hand data, a
shortcoming in the human component of data acquisition.

2) Insufficient professional data use and deep resolution;

On average, there were 362 data sources per data news article, and 19.8% of data news stories used
2 or more data sources, so the set of data sources in use is fairly diverse. However, the data
sources are mostly commercial data. The growth rate of stories whose source is first-hand reporting
by journalists is only 11.5%. This lack of first-hand data makes the data analysis insufficiently
professional and, to a certain extent, reduces the efficiency of news dissemination.

3.2.2 The content of the report is utilitarian, and the capacity for breaking-news coverage is
insufficient

Data news reconstructs the presentation of content, using visual reporting and establishing a new
news production process. In terms of content coverage, the media strategy leads to consumerist
tendencies in story selection, and several factors combine to weaken breaking-news coverage
capacity. The information entity ranking algorithm yielded the key indicators behind this weakness,
as shown in Figure 6. In general, data news coverage focused on “big data, cities, young people,
consumption”, etc. Among the 400 study samples, the top four keywords were recreation and sports
(63%), retail consumption (49%), technology frontier (45%), and cultural and travel consumption
(42%). Most stories focus on urban consumption, young people’s consumption behavior, and similar
topics, which they analyze and interpret through large amounts of data. The positioning of users and
content at this level is relatively clear, allowing users to reacquaint themselves with consumption
and with the city and its characteristics. Topic selection concentrates on the economy and on social
and livelihood issues, with an overall bias toward youth: economic topics take the highest share of
all types at 44%, followed by social and livelihood topics at 21%.

Figure 6. Proportion analysis of keywords classification in the data news industry
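
As a reminder of how a PageRank-style iteration turns weighted associations between information
entities into an importance ranking, a minimal sketch follows. It implements a plain weighted
PageRank, not the paper's IE-Page Rank formula; the damping factor of 0.85, the keyword graph, and
its association weights are all hypothetical values chosen for illustration.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Rank nodes of a weighted directed graph.

    weights[u][v] is the association weight of the edge u -> v.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = sorted(set(weights) | {v for nbrs in weights.values() for v in nbrs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_total = {node: sum(weights.get(node, {}).values()) for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_total[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for u, nbrs in weights.items():
            if out_total[u] == 0:
                continue
            for v, w in nbrs.items():
                # Each node passes on its rank in proportion to edge weight.
                new_rank[v] += damping * rank[u] * w / out_total[u]
        rank = new_rank
    return rank


# Hypothetical association weights between keyword entities.
keyword_graph = {
    "consumption": {"young people": 3.0, "cities": 2.0},
    "young people": {"consumption": 4.0},
    "cities": {"consumption": 1.0, "big data": 1.0},
    "big data": {"consumption": 2.0},
}

ranks = weighted_pagerank(keyword_graph)
ranking = sorted(ranks, key=ranks.get, reverse=True)
print(ranking)
```

Entities that receive heavily weighted links from other highly ranked entities end up at the top of
the ranking, which is the property the importance ordering relies on.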

Applying the IE Rank algorithm reveals two flaws in the news reports:

1) The visualization form is relatively single;

In visualization presentation, the data is mostly shown as static infographics and text, which lack
interactivity and offer a low degree of visualization. Text and static infographics dominate,
accounting for 63% of data presentation. Combined use of presentation methods is also limited: most
works rely on a few infographics to display important information, and only 4.7% use more than three
forms of data presentation.

2) Not taking advantage of mobile;

Most of the news works are static infographics, with only a few interactive works. The planning,
production, and dissemination of data journalism works are highly commercial. Although data
journalism itself has economic and social value, it does not draw more of the audience into the
whole process, so the advantages of mobile terminals are naturally not reflected.

4 Conclusion

In this paper, based on an analysis of the dilemmas faced by data news production and their causes,
we consider feasible optimization paths across the data news production process, from production
elements and content presentation to visualization strategies. In production preparation, news data
information entities are extracted and a database of them is built with the help of crowdsourcing.
In content presentation, associations between data news information entities are established while
the specialization of data analysis is improved. In visualization presentation, the IE-Page Rank
algorithm is used to complete the data visualization. The following optimization paths are derived
from the analysis of existing data news:

1) The dilemma of data journalism lacking primary data and the disconnect between production
concept and audience;

The study shows that the proportion of news professionals is only 28.7%, which makes it difficult to
obtain specialized first-hand data, and that first-hand reporting by journalists accounts for only
11.5% of news sources. Therefore, we need to enrich and improve the elements of data news production
and accelerate the building of data news teams. A reasonable data news production team must include
big data analysts, visualization designers, journalists, editors, and other key personnel. Team
members each have their own duties and bring their initiative to bear on the same selected topic,
which adds layers to the content and yields good results in visualization and communication.

2) The content of the report is utilitarian, and the capacity for breaking-news coverage is
insufficient;

In this paper, we found that 63.25% of the data news reports focused on four aspects: “big data,
cities, young people, and consumption”, resulting in a single form of visualization. The reading
rate of data news on mobile is only 21.53%, so the advantages of mobile are not exploited. Therefore,
we need to improve the professionalism and diversity of data news visualization, strengthen the
analytical capability of data news, and speed up its adaptation to mobile. The richer the visual
expression, the better the communication effect.

References
[1] Bucher, T. (2016). Machines don't have instincts: Articulating the computational in journalism. New
Media & Society, 19(6), 919.
[2] Knight, M. (2015). Data journalism in the UK: A preliminary analysis of form and content. Journal of
Media Practice, 16(1), 55-72.
[3] Yamkovenko, S. (2013). Five Tips for Journalists Doing Data Visualization. The Quill, 101(1), 30-34.
[4] Boubaker, S., Liu, Z., & Zhai, L. (2021). Big data, news diversity and financial market crash.
Technological Forecasting and Social Change, 168, 120755.
[5] Park, D., Lee, H., & Jeong, S. H. (2022). Production and Correction of Misinformation About Fine Dust
in the Korean News Media: A Big Data Analysis of News From 2009 to 2019. American Behavioral
Scientist.
[6] Mena, P. (2021). Reducing misperceptions through news stories with data visualization: The role of
readers’ prior knowledge and prior beliefs. Journalism, 1464.
[7] Qi, E., Yang, X., & Wang, Z. (2019). Data mining and visualization of data-driven news in the era of big
data. Cluster Computing, 22(4), 10333-10346.
[8] Vuong, Q. H., Le, T. T., & Nguyen, M. H. (2022). Mindsponge mechanism: an information processing
conceptual framework. The Mindsponge and BMF Analytics for Innovative Thinking in Social Sciences
and Humanities, 21-46.
[9] Gadicha, A. B., Gadicha, V. B., Bohra, S., et al. (2022). Implementation Tools for Generating Statistical
Consequence Using Data Visualization Techniques. Advances in Data Science and Analytics: Concepts
and Paradigms, 1-20.
[10] Fawzi, N., Steindl, N., Obermaier, M., et al. (2021). Concepts, causes and consequences of trust in news
media – a literature review and framework. Annals of the International Communication Association, 45(2),
154-174.
[11] Graziano, L. M. (2018). News media and perceptions of police: a state-of-the-art review. Policing: An
International Journal.
[12] Tully, M., Vraga, E. K., & Smithson, A. B. (2020). News media literacy, perceptions of bias, and
interpretation of news. Journalism, 21(2), 209-226.
[13] Martens, B., Aguiar, L., Gomez-Herrera, E., et al. (2018). The digital transformation of news media and
the rise of disinformation and fake news.
[14] Alsakran, J., Chen, Y., Luo, D., et al. (2012). Real-Time Visualization of Streaming Text with a Force-
Based Dynamic System. IEEE Computer Graphics & Applications, 32(1), 34-45.
[15] Chae, Y. H., Lee, C., Choi, M. K., & Seong, P. H. (2022). Evaluating attractiveness of cyberattack path
using resistance concept and PageRank algorithm. Annals of Nuclear Energy, 166, 108748.
[16] Zhao, C., Li, N., & Fang, D. (2018). Criticality assessment of urban interdependent lifeline systems using
a biased PageRank algorithm and a multilayer weighted directed network model. International Journal of
Critical Infrastructure Protection, 22(SEP.), 100-112.
