
Journal Pre-proof

A hierarchical blockchain-enabled security-threat assessment architecture for IoV

Yuanni Liu, Ling Pan, Shanzhi Chen

PII: S2352-8648(22)00287-5
DOI: https://doi.org/10.1016/j.dcan.2022.12.019
Reference: DCAN 592

To appear in: Digital Communications and Networks

Received Date: 27 September 2021


Revised Date: 18 October 2022
Accepted Date: 28 December 2022

Please cite this article as: Y. Liu, L. Pan, S. Chen, A hierarchical blockchain-enabled security-
threat assessment architecture for IoV, Digital Communications and Networks (2023), doi: https://
doi.org/10.1016/j.dcan.2022.12.019.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2022 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier
B.V. on behalf of KeAi Communications Co. Ltd.
Digital Communications and Networks (DCN)

journal homepage: www.elsevier.com/locate/dcan

A hierarchical blockchain-enabled security-threat assessment architecture for IoV

Yuanni Liu^a, Ling Pan^a, Shanzhi Chen^b,∗

a School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
b State Key Laboratory of Wireless Mobile Communications, China Academy of Telecommunication Technology, Beijing 100191, China

Abstract

In the Internet of Vehicles (IoV), the security-threat information of various traffic elements can be exploited by hackers to attack vehicles, resulting in accidents and privacy leakage. Consequently, it is necessary to establish security-threat assessment architectures that evaluate the risks of traffic elements by managing and sharing security-threat information. Unfortunately, most assessment architectures process data in a centralized manner, causing delays in query services. To address this issue, this paper proposes a Hierarchical Blockchain-enabled Security threat Assessment Architecture (HBSAA), which utilizes edge chains and global chains to share data. In addition, data virtualization technology is introduced to manage multi-source heterogeneous data, and a metadata association model based on an attribute graph is designed to deal with complex data relationships. To provide a high-speed query service, an ant colony optimization of key nodes is designed. An HBSAA prototype is also developed and its performance is tested. Experimental results on large-scale vulnerability data gathered from NVD demonstrate that the HBSAA not only shields data heterogeneity, but also reduces service response time.

© 20xx Published by Elsevier Ltd.


KEYWORDS:
Internet of Vehicles, Blockchain, Edge computing, Data virtualization, Data service composition

∗ Corresponding author: Shanzhi Chen (e-mail: chensz@cict.com).

1. Introduction

With the progress of automatic driving technology, various devices, such as GPS systems and authentication servers, are applied in the Internet of Vehicles (IoV). These devices expose a large amount of security threat information, which poses a serious security threat to the IoV [1]. By exploiting security threat information, hackers can attack cars remotely, leading to security incidents, traffic accidents, information leakage, etc. [2–4]. Consequently, it is necessary to construct a security-threat assessment system, a topic that has received widespread attention in IoV security research [5]. Such a system not only integrates multi-source heterogeneous security threat information, but also provides security analysts with query and sharing services to discover the potential threats of the IoV in a timely manner. More specifically, a security-threat assessment system collects, stores, and analyzes security threat information, and provides a unified interface for different applications, such as web applications, to access the security threat information of equipment, including the Common Vulnerability Scoring System (CVSS) score and attack template information. Based on this security-threat assessment system, the IoV can detect potential threats of various elements early and prevent malicious attacks by hackers. Therefore, it is of great significance to study security-threat assessment architectures to protect IoV security.

However, traditional security-threat assessment systems are centralized: security threat information is collected by Road-Side Units (RSUs) and processed centrally in a remote cloud platform [6]. Such a centralized approach has two defects. Firstly, it is vulnerable to single points of failure. For instance, Denial of Service (DoS) attacks on a central server
may cause the security-threat assessment system to crash and fail to respond in time, leaving it unable to evaluate the risks of IoV elements. Secondly, a large number of security-threat information requests may generate massive communication overhead, increase service response time, and lead to inefficient information sharing. To overcome the shortcomings of centralized systems, a distributed security-threat assessment system is needed. Specifically, the distributed system utilizes a replica mechanism to persistently store the same security threat information on different nodes, avoiding single points of failure [7]. Nevertheless, it is difficult to guarantee the data consistency of each node in a distributed system as the data volume increases. In addition, the replica mechanism generates a lot of redundant data, which occupies storage space and wastes system resources. Therefore, the current security-threat assessment systems are not suitable for the IoV, with its dynamic topology and scarce network resources.

To deal with this problem, many studies have introduced blockchain into the data sharing model of the IoV. Blockchain is a distributed framework in which the consensus mechanism can ensure the consistency and security of shared data [8]. For example, Fan et al. [9] proposed a blockchain-based data sharing model for the IoV that uses vehicles with limited computing power as full nodes to build a blockchain, but the cost is too high. To this end, RSUs, whose computing power is much larger than that of vehicles, have been used in data sharing for the IoV. As full nodes of the blockchain, all RSUs share the same distributed ledger [10, 11] and offer better computing and storage capabilities. However, a large number of nodes coexist in the blockchain network, and a large number of devices are connected to the network and request corresponding network services, which may easily cause problems such as slow message processing, data transmission delay, and high bandwidth occupation. Edge Computing (EC) can solve this problem in the IoV. In our previous research, we integrated blockchain and smart contract capabilities into the intelligent EC framework as its foundations [12]. The goal was to establish an intelligent, blockchain-embedded, Edge-Chain-enabled framework to provide an access control service for IoV devices. In our Edge-Chain IoV framework, EC offloads the computing of the access control service to RSUs, avoiding single points of failure and reducing communication latency. Consequently, our previous work can resolve the problems existing in security-threat assessment systems. Nevertheless, because RSU nodes in different areas cannot share the behavior data of vehicles, each RSU needs to collect the behavior data repeatedly, which wastes network resources. Therefore, on this basis, the introduction of blockchain to store and share data can solve the problem of data sharing between cross-regional RSUs. Additionally, combined with smart contracts, blockchain can provide a system with traceability, validity, and the automatic execution of policy. Therefore, we leverage EC and blockchain technology in the IoV data sharing area, and introduce the Edge-Chain into the IoV framework. The goal is to construct a Hierarchical Blockchain-enabled Security threat Assessment Architecture (HBSAA) to share security threat information among all RSUs in the IoV while solving the single point of failure and communication delay problems.

Nevertheless, in the design of an HBSAA, a difficult problem is multi-source heterogeneous data management. The content, format, and quality of multi-source heterogeneous data vary greatly, which increases the difficulty of data collection. Moreover, LevelDB [13], a common blockchain database, stores data in a structured format, so heterogeneous data must be transformed into structured data before storage, and data loss may occur during conversion. Fortunately, Data Virtualization (DV) technology [14] shows advantages in multi-source heterogeneous data management. DV technology provides users with a unified method to query and manipulate data, regardless of the data location and format. In addition, a DV-based system stores only metadata instead of the entire source data, which effectively improves the speed of data processing. Unfortunately, we found that directly applying DV technology to manage multi-source heterogeneous data is not an easy task. One major challenge is how to improve the metadata query speed. Because of the sophisticated relationships among security threat information, traditional DV technology adopts a metadata association model [15] that must be traversed while querying data, leading to a low query rate. Another key challenge is to improve the query efficiency of data services. A service query request may consist of multiple data services, resulting in a complex query process and poor query performance.

In this context, we ask the following question: in order to evaluate the potential risks of IoV devices quickly, can we exploit existing DV technology to integrate security threat information, based on the previously proposed Edge-Chain IoV framework? We answer the question optimistically by presenting an HBSAA. Specifically, a Global-Chain, deployed in the cloud data center, is added to our HBSAA to provide efficient and speedy security assessment services of smart terminals for our Edge-Chain IoV framework. Moreover, our HBSAA leverages DV technology to shield the heterogeneity of security-threat information. In this paper, a Property Graph-based Metadata Association Model (PGMAM) is also designed to deal with complex logical relationships between data, which can improve the query performance of metadata. In addition, to address the low service query efficiency, we present an Ant Colony Optimization of Key Nodes (KNACO). The contributions of our work are summarized as follows:

(1) An HBSAA is introduced to provide efficient security threat assessment services for our Edge-Chain IoV framework. In HBSAA, the blockchain architecture consists of multiple Edge-Chains and a Global-Chain. Each Edge-Chain is composed of several RSUs within a certain range and shares security threat information between the RSUs in that area. The Global-Chain consists of multiple cloud data centers and shares security threat information among RSUs in different regions. In addition, RSUs act as a security assessment authority that evaluates the potential risks of IoV devices based on security threat information. Therefore, this architecture can relieve the storage and computing pressure on a single node, and avoid the single point of failure and high communication overhead. Additionally, it can also improve the scalability of the system.

(2) In HBSAA, DV technology is introduced, which exploits metadata as indexes to integrate multi-source heterogeneous data, so that the data may be accessed without regard to its physical storage or heterogeneous structure. In addition, to address the low query speed caused by complex relationships among data, PGMAM is proposed to build a property graph model of metadata, based on the relationships among various security threat information. In order to improve the service query rate, KNACO is presented to convert the problem of data service composition into a Steiner tree problem. KNACO selects the smallest Steiner tree by calculating the relative actual weight of the tree, and the minimum Steiner tree is the optimal service composition.

(3) An HBSAA prototype is implemented and the performance of HBSAA is evaluated. The response time of querying metadata under different data volumes is compared among the PGMAM, MySQL, and a tree model. Experimental results illustrate that the PGMAM has more remarkable query performance as the volume of data increases. Moreover, to verify the efficiency of KNACO, the actual weights of the minimum Steiner tree generated by KNACO are compared with those of the basic ant colony algorithm for different numbers of service composition nodes, and it is observed that KNACO can better select the optimal service combination.

The rest of this paper is organized as follows. In Section 2, the related work is discussed. In Section 3, the HBSAA is described. In Section 4, the PGMAM is described. In Section 5, the optimization strategy of data service composition is described, the KNACO is formulated, and the related algorithm is explained. In Section 6, the experimental results of the HBSAA prototype are presented and analyzed. In Section 7, the paper is concluded.

2. Related works

Because of the lack of research on security-threat assessment architectures for the IoV, much research in other fields is worth learning from [16]. Most current security-threat assessment systems are based on either a centralized architecture or a distributed architecture. In a centralized architecture, data management is performed in a single central module. For instance, Sarath et al. [17] have proposed an architecture to develop an Automated Teller Machine (ATM) security system, which utilizes a wide variety of sensors to capture security-related information and sends it to a central server for further processing, analysis, and data presentation. Rahman et al. [18] have constructed a centralized vulnerability database, which provides a platform for gathering, storing, and conveying vulnerability information, making it easy for system administrators to access and analyze. However, like most centralized architectures, these solutions inherit the disadvantage of single-point failures [19]. In a distributed architecture, data management is often assigned to multiple nodes. For example, the authors of [20] have presented a distributed analysis system for vulnerability prioritization, which operates in a scalable, containerized environment. The system integrates the collected vulnerability data with data from an organizational inventory database to detect the scan results in real time. Nevertheless, current distributed architectures often exploit a replica mechanism, which uses redundant storage space to store data, causing a waste of resources [21]. In addition, for distributed architectures, it is a major challenge to maintain data consistency across nodes [22].

Blockchain has gained enormous attention for decentralized data management in the IoV [23, 24]. Considering the distributed and secure storage of big data in the IoV, Jiang et al. [25] have proposed a blockchain architecture that describes how nodes in the IoV could participate in blockchain. They have also analyzed the data transmission performance. Recently, Edge-Chain, as an extension of blockchain, has attracted more attention [26], as it can push more computing and storage resources to the edge, reducing communication delays. For example, Cui et al. [27] have presented a CUTE computation platform, which provides low-latency computation services for the IoV by using Edge-Chain. Lahiri et al. [28] have proposed a framework based on blockchain and smart contracts to store messages and effectively validate trustworthiness. Moreover, this framework treats messages as transactions and validates the trustfulness of messages by using edge computing based on geographical locations and Internet of Things (IoT) implementation. Despite the application of Edge-Chain in IoV architectures, to the best of our knowledge, few works have considered the problem of multi-source heterogeneous data management in this architecture.

Traditional multi-source heterogeneous data management methods include the ontology-based method [29] and the middleware method [30]. For instance, Alvarez-Coello et al. [31] have presented an ontology-based
approach to integrate vehicle-related data, which constructs an ontology to define the semantics of vehicle data, so as to realize the query of vehicle data across data sources. However, these methods are inefficient in data processing, and they also introduce data errors during format conversion [32, 33]. DV technology is the process of aggregating data from different information sources to develop a single, logical, and virtual view of information, so that users or applications can access it without knowing the exact storage location and format of the data. It can provide a new reference for the management of heterogeneous data from multiple sources [34]. Aleyasen et al. [35] have leveraged Hyper-Q’s adaptive DV platform to integrate data, which can run on various databases transparently and mitigate the risk of data migration. Muniswamaiah et al. [36] have outlined DV technology used for Business Intelligence (BI), which can solve BI problems that need to collect and analyze business data in real time to make better decisions. However, DV technology has two problems to be solved, namely metadata management and an efficient data query service [37].

In DV technology, multi-source heterogeneous data is managed by metadata [38], which is the index of source data. Therefore, effective metadata management is the key to ensuring the performance of data query, storage, and operation. At present, many studies aim to improve the query speed of metadata [36, 39]. To improve the efficiency of massive data retrieval, Fan et al. [40] have proposed a data organization approach, which takes the logical segmentation indexing code as the identifier of each piece of remote sensing data. Moreover, in order to let users query data quickly, DV technology provides users with data query services through a single data access interface [37]. However, a data service requested by users is generally composed of multiple data services. Therefore, it is necessary to implement an efficient data service [41]. The approaches to data service optimization fall into two categories: QoS-based service composition optimization [42] and graph model-based service composition optimization [43]. In [44], the authors have proposed a method that combines Skyline with an ant colony algorithm to tackle the difficulty of service composition, which improves the efficiency and QoS of composition. Wang et al. [45] have presented an original method that uses an extended version of the classical graph-plan and a backward A* search algorithm, accelerating the search process of services.

3. HBSAA

In this section, the overall HBSAA will be discussed first. Next, the functions of the local data management system and the global data management system in HBSAA will be introduced, as well as the generation process of data blocks and the data sharing process of security threat information. Finally, the design issues of HBSAA will be explained.

3.1. The overview of HBSAA in edge-chain IoV framework

In order to realize the cross-origin sharing of security threat information in the Edge-Chain IoV framework, the HBSAA is presented in this paper, as shown in Fig. 1. The HBSAA consists of three layers: the perception layer, the edge layer, and the cloud data center. The perception layer comprises various smart terminals, such as vehicles, cameras, and signal lights. To be specific, smart terminals collect basic information through their equipped sensors, including device information, operating system information, etc., and then upload the collected information to the edge layer. The edge layer is composed of numerous RSUs, and each RSU leverages a local data management system to evaluate several risks for smart terminals, e.g., equipment risk assessment and operating system risk assessment. Moreover, the edge layer contains several Edge-Chains, divided according to the locations and communication ranges of RSUs. The cloud data center is formed by different cloud servers and utilizes a global data management system to supply security threat information for all RSUs in the IoV. This means that the IoV can share security-threat information among RSUs in different regions through the cloud data center and avoid duplicate data collection by RSUs, thereby improving the efficiency of security threat assessment. Meanwhile, the cloud data center integrates one Global-Chain, which enables cloud servers to be deployed in a distributed way.

Fig. 1: The hierarchical blockchain-enabled security-threat assessment architecture. Panels: (a) Service Framework of Security Threat for IoV; (b) Local Data Management System; (c) Edge-Chain Structure; (d) Global Data Management System; (e) Global-Chain Structure.

3.1.1. The local data management system

As shown in Fig. 1(b), the local data management system comprises a local data virtualization component and a copy of the Edge-Chain. It is responsible for evaluating various risk situations of smart terminals within the RSU communication range, such as information security risk, identity risk, and equipment vulnerability.

(1) The local data virtualization component

The local data virtualization component includes a source data module, a local metadata organization module, a local data service module, and a data visualization module. Its fundamental function is to integrate multi-source heterogeneous security threat information via DV technology.

The source data module gathers original data from smart terminals, the Common Vulnerabilities and Exposures (CVE) database [46], and the National Vulnerability Database (NVD) [47]. Table 1 lists the detailed description of the collected data.

Similarly, the local metadata organization module is used to collect and process metadata by monitoring the source data module in real time. Specifically,
whenever the source data module has new data, the local metadata organization module immediately extracts and cleans the metadata, and then stores the processed metadata in the Edge-Chain.

Table 1: IoV security threat information

| Name | Format | Data item |
|---|---|---|
| Device data | MySQL | ID, name, category, brand, description, latest revision time |
| Operating system data | CSV | ID, name, version, description, latest revision time |
| Protocol stack data | MySQL | ID, name, category, version, brand, description, latest revision time |
| CPE data | XML | ID, software name, latest revision time |
| Vulnerability data | CSV/XML/HTML/TXT | ID, NVD official address, vulnerability description, related resource link, status description, pre-condition, post-condition, CVSS value, latest revision time |

The local data service module is composed of a local data service process submodule and a local data service cache submodule, and provides users with a variety of query services about security threat information. The local data service process submodule is responsible for processing service requests and for interacting with the local metadata organization module to obtain source data and generate data views. The purpose of the local data service cache submodule is to store data views that users have visited in the Edge-Chain. When a user requests the same service again, the data views can be obtained directly from the Edge-Chain, enhancing the query performance of security threat information.

The data visualization module adopts statistical charts, including pie charts, tables, etc., to display risk assessment results for users.

(2) Edge-Chain for local data sharing

The key components of the Edge-Chain are blockchain and smart contracts [48], as shown in Fig. 1(c). The edge layer contains multiple Edge-Chains that are divided according to the different locations and communication ranges of RSUs. In particular, RSUs in the same Edge-Chain maintain the same "ledger" to realize data sharing in a local area.

The blockchain stores metadata of security threat information, which is collected by RSUs in a local area. Specifically, this paper uses the graph database Neo4j [49] as a blockchain component to store metadata, rather than storing metadata in two-dimensional tables or tree models. Neo4j is designed to store the nodes, edges, and attributes of a graph model, and it supports relational operations on massive data. Moreover, the relationship is the most important element in Neo4j and represents the interconnection between nodes. This enables Neo4j to query node information through relationships quickly. In addition, in order to share data among RSUs in a local area, we utilize Practical Byzantine Fault Tolerance (PBFT) as a consensus protocol to ensure the metadata consistency of each node, consistent with our previous work [12].

The smart contract supports automatic execution of the cache optimization query strategy. Firstly, the local data service process submodule finds source data sets based on metadata. Secondly, according to the dependencies among services, this submodule constructs the source data sets into a connected directed graph, that is, a data service composition graph. The generation of a service composition graph is the trigger condition of smart contracts. Consequently, when the local data service process submodule generates a service composition graph, it triggers the smart contracts embedded with the cache optimization query strategy to select the optimal service composition. Furthermore, the local data service process submodule extracts the source data based on the optimal service composition, improving the query efficiency.

3.1.2. The global data management system

As shown in Fig. 1(d), the global data management system includes a global data process component and a copy of the entire Global-Chain. This system is used to manage all metadata of security threat information in order to share information among RSUs in various regions.

(1) The global data process component

The global data process component stores security threat information and processes data requests from RSUs. It is composed of a global metadata organization module and a global data service module. The global metadata organization module is responsible for collecting and processing metadata in the network. Specifically, it receives metadata uploaded by RSUs in each region, cleans duplicate metadata, and stores the processed metadata in the Global-Chain. The function of the global data service module is similar to that of the local data service module.

(2) Global-Chain for data sharing across the network

As shown in Fig. 1(e), the Global-Chain consists of the blockchain and smart contracts, which perform two functions. Firstly, in order to avoid single points of failure, the blockchain is responsible for distributed storage of metadata in the cloud data center. Secondly, the smart contract of the Global-Chain also introduces a cache optimization query strategy, which has the same function as in the Edge-Chain.

3.2. The data block generation process and data sharing process in edge-chain IoV framework

3.2.1. The generation process of the data block

In the HBSAA, the RSU, as the full node of the Edge-Chain, participates in the generation process of data blocks through the metadata storage submodule. The process includes four stages: data collection, block construction, consensus process, and block generation. The detailed description is as follows:

(1) The RSU collects original data from various data sources according to the basic information submitted by smart terminals. After the metadata is extracted and processed by the local metadata organization module, it is stored in the local record pool in chronological order. When the information stored in the record pool is enough to fill an entire block, the system packs the data into a block.

(2) In order to get the most out of the blockchain network, the metadata of the security threat information is stored in the block, while the complete raw data is stored in the RSU to ensure the traceability and tamper-resistance of the information. The block header contains the hash value and timestamp of the previous block, while the hash of the block body is obtained from the entire data. Additionally, the information in the block header can be used for data queries. After the block is constructed, the master node broadcasts it to the whole network. The system then waits for the consensus process to be carried out.

(3) The PBFT [50] algorithm has low consensus delay and high efficiency, which can meet the needs of real-time processing and large-scale communication. Therefore, PBFT is adopted for block consensus. The specific consensus process is as follows. Firstly, the master node broadcasts the data block to the whole network. Each consensus node then verifies the validity of the block and broadcasts its verification result and signature to the other nodes in a distributed manner. Secondly, after each node receives the verification results, it compares its own result with those of the other nodes and feeds the outcome back to the master node, including its verification result, the comparison result, its signature, and the received verification results. Finally, if two-thirds of the nodes agree on the data block, the block is stored on the blockchain. Among them, the full node RSU stores the complete information of the block.

(4) When the data block reaches a consensus through the consensus stage, the RSU node broadcasts it to the whole network. Moreover, it uses a hash pointer to link the newly generated block to the end of the blockchain. This block becomes the last block of the blockchain network.

In summary, an Edge-Chain block is generated through steps (1) to (4). The data block generation process of the Global-Chain is similar to the above.

3.2.2. Data sharing process

In our proposed architecture, a smart terminal sends an access request to a nearby RSU and submits its basic information at the same time, including the device name, the device type, etc. The RSU looks for metadata related to the terminal in the local data management system based on the received information. If the requested metadata exists, the RSU obtains the corresponding security threat information based on the metadata, and then evaluates the risks of the smart terminal.

Otherwise, the RSU sends a data request to the cloud data center, which searches for the required metadata in the global data management system. If the metadata exists, the cloud data center uses it to find the RSU that stores the source data, obtains the security threat information from that RSU, and returns the information to the requesting RSU. If the cloud data center does not find the metadata, the requesting RSU collects the security threat information of the smart terminal itself. The detailed query process is shown in Fig. 2 and proceeds as follows:

Fig. 2: The entire search process of security threat information

(1) The local data service process submodule parses the search request, which contains the basic information of the smart terminal to be evaluated.
(2) The local data service process submodule determines whether the local data service cache submodule has the data view requested by the RSU. If the data view exists, it is returned to the local data service process submodule.
(3) If the data view does not exist in the local data service cache submodule, the local data service process submodule searches for the corresponding metadata in the local metadata storage submodule.
(4) The local data service process submodule obtains the candidate source data sets based on the retrieved metadata and constructs a data service composition graph. Meanwhile, this submodule triggers the smart contracts, which automatically select the optimal service composition.
(5) According to the optimal service composition, the local data service process submodule extracts data from the source data. In addition, this submodule generates a data view based on the extracted data, which will be called by the data visualization module.
(6) If the RSU does not find the security threat information in the local region, it sends a query request to the cloud data center. The search process for security threat information in the cloud data center is the same as the above process.
(7) If neither the local data management system nor the global data management system stores the security threat information of the smart terminal, the RSU collects the security threat information through the local data management system to evaluate the risks of the smart terminal.

3.3. Design issues
When utilizing DV technology to manage multi-source heterogeneous data in HBSAA, there are two basic factors that affect the performance of data query, which are discussed below.

Fig. 3: Metadata organization model of tree-based structure

Fig. 4: The data service composition graph

3.3.1. The impact of metadata storage on query performance
Since DV technology manages multi-source heterogeneous data on the basis of metadata, a key factor that determines the query performance of HBSAA is the manner of metadata storage and query. Normally, when the relationships among information are complex, a tree model is used to store metadata, as shown in Fig. 3. However, it is necessary to traverse the entire tree from the bottom layer while querying data, resulting in a long data query time. Also, as the amount of data increases, the tree model gets larger, which in turn increases the query response time. On the contrary,

if the relationship is used as a query index, only the storage nodes that meet the index requirements will be traversed to find the requested data, which can significantly improve the data query rate. In summary, in order to improve the query performance of HBSAA, it is of great significance to find a new metadata storage method that establishes relationships among data.

3.3.2. The impact of data service composition optimization on query performance
A query service may consist of multiple target sub-services, hence how to choose target sub-services to form the optimal service composition is another important factor that affects the query performance of HBSAA. As shown in Fig. 4, service E is composed of target sub-services A, B, and C. In the subsequent query of service F, even if some target sub-services are the same as those in service E, the service composition must be reconstructed, which results in a lower query rate of HBSAA. In conclusion, to implement query optimization, it is vital to design a service composition optimization strategy based on cache services.

4. A metadata association model based on property graph for security threat information

This section introduces the proposed PGMAM, which addresses the effect of metadata storage on query rate described in Section 3.

4.1. Related definitions
To process various security-threat information in IoV, PGMAM is built. The property graph [51] is a directed graph composed of nodes, relationships, properties, and labels. Moreover, each node or relationship can have one or more properties, which are stored in the form of key-value pairs.
(1) The Node definition
In the PGMAM, the metadata of security threat information is depicted by nodes. Consequently, the different metadata can be defined as a set of nodes $M$: $M = \{D, S, R, C\}$, and the symbols are defined as follows:
$D$: represents metadata sets of various security threat information. In particular, the metadata refers to characteristic data extracted from source data, including the storage location, type, and so on.
$S$: represents a piece of device metadata.
$R$: represents a piece of Common Platform Enumeration (CPE) metadata.
$C$: represents a piece of vulnerability metadata.
(2) The Property definition
In the PGMAM, each metadata has a property set. A property set can be defined as $A_t$: $A_t = \{a_{t1}, a_{t2}, \cdots, a_{ti}\}$, where $t \in M$ and $a_{ti}$ ($i \ge 1$ and is an integer) represents a metadata property. For instance, the properties of device metadata include the storage location of source data, ID, device name, etc. Therefore, these properties can be represented as $A_S = \{a_{S1}, a_{S2}, \cdots, a_{Si}\}$.
(3) The Relationship definition
In the PGMAM, a relationship between different data is described by an edge. Consequently, all relationships can be defined as a set of edges $E$: $E = \{e_{ux}, e_{a_{ti} t}\}$, where $e_{ux}$ represents the relationship between metadata $u$ and metadata $x$ ($u, x \in M$, $u \neq x$), and $e_{a_{ti} t}$ represents the relationship between metadata $t$ and the property $a_{ti}$. The relationships among different data are defined as $e_{DO}$, $e_{SR}$, $e_{a_{Si} S}$, $e_{RC}$, $e_{a_{Ri} R}$, $e_{a_{Ci} C}$, which are denoted as follows:
$e_{DO}$: represents an inclusion relationship between metadata $D$ of different source data sets and metadata $O$. In particular, $O$ is one of device metadata, CPE metadata, and vulnerability metadata. For example, $e_{DS}$ is an inclusion relationship between metadata $D$ and device metadata $S$.
$e_{SR}$: represents a mapping relationship between device metadata $S$ and CPE metadata $R$.
$e_{a_{Si} S}$: represents a relationship between device metadata $S$ and property $a_{Si}$.
$e_{RC}$: represents an inclusion relationship between CPE metadata $R$ and vulnerability metadata $C$.
$e_{a_{Ri} R}$: represents a relationship between CPE metadata $R$ and property $a_{Ri}$.
$e_{a_{Ci} C}$: represents a relationship between vulnerability metadata $C$ and property $a_{Ci}$.

4.2. The PGMAM construction
An important step of building PGMAM is to associate various information. This paper defines the metadata of security threat information as nodes. Furthermore, an edge is defined as a relationship between two nodes. The meanings of nodes and edges are described in Section 4.1.

Fig. 5: The example of PGMAM

Fig. 5 shows a simple example of PGMAM. As can be seen from this figure, the nodes can be divided into four categories: $D$, $S$, $R$ and $C$. To be specific, $D$ represents various metadata sets of IoV. $S$ is device metadata, such as Radar, GPS, Bluetooth Audio, etc., and the properties of $S$ include device ID and brand. $R$

is CPE metadata associated with various device metadata, and the properties of $R$ contain the software name. $C$ is vulnerability metadata, which is included in each CPE metadata, and the properties of $C$ include CVSS, pre-condition, post-condition, description, etc. In addition, device metadata $S$ and vulnerability metadata $C$ are mapped through CPE metadata $R$. Furthermore, the meaning of the edges is the same as above.
In practice, the graph database Neo4j is deployed to store the PGMAM. Specifically, each node and edge in PGMAM corresponds to a node and edge in Neo4j, and the metadata properties are stored in the form of key-value pairs. The detailed metadata storage flow is as follows:
(1) An input metadata set is defined as $T = \{t_1, t_2, \cdots, t_Z\}$ ($Z \ge 1$ and is an integer), and each metadata $t_q$ ($1 \le q \le Z$) corresponds to a property set $A_{t_q} = \{a_{t_q 1}, a_{t_q 2}, \cdots, a_{t_q n}\}$ ($n \ge 1$ and is an integer). Moreover, metadata $t_q$ has an edge set $E_{t_q} = \{e_{t_q j_b}, e_{t_q a_{t_q n}}\}$ ($j_b \in M$).
(2) If metadata $t_q$ is found to exist in Neo4j, the data storage is terminated. Otherwise, metadata $t_q$ and property set $A_{t_q}$ are stored in Neo4j.
(3) To build a relationship between metadata $t_q$ and metadata $j_b$ ($1 \le b \le S$), it is necessary to determine whether metadata $j_b$ exists. If metadata $j_b$ exists, the edge $e_{t_q j_b}$ between metadata $t_q$ and metadata $j_b$ is stored in Neo4j. Otherwise, the storage of the relationship is terminated.
(4) To establish a relationship between metadata $t_q$ and property $a_{t_q n}$, it is necessary to determine whether property $a_{t_q n}$ exists. If property $a_{t_q n}$ exists, the edge $e_{t_q a_{t_q n}}$ between metadata $t_q$ and property $a_{t_q n}$ is stored in Neo4j. Otherwise, the property $a_{t_q n}$ is created, and then the edge $e_{t_q a_{t_q n}}$ is constructed.
According to the properties of metadata, the PGMAM divides the original data organization model into corresponding areas, which makes it possible to build partition indexes for metadata. The partition search method can reduce the search space and avoid a global brute-force search, thereby reducing the query response time. In particular, while querying a piece of metadata, it can search the property nodes of the metadata and then obtain the metadata through the area associated with the property nodes. Furthermore, the PGMAM builds connections between metadata based on the unique relationships between data. The relationship between metadata is used as a relational index, which further reduces the search space, thereby improving the query performance of the system. Especially in the process of data service, the multiple relational indexes built between metadata keep closely related data close to each other. In conclusion, the PGMAM takes the relationships among metadata as the query index and uses the partition search method to improve the query efficiency of HBSAA.

5. Data service composition optimization strategy based on KNACO

This section discusses the solution to the service composition optimization issue raised in Section 3.

5.1. Related definitions
A data service is an encapsulation of data resources, which can prevent users from accessing physical data sources directly. In this paper, a data service has multiple attributes, which are defined as follows.
(1) $AS$: represents a complete and indivisible data service, called an atomic data service. This paper defines an atomic data service as a four-tuple: $AS = \langle id, label, properties, relationships \rangle$, where $id$ represents the identifier of $AS$, and atomic data services with different $id$ values are different data services; $label$ represents the atomic data service type; $properties$ denotes the set of properties contained in the atomic data service; $relationships$ represents the inclusion relationships between data services and atomic data services.
(2) $CS$: represents a data service that is cached after data consumers access it, named a cache service. In this study, a cache service is regarded as a new data service, hence a cache service can be represented as a four-tuple: $CS = \langle id, label, properties, relationships \rangle$. The elements of $CS$ have the same meaning as those of $AS$.
(3) $DS$: represents a relationship between two data services. Specifically, there are two relationships in this paper. One is a parallel relationship, which is the link between two or more atomic services. The other is an inclusion relationship, which means that one data service is a subset of another data service.

5.2. Methodology
In this paper, in order to select cache services preferentially in a service composition, we propose an optimization strategy of data service composition based on KNACO. KNACO combines a basic ant colony optimization [52] with a Key node Based Minimum cost Path Heuristic (KBMPH) algorithm [53]. The cache optimization strategy based on KNACO can be divided into two steps. Firstly, the process of a data service composition is modeled as a data service composition graph. Secondly, an optimal data service composition is obtained by using KNACO.

5.2.1. Construction of a data service composition graph
The data service composition graph is a directed weighted connected graph, which can be defined as a triple $G = \langle V, E', W \rangle$, as shown in Fig. 6, where $V$, $E'$ and $W$ are the node set, the edge set, and the weight set, respectively.
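As a rough illustration of the triple $G = \langle V, E', W \rangle$, the following Python sketch keeps the node set, the directed edges, and their access costs in one small structure. The class name `CompositionGraph`, the node names, and the costs are assumptions made for this example only; they are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class CompositionGraph:
    """Directed weighted graph: nodes = V, keys of weights = E', values = W."""
    nodes: set = field(default_factory=set)
    weights: dict = field(default_factory=dict)  # (src, dst) -> access cost

    def add_edge(self, src, dst, cost):
        # Register both endpoints and the directed edge with its cost.
        self.nodes.update((src, dst))
        self.weights[(src, dst)] = cost

    def successors(self, node):
        # Nodes reachable from `node` through one directed edge.
        return sorted(d for (s, d) in self.weights if s == node)

    def path_cost(self, path):
        # Sum of edge weights along consecutive nodes of `path`.
        return sum(self.weights[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical graph: one cached service and one atomic service behind a request.
g = CompositionGraph()
g.add_edge("request", "cache_AB", 2.0)
g.add_edge("cache_AB", "target", 1.0)
g.add_edge("request", "atomic_A", 5.0)
cost = g.path_cost(["request", "cache_AB", "target"])
```

Storing the weights on the edge keys keeps $E'$ and $W$ aligned by construction, which is convenient when a search procedure later compares candidate paths.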

Fig. 6: Data service composition graph

$V = \{v_1, v_2, \cdots, v_m\}$ ($m \ge 0$ and is an integer) is the node set in $G$, and nodes can be divided into three categories:
(1) The starting point corresponds to an input request.
(2) The end point is a target data service.
(3) The remaining nodes represent $CS$ or $AS$.
$E' = \{e'_1, e'_2, \cdots, e'_m\}$ is the edge set in graph $G$, and edges can be divided into two types:
(1) An edge that begins at the starting node and ends at any data service, which represents a data service that can be accessed directly according to an input request.
(2) An edge that represents the relationship between data services.
$W = \{w_1, w_2, \cdots, w_m\}$ is the weight set in $G$, and $W$ represents the cost of accessing a data service. In particular, when calculating the access cost of a cache service, the access cost of the atomic services contained in that cache service does not need to be calculated.

5.2.2. Selection of the optimal data service composition
To select the data service composition with the lowest access cost, we convert the issue of choosing an optimal data service composition into the problem of finding the smallest Steiner tree [54]. Furthermore, the smallest Steiner tree meets the following conditions:

• A target atomic service means any atomic service that has a direct parallel relationship with the target data service. A Steiner tree is a directed tree, which takes the input request node as the root node and the target atomic service nodes as leaf nodes.

• The actual weight of the edges contained in the Steiner tree is the lowest. Specifically, if an edge contains data service nodes, only the weight of the edge in parallel with the nodes is calculated.

When looking for the minimum Steiner tree based on KNACO, the target atomic service nodes are used as the initial locations of multiple ants. In general, key nodes refer to nodes contained in the shortest path that the ants pass through. After the key nodes are determined, the path with the lowest query cost is selected by the ants according to the transition probability. Moreover, when all ants complete the path selection (one iteration is completed), pheromones are updated, and then the next iteration is started until the preset number of iterations is reached.
(1) The key node selection
By analyzing the shortest path in a data service composition graph, it can be found that the shortest path usually contains more cache services. Therefore, we use the cache service nodes as key nodes in KNACO.
The criticality of nodes affects the location choice of ants. The more critical a node is, the more ants will pass by it. In this study, the criticality of a node is determined by the number of $AS$ that have an inclusion relationship with the $CS$.
(2) The transition probability
Target $AS$ nodes are regarded as leaf nodes, denoted as $v_s$ ($s = 1, 2, \ldots, n$), and $k$ ants are placed on each leaf node. Meanwhile, the input request node is considered the root node, denoted as $v^*$. During each iteration, each ant generates a branch from $v_s$ to $v^*$. Therefore, $k$ trees are formed after one iteration, and the minimum-weight tree among the $k$ trees is selected. Moreover, pheromones are updated after one iteration, and then the next iteration is initiated, until the preset number of iterations is reached.
During the $n$th iteration, when the $\varepsilon$th ant $A_n^s(\varepsilon)$ starting from the $s$th target atomic data service node is at the current position $g$, the transition probability to the next position $h$ is shown in (1):

$$P_{g,h}^{s}(\varepsilon, n) = \begin{cases} \dfrac{|\tau_{gh}(n)|^{\alpha}\,|\eta_{gh}|^{\beta}}{\sum\limits_{r \notin b_p^s(\varepsilon),\ r \in B_g} |\tau_{gr}(n)|^{\alpha}\,|\eta_{gr}|^{\beta}}, & h \notin b_p^s(\varepsilon),\ h \in B_g \\ 0, & \text{else} \end{cases} \tag{1}$$

The variables are defined as follows:
$\tau_{gh}(n)$: the pheromone of edge $(g, h)$ at the $n$th iteration, and $\eta_{gh}$ is the attraction of node $h$ to node $g$.
$b_p^s(\varepsilon)$: the branch through which ant $A_n^s(\varepsilon)$ passes, and $B_g$ represents the set of candidate nodes of $g$.
$\alpha$: the correlation between the transition probability and $\tau_{gh}(n)$.
$\beta$: the correlation between the transition probability and $\eta_{gh}$.
The values of $\alpha$ and $\beta$ are constants, which are defined during algorithm initialization.
Normally, $\eta_{gh} = \frac{1}{w_{gh}}$, where $w_{gh}$ is the weight of edge $(g, h)$. In this paper, the transition probability is corrected by modifying $\eta_{gh}$ so that key nodes are selected preferentially. Moreover, the set

of key nodes is defined as $Q$, and the criticality is defined as $\lambda$ ($\lambda \ge 1$, $\lambda \in \mathbb{Z}$). The modified $\eta_{gh}$ is shown in (2):

$$\eta_{gh} = \begin{cases} \dfrac{1}{w_{gh}}, & h \notin Q \\ \lambda \cdot \dfrac{1}{w_{gh}}, & h \in Q,\ 0 \le \beta \le 1 \\ \ldots, & h \in Q,\ \beta > 1 \end{cases} \tag{2}$$

Through this modification, ants select key nodes preferentially when choosing the next position.
(3) Pheromone update
In order to calculate the transition probability of the next iteration, all ants must update pheromones at the end of each iteration. The pheromone of edge $(g, h)$ at the $(n+1)$th iteration is shown in (3):

$$\tau_{gh}(n+1) = \begin{cases} (1-\rho)\,\tau_{gh}(n) + \rho\,\Delta\tau_{gh}, & (g, h) \in t_n^{F*} \\ (1-\rho)\,\tau_{gh}(n), & \text{else} \end{cases} \tag{3}$$

where $\rho$ ($0 < \rho < 1$) is the pheromone volatility, and $\Delta\tau_{gh}$ is the pheromone increment, which is represented as $1/|t_n^{F*}|$. Here, $t_n^{F*}$ is the optimal tree generated by the $n$th iteration, and $|t_n^{F*}|$ represents the number of arcs contained in tree $t_n^{F*}$. The pseudocode of KNACO is shown in Algorithm 1.
In Algorithm 1, the optimal data service composition is selected. Firstly, according to the ID of the data service nodes, the location information of the source data can be obtained from Neo4j. Meanwhile, the relevant data is extracted depending on the location information. Secondly, cache services can be obtained from MySQL. Finally, the relevant data and cache services are combined into a data view, and the view is returned to the requester.

Algorithm 1 KNACO
Input: Data service network $G = \langle v, e', w \rangle$, cache service set $Q = \langle node, weight \rangle$, target task set $T = \{t_1, t_2, \cdots, t_M\}$, input node $Inpu$.
Output: minimum Steiner spanning tree $Min(T)$ and its actual weight $Min(C'_{t^F(\varepsilon)})$.
1: Initialization: pheromone initial value $\tau_{gh}(1)$, pheromone heuristic factor $\alpha$, expected heuristic factor $\beta$, total number of ants $I$, pheromone volatilization rate $\rho$, maximum number of iterations $N$;
2: While ($n < N$) do;
3:   Put ant $A_n^s(\varepsilon)$ on each terminal node $s = 1, 2, 3, \cdots, F$;
4:   if ($\varepsilon < I$) then;
5:     if ($s < F$) then;
6:       Calculate the transition probability $P_{g,h}^{s}(\varepsilon, n)$ by formula (1), with $\eta_{gh}$ given by formula (2);
7:       $T = T \cup t_{s-1}^{F}(\varepsilon)$, where $t_{s-1}^{F}(\varepsilon)$ represents the partial tree generated by the $\varepsilon$th ant from the first $s-1$ source nodes;
8:     end if
9:     Calculate the weight $C_{t^F(\varepsilon)}$ of tree $T$;
10:    Correct the actual weight $C'_{t^F(\varepsilon)} = C_{t^F(\varepsilon)} - \sum_{(l,\mu) \in E',\ l \in Q} w_{l\mu}$;
11:    Sort the weights $C'_{t^F(\varepsilon)}$ of the trees generated by the ants $\varepsilon = 1, 2, 3, \cdots, I$, select the minimum $C'_{t^F(\varepsilon)}$ as $min(C'_{t^F(\varepsilon)})$ and $T$ as $min(T)$;
12:  end if
13:  Sort the weights $min(C'_{t^F(\varepsilon)})$ of the trees generated by the $n$th iteration, select the minimum $min(C'_{t^F(\varepsilon)})$ as $Min(C'_{t^F(\varepsilon)})$ and $min(T)$ as $Min(T)$;
14:  Update the pheromone $\tau_{gh}(n+1)$ according to formula (3) for the next iteration;
15: end while
16: return $Min(C'_{t^F(\varepsilon)})$, $Min(T)$.
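The three building blocks of KNACO (attraction, transition probability, pheromone update) can be sketched in Python as follows. This is a minimal illustration rather than the prototype's code: the function names and the toy edge data are hypothetical, and the key-node attraction is assumed to be $1/w_{gh}$ scaled by the criticality $\lambda$, which may not match every case of formula (2).

```python
def attraction(weight, is_key, lam=1):
    """Base attraction 1/w; key (cache) nodes are boosted by lam (assumed form)."""
    base = 1.0 / weight
    return lam * base if is_key else base


def transition_probs(g, candidates, branch, tau, eta, alpha, beta):
    """Formula (1): probability of moving from g to each candidate h.

    Candidates already on the ant's branch get probability 0.
    """
    allowed = [h for h in candidates if h not in branch]
    scores = {
        h: abs(tau[(g, h)]) ** alpha * abs(eta[(g, h)]) ** beta for h in allowed
    }
    total = sum(scores.values())
    probs = {h: 0.0 for h in candidates}
    if total > 0:
        for h, score in scores.items():
            probs[h] = score / total
    return probs


def update_pheromone(tau, best_tree_edges, rho):
    """Formula (3): evaporate on every edge, deposit 1/|t*| on the best tree's edges."""
    delta = 1.0 / len(best_tree_edges) if best_tree_edges else 0.0
    return {
        edge: (1 - rho) * value + (rho * delta if edge in best_tree_edges else 0.0)
        for edge, value in tau.items()
    }


# Toy data: two equally expensive edges, one leading to a key (cache) node "c".
weights = {("a", "b"): 4.0, ("a", "c"): 4.0}
key_nodes = {"c"}
eta = {e: attraction(w, e[1] in key_nodes, lam=3) for e, w in weights.items()}
tau = {e: 1.0 for e in weights}
probs = transition_probs("a", ["b", "c"], {"a"}, tau, eta, alpha=1.0, beta=1.0)
new_tau = update_pheromone(tau, {("a", "c")}, rho=0.5)
```

With equal weights and equal pheromone, the only difference between the two candidates is the criticality boost, so the key node receives the larger share of the probability mass.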

6. The HBSAA prototype implementation and experimental results

In this section, an HBSAA prototype is developed to evaluate the query performance of PGMAM and KNACO. The experimental data is gathered from NVD, including CPE data, CVSS [55] scores, and the published vulnerabilities. Furthermore, the CVE names, pre-conditions and post-conditions of the published vulnerabilities are collected from CVE.

6.1. The HBSAA prototype implementation
In our prototype, the PGMAM is stored in Neo4j. For example, the related metadata of an authentication server named adAS is shown in Table 2. As can be seen from this table, the data is stored in different forms, such as TXT, CSV, XML, HTML, and MySQL.

Table 2: The security threat information of adAS
Data name | Storage format | File name
Vulnerability data (CVE-2019-14911 to CVE-2019-14913) | TXT | adAS.txt
Vulnerability data (CVE-2019-14914 to CVE-2019-15085) | CSV | adAS.csv
Vulnerability data (CVE-2019-15086 to CVE-2019-15089) | HTML | adAS.html
CPE data | XML | CPE-adAS.xml
Device data | MySQL | adAS
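To make the link between such records and the storage flow of Section 4.2 more concrete, the sketch below loads a few Table 2 entries into a plain in-memory property graph and then collects everything associated with adAS in one traversal. It is an illustration under stated assumptions, not the Neo4j implementation: the class `PropertyGraph`, its method names, and the relation labels are hypothetical, and properties are folded into a key-value dictionary on each node instead of being stored as separate property nodes.

```python
from collections import defaultdict


class PropertyGraph:
    def __init__(self):
        self.nodes = {}                # node id -> {"label": str, "props": dict}
        self.edges = defaultdict(set)  # node id -> {(relation, target id)}

    def add_metadata(self, node_id, label, props):
        """Step (2): skip metadata that already exists, otherwise store it."""
        if node_id in self.nodes:
            return False
        self.nodes[node_id] = {"label": label, "props": dict(props)}
        return True

    def relate(self, src, relation, dst):
        """Step (3): store an edge only when both endpoints already exist."""
        if src not in self.nodes or dst not in self.nodes:
            return False
        self.edges[src].add((relation, dst))
        return True

    def reachable(self, start):
        """Follow outgoing edges from start and return every node reached."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, dst in self.edges.get(node, ()):
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return sorted(seen)


pg = PropertyGraph()
pg.add_metadata("adAS", "Device", {"format": "MySQL", "file": "adAS"})
pg.add_metadata("CPE-adAS", "CPE", {"format": "XML", "file": "CPE-adAS.xml"})
pg.add_metadata("CVE-2019-14911", "CVE", {"format": "TXT", "file": "adAS.txt"})
pg.add_metadata("CVE-2019-15089", "CVE", {"format": "HTML", "file": "adAS.html"})
pg.relate("adAS", "e_SR", "CPE-adAS")            # device -> CPE mapping
pg.relate("CPE-adAS", "e_RC", "CVE-2019-14911")  # CPE -> vulnerability
pg.relate("CPE-adAS", "e_RC", "CVE-2019-15089")
related = pg.reachable("adAS")
```

Because the device, CPE, and vulnerability records are connected by edges, one traversal from the device node is enough to gather metadata that lives in four different storage formats.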

The PGMAM related to adAS is shown in Fig. 7. It can be seen from this figure that, with one Cypher statement [56], the metadata of all security-threat information associated with adAS can be found. Consequently, the PGMAM can increase the query rate of metadata significantly.
Fig. 8 illustrates the attack template information of adAS, including the introduction, threat components, and the published vulnerabilities. In particular, the published vulnerabilities consist of the CVE name, pre-conditions and post-conditions. In addition, as can be seen from Fig. 8, the HBSAA enables data consumers to access multi-source heterogeneous information via a unified interface.

Fig. 7: The visualization of metadata query results

Fig. 8: The query page of attack template information

Fig. 9 shows the risk evaluation of adAS, in which the CVSS score of each vulnerability in adAS is displayed in a table. Moreover, according to the CVSS scoring standard, the risk level of a vulnerability is defined as high, medium, or low. In order to allow data consumers to understand the potential threat level of adAS intuitively, the vulnerability risk distribution of adAS is shown through a statistical chart.

Fig. 9: The query page of device vulnerability

6.2. Performance evaluation

6.2.1. The query performance analysis of PGMAM
To verify the query performance of PGMAM, we compare the query time of PGMAM, the tree model and MySQL. There are two types of query: the single-node query and the complex-node query. In the single-node query case, the data service only needs to obtain data from a single data source. In the complex-node query case, because the data service needs to retrieve data from multiple data sources, the query involves a large number of nodes in PGMAM.
(1) The query performance of single node
Fig. 10 shows the single-node query time of PGMAM and the tree model under different data volumes. The results reveal that the query time of our PGMAM is lower than that of the tree model. In particular, when the amount of data reaches 10 million, the query time of PGMAM is about 31% lower than that of the tree model. The PGMAM queries single-node data through the property partition index, which improves the speed of query response, since the data query response time then depends only on the amount of data in the partition.

Fig. 10: Comparison of single-node query time under different data volumes (x-axis: amount of data, in tens of thousands; y-axis: query time, in ms; series: PGMAM, tree model)

(2) The query performance of complex node
Fig. 11 shows the complex-node query time of PGMAM compared with the tree model and MySQL, respectively. Fig. 11(a) shows the query time of PGMAM and the tree model under different data volumes. As shown in Fig. 11(a), our PGMAM is superior to the tree model in terms of query time, especially when the data volume exceeds 8 million. Moreover, when the amount of data reaches 10 million, the query time of PGMAM is about 21% lower than that of the tree model. For the tree model, it is necessary to traverse level by level when querying data, and the response time increases with the data volume, because the metadata of different data sources in the tree model is only connected by fixed nodes, and there is no relationship between the metadata. The query time of PGMAM and MySQL under different data volumes is shown in Fig. 11(b), which demonstrates that the complex-node query time in MySQL is much longer than that of PGMAM. When the volume of data is 10 million, the response time of our PGMAM is around 93% lower than that of MySQL, showing excellent query performance. MySQL needs to establish connections between multiple storage tables when querying data with multiple association relationships, which consumes a lot of time. In contrast, the PGMAM has established a relational index, which can quickly query complex-node information based on the relationships between data.

Fig. 11: Comparison of complex-node query time under different data volumes: (a) the PGMAM and tree model; (b) the PGMAM and MySQL (x-axis: amount of data, in tens of thousands; y-axis: query time, in ms)

Fig. 12 compares the query time of PGMAM, the tree model and MySQL under different numbers of target query nodes. In particular, the number of nodes on the horizontal axis represents the amount of all metadata involved in the query. As can be seen from the figure, the metadata query time in MySQL is much longer than that of both PGMAM and the tree model. The query performance of PGMAM is always superior to that of the tree model, especially when the number of target nodes exceeds 50.

Fig. 12: Comparison of query time under different target nodes (x-axis: number of nodes; y-axis: query time, in ms; series: PGMAM, tree model, MySQL)

6.2.2. The performance analysis of data service composition optimization strategy based on KNACO
(1) Simulation parameter setting of KNACO
In this section, the performance of KNACO is verified. The experimental procedure is as follows. Firstly, metadata is extracted from the candidate source data sets. Secondly, according to the dependencies among services, a data service composition graph is generated by simulation. In general, it takes less time to access a cache service than to access an atomic service. To this end, the cost of accessing a cache service is set as a random number uniformly distributed between 1 and 10, and the cost of accessing an atomic service is set as a random number uniformly distributed between 3 and 13. To determine the proportion of cache service nodes in a data service composition graph, three parameters are defined.

• The proportion of cache service nodes: $\gamma = m^*/n^*$, where $m^*$ is the number of cache service nodes in the data service composition graph, and $n^*$ is the total number of nodes in the data service composition graph. Furthermore, in order to ensure the effectiveness of the simulation, the consistency of the $\gamma$ value must be guaranteed.

• The average criticality of cache nodes: $\lambda' = \frac{1}{m^*} \sum_{i=1}^{m^*} \lambda_i$, where $\lambda_i$ is the criticality of the $i$th cache service node.

• Relative actual weight (RAW): refers to the difference between the actual weight and the average of the actual weights of a Steiner tree.

The smaller the RAW, the smaller the weight of the generated Steiner tree, and the better the algorithm performs in selecting the optimal
service composition. Consequently, the performance of KNACO can be verified by comparing RAW.

(2) Simulation results
Fig. 13 shows the relationship between RAW and the number of Steiner tree nodes, where the number of Steiner tree nodes ranges from 20 to 120 and the average criticality λ′ is 3. The RAW of a Steiner tree obtained by KNACO is smaller than that of the ant colony optimization. As the number of nodes grows, the RAW of both KNACO and the ant colony optimization fluctuates within a certain range, but the RAW of KNACO is always smaller than that of the ant colony optimization.

Fig. 13: The RAW comparison of Steiner trees when λ′ = 3. Curves: KNACO and ant colony optimization. Axes: number of nodes vs. relative actual weight (RAW).

Fig. 14 is similar to Fig. 13, except that the average criticality λ′ is 5. Fig. 14 also demonstrates that the RAW of a Steiner tree calculated by KNACO is always smaller than that of the ant colony optimization, while the RAW of both algorithms again fluctuates within a certain range as the number of nodes increases. Furthermore, by comparing the RAW in Fig. 13 and Fig. 14, it can be inferred that as the average criticality λ′ increases, the performance advantage of KNACO becomes more prominent.

Fig. 14: The RAW comparison of Steiner trees when λ′ = 5. Curves: KNACO and ant colony optimization. Axes: number of nodes vs. relative actual weight (RAW).

7. Conclusions

In this paper, HBSAA is proposed to assess the risk of IoV smart terminals, utilizing multiple Edge Chains and a Global Chain to share security threat information. In addition, to overcome the differences in data storage formats and the low query efficiency, DV technology is introduced to manage security threat information, and PGMAM and KNACO based on DV technology are designed to enhance the query performance of HBSAA. Furthermore, an HBSAA prototype is built to validate the proposed method; it also enables users to query data across multiple heterogeneous data sources, improving data query efficiency. Experimental results demonstrate that PGMAM processes the complex relationships among security threat information more quickly and has better query performance than the tree model and MySQL. Moreover, KNACO is better than the ant colony algorithm at selecting the optimal data service composition. In the future, HBSAA will be applied in various scenarios such as private networks and mobile crowd-aware networks to assess the risks of various devices and ensure network security.

Acknowledgements

This work was supported in part by the Science and Technology Project Program of Sichuan under Grant 2022YFG0022; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJZD-K202000602; in part by the General Program of Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX1021; and in part by the Chongqing Natural Science Foundation of China under Grant cstc2020jcyj-msxmX0343.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
