1 s2.0 S2352864822002875 Main
PII: S2352-8648(22)00287-5
DOI: https://doi.org/10.1016/j.dcan.2022.12.019
Reference: DCAN 592
Please cite this article as: Y. Liu, L. Pan, S. Chen, A hierarchical blockchain-enabled security-
threat assessment architecture for IoV, Digital Communications and Networks (2023), doi: https://
doi.org/10.1016/j.dcan.2022.12.019.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
© 2022 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier
B.V. on behalf of KeAi Communications Co. Ltd.
Digital Communications and Networks(DCN)
b State Key Laboratory of Wireless Mobile Communications, China Academy of Telecommunication Technology,

Abstract

In Internet of Vehicles (IoV), the security-threat information of various traffic elements can be exploited by hackers to attack vehicles, resulting in accidents and privacy leakage. Consequently, it is necessary to establish security-threat assessment architectures that evaluate the risks of traffic elements by managing and sharing security-threat information. Unfortunately, most assessment architectures process data in a centralized manner, causing delays in query services. To address this issue, in this paper, a Hierarchical Blockchain-enabled Security threat Assessment Architecture (HBSAA) is proposed, utilizing edge chains and global chains to share data. In addition, data virtualization technology is introduced to manage multi-source heterogeneous data, and a metadata association model based on attribute graph is designed to deal with complex data relationships. In order to provide high-speed query service, an ant colony optimization of key nodes is designed, and an HBSAA prototype is developed and its performance is tested. Experimental results on the large-scale vulnerability data gathered from NVD demonstrate that the HBSAA not only shields data heterogeneity, but also reduces service response time.

Keywords: Internet of Vehicles, Blockchain, Edge computing, Data virtualization, Data service composition
[...] may cause the security-threat assessment system to crash and fail to respond in time, leading to the inability to evaluate risks of IoV elements. Secondly, a lot of security-threat information requests may generate massive communication overhead, increase service response time, and lead to inefficient information sharing. In order to overcome the shortcomings of centralized systems, a distributed security-threat assessment system is needed. Specifically, the distributed system utilizes a replica mechanism to persistently store the same security threat information on different nodes, avoiding single points of failure [7]. Nevertheless, it is difficult to guarantee the data consistency of each node in a distributed system as the data volume increases. In addition, the replica mechanism generates a lot of redundant data, which occupies storage space and wastes system resources. Therefore, the current security-threat assessment systems are not suitable for IoV with dynamic topology and scarce network resources.

To deal with this problem, many studies have introduced blockchain into the data sharing model of IoV. Blockchain is a distributed framework, in which the consensus mechanism can ensure the consistency and security of shared data [8]. For example, Fan et al. [9] proposed a blockchain-based data sharing model for the IoV, using vehicles with limited computing power as full nodes to build a blockchain, but the cost is too high. To this end, RSUs with computing power much larger than vehicles have been used in data sharing for IoV. As the full nodes of the blockchain, all RSUs share the same distributed ledger [10, 11], which has better computing and storage capabilities. However, a large number of nodes coexist in the blockchain network, and a large number of devices are connected to the network and request corresponding network services, which may easily cause problems such as slow message processing speed, data transmission delay, and high bandwidth occupation. Edge Computing (EC) can solve this problem in IoV. In our previous research, we have integrated blockchain and smart contract capabilities into the intelligent EC framework as the foundations [12]. The goal is to establish an intelligent and blockchain-embedded Edge-Chain enabled framework to provide access control service for IoV devices. In our Edge-Chain IoV framework, EC offloads the computing of access control service to RSUs, avoiding single points of failure and reducing communication latency. Consequently, our previous work can resolve problems existing in the security-threat assessment systems. Nevertheless, because RSU nodes in different areas cannot share the behavior data of vehicles, each RSU needs to collect the behavior data repeatedly, which leads to a waste of network resources. Therefore, on this basis, the introduction of blockchain to store and share data can solve the problem of data sharing between cross-regional RSUs. Additionally, combined with smart contracts, blockchain can provide a system with traceability, validity, and the automatic execution of policy. Therefore, we try to leverage EC and blockchain technology in the IoV data sharing area, and introduce the Edge-Chain into the IoV framework. The goal is to construct a Hierarchical Blockchain-enabled Security threat Assessment Architecture (HBSAA) to share security threat information among all RSUs in IoV on the basis of solving the single point of failure and communication delay.

Nevertheless, in the design of an HBSAA, a difficult problem is the multi-source heterogeneous data management. The content, format and quality of multi-source heterogeneous data vary greatly, which increases the difficulty of data collection. Moreover, as a common database of blockchain, LevelDB [13] exploits a structured format to store data, so that the heterogeneous data must be transformed into structured data when it is stored, but data loss may occur during conversion. Fortunately, Data Virtualization (DV) technology [14] shows advantages in multi-source heterogeneous data management. The DV technology provides users with a unified method to query and manipulate data, regardless of the data location and format. In addition, a DV-based system only stores metadata instead of the entire source data, which effectively improves the speed of data processing. Unfortunately, we found that applying DV technology to directly manage multi-source heterogeneous data is not an easy task. One major challenge is how to improve the metadata query speed. Because of the sophisticated relationships among security threat information, the traditional DV technology adopts a metadata association model [15], which must be traversed while querying data, leading to a low query rate. Another key challenge is to improve the query efficiency of data services. A service query request may consist of multiple data services, with a complex query process and poor query performance.

In this context, we ask the following question: In order to evaluate the potential risks of IoV devices quickly, can we exploit existing DV technology to integrate security threat information, based on the previously proposed Edge-Chain IoV framework? We answer the question optimistically by presenting an HBSAA. Specifically, a Global-Chain is added to our HBSAA, which is deployed in the cloud data center, to provide efficient and speedy security assessment services of smart terminals for our Edge-Chain IoV framework. Moreover, our HBSAA leverages DV technology to shield the heterogeneity of security-threat information. In this paper, a Property Graph-based Metadata Association Model (PGMAM) is also designed to deal with complex logical relationships between data, which can improve the query performance of metadata. In addition, for the low service query efficiency, we present an Ant Colony Optimization of Key Nodes (KNACO). The contributions of our work are summarized as follows:
(1) An HBSAA is introduced to provide efficient security threat assessment services for our Edge-Chain IoV framework. In HBSAA, the blockchain architecture consists of multiple Edge-Chains and a Global-Chain. The Edge-Chain is composed of several RSUs within a certain range, sharing security threat information between RSUs in the area. The Global-Chain consists of multiple cloud data centers, sharing security threat information for RSUs in different regions. In addition, RSUs act as an authority of security assessment to evaluate potential risks of IoV devices based on security threat information. Therefore, this architecture can relieve the storage pressure and computing pressure on a single node, and avoid the single point of failure and high communication overhead. Additionally, it can also improve the scalability of the system.

(2) In HBSAA, DV technology is introduced, which exploits metadata as indexes to integrate multi-source heterogeneous data, so that they may be accessed without regard to their physical storage or heterogeneous structure. In addition, for the problem of low query speed caused by complex relationships among data, PGMAM is proposed to build a property graph model of metadata, based on relationships among various security threat information. In order to improve the service query rate, KNACO is presented to convert the problem of data service composition into a Steiner tree problem. The KNACO selects the smallest Steiner tree by calculating the relative actual weight of the tree, and the minimum Steiner tree is the optimal [...]

[...] time of querying metadata under different data volumes is compared utilizing the PGMAM, MySQL and tree model. Experimental results illustrate that the PGMAM has more remarkable query performance as the volume of data increases. Moreover, to verify the efficiency of KNACO, in the case of different numbers of service composition nodes, the actual weights of the minimum Steiner tree generated by KNACO are compared with those of the basic ant colony algorithm, and it is observed that KNACO can better select the optimal service combination.

The rest of this paper is organized as follows. In Section 2, the related work is discussed. In Section 3, the HBSAA is described. In Section 4, the PGMAM is described. In Section 5, the optimization strategy of data service composition is described, the KNACO is formulated and the related algorithm is explained. In Section 6, the experimental results of the HBSAA prototype are presented and analyzed. In Section 7, the paper is concluded.

2. Related works

Because of the lack of research on security-threat assessment architecture for IoV, many studies in other fields are worth learning from [16]. Most current security-threat assessment systems are based on either a centralized architecture or a distributed architecture. In a centralized architecture, data management is performed in a single central module. For instance, Sarath et al. [17] have proposed an architecture to develop an Automated Teller Machine (ATM) security system, which utilizes a wide variety of sensors to capture security related information, and sends it to a central server for further processing, analysis and data presentation. Rahman et al. [18] have constructed a centralized vulnerability database, which provides a platform for gathering, storing, and conveying vulnerability information, making it easy for system administrators to access and analyze. However, like most centralized architectures, these solutions inherit the disadvantage of single-point failures [19]. In a distributed architecture, data management is often assigned to multiple nodes. For example, the authors of [20] have presented a distributed analysis system for vulnerability prioritization, which operates in a scalable, containerized environment. The system integrates the collected vulnerability data with data from the organizational inventory database to detect the scan results in real time. Nevertheless, the current distributed architecture often exploits a replica mechanism, which utilizes redundant storage space to store data, causing a waste of resources [21]. In addition, for distributed architectures, it is a major challenge to maintain data consistency across nodes [22].

[...] in IoV, Jiang et al. [25] have proposed a blockchain architecture that describes how nodes in IoV could participate in blockchain. They also have analyzed the data transmission performance. Recently, Edge-Chain, as an extension of blockchain, has attracted more attention [26], which can push more computing and storage resources to the edge, reducing communication delays. For example, Cui et al. [27] have presented a CUTE computation platform, which provides low-latency computation services for IoV by using Edge-Chain. Lahiri et al. [28] have proposed a framework based on blockchain and smart contracts to store messages and effectively validate trustworthiness. Moreover, this framework treats messages as transactions and validates the trustfulness of messages by using edge computing based on geographical locations and Internet of Things (IoT) implementation. Despite the application of Edge-Chain in IoV architecture, to the best of our knowledge, few people have considered the problem of multi-source heterogeneous data management in this architecture.

Traditional multi-source heterogeneous data management methods include the ontology-based method [29] and the middleware method [30]. For instance, Alvarez-Coello et al. [31] have presented an ontology-based
approach to integrate vehicle-related data, which constructs an ontology to define the semantics of vehicle data, so as to realize the query of vehicle data across data sources. However, these methods are inefficient in data processing, and they also suffer from data errors during the format conversion [32, 33]. DV technology is the process of aggregating data from different information sources to develop a single, logical and virtual view of information so that users or applications can access it without knowing the exact storage location and format of the data, and can provide a new reference for the management of heterogeneous data from multiple sources [34]. Aleyasen et al. [35] have leveraged Hyper-Q’s adaptive DV platform to integrate data, which can run on various databases transparently and solve the risk of data migration. Muniswamaiah et al. [36] have outlined DV technology used for Business Intelligence (BI), which can solve BI problems that need to collect and analyze business data in real time to make better decisions. However, the DV technology has two problems to be solved, that is, metadata management and efficient data query service [37].

In DV technology, multi-source heterogeneous data is managed by metadata [38], which is the index of source data. Therefore, effective metadata management is the key to ensuring the performance of data query, storage, and operation. At present, there are many studies on improving the query speed of metadata [36, 39]. To improve the efficiency of massive data retrieval, Fan et al. [40] have proposed a data organization approach, which takes the logical segmentation indexing code as the identifier of each remote sensing data. Moreover, in order to facilitate users to query data quickly, the DV technology provides users with data query services through a single data access [...]

[...]cess of security threat information. Finally, the design issues of HBSAA will be explained.

3.1. The overview of HBSAA in edge-chain IoV framework

In order to realize the cross-origin sharing of security threat information in the Edge-Chain IoV framework, the HBSAA is presented in this paper, as shown in Fig. 1. The HBSAA consists of three layers: the perception layer, the edge layer, and the cloud data center. The perception layer comprises various smart terminals, such as vehicles, cameras, and signal lights. To be specific, smart terminals collect basic information with their equipped sensors, including device information, operating system information, etc., and then the collected information is uploaded to the edge layer. The edge layer is composed of numerous RSUs, and each RSU leverages a local data management system to evaluate several risks for smart terminals, e.g., equipment risk assessment and operating system risk assessment. Moreover, the edge layer contains several Edge-Chains in accordance with the variety of locations and communication ranges of RSUs. The cloud data center is formed by different cloud servers, and utilizes a global data management system to supply security threat information for all RSUs in IoV. This means that IoV can share security-threat information among RSUs in different regions through the cloud data center, and avoid duplicate data collection by RSUs, thereby improving the efficiency of security threat assessment. Meanwhile, the cloud data center integrates one Global-Chain, which enables cloud servers to be deployed in a distributed way.
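As a rough illustration of the sharing path described above, the following Python sketch resolves a request at the local RSU first, then through the cloud data center, and collects fresh data only as a last resort. The class names `RSU` and `CloudDataCenter`, the `collect` callback, and the dictionary-based stores are hypothetical stand-ins chosen for illustration; they are not taken from the HBSAA prototype.

```python
from typing import Callable, Dict, Optional, Tuple


class CloudDataCenter:
    """Hypothetical global index: terminal key -> RSU that holds the source data."""

    def __init__(self) -> None:
        self.index: Dict[str, "RSU"] = {}


class RSU:
    def __init__(self, name: str, cloud: CloudDataCenter,
                 collect: Callable[[str], dict]) -> None:
        self.name = name
        self.cloud = cloud
        self.collect = collect            # assumed data-collection routine
        self.store: Dict[str, dict] = {}  # locally held security-threat information

    def query(self, key: str) -> Tuple[dict, str]:
        # 1) Local region: answer from this RSU if the information is already here.
        if key in self.store:
            return self.store[key], "local"
        # 2) Other regions: ask the cloud data center which RSU holds the data.
        owner: Optional[RSU] = self.cloud.index.get(key)
        if owner is not None:
            return owner.store[key], "cloud"
        # 3) Nobody has it: collect once and register it for other regions.
        info = self.collect(key)
        self.store[key] = info
        self.cloud.index[key] = self
        return info, "collected"
```

With two RSUs that share one `CloudDataCenter`, the first request for a terminal triggers a collection, and a later request from the other RSU is served through the cloud index without collecting again.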
[Figure: (a) Service Framework of Security Threat for IoV; (b) Local Data Management System. Legend: Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), Vehicle-to-RSU (V2R), Infrastructure-to-RSU (I2R), RSU-to-RSU (R2R), RSU-to-Cloud (R2C), and Cloud-to-Cloud (C2C) communication.]
[Table fragment: CSV; system data; [...]scription, latest revision time; Protocol; ID, name, category, ver[...]]

[...]tency of each node, consistent with our previous work [12].

The smart contract supports automatic execution of the cache optimization query strategy. Firstly, the local data service process submodule finds source data sets based on metadata. Secondly, according to the dependencies among services, this submodule constructs the source data sets into a connected directed graph, that is, a data service composition graph. The generation of a service composition graph is the trigger condition of smart contracts. Consequently, when the local data service process submodule generates a service composition graph, it triggers the smart contracts embedded with the cache optimization query strategy to select the optimal service composition. Furthermore, the local data service process submodule extracts the source data based on the optimal service composition, improving the query efficiency.

3.1.2. The global data management system

As shown in Fig. 1(d), the global data management system includes a global data process component and a copy of the entire Global-Chain. This system is used to manage all metadata of security threat information to share information among RSUs in various regions.

(1) The global data process component

The global data process component stores security threat information and processes data requests from RSUs. It is composed of a global metadata organization module and a global data service module. The global metadata organization module is responsible for collecting and processing metadata in the network. Specifically, the global metadata organization module receives metadata uploaded by RSUs in each region, cleans duplicate metadata, and stores [...]

[...] next time, data views can be obtained directly from the Edge-Chain, enhancing the query performance of security threat information.

(1) The RSU collects original data from various data sources according to the basic information submitted by smart terminals. After the metadata is extracted and processed by the local metadata organization module, the metadata is stored in the local record pool in chronological order. When the information stored in the record pool is enough to fill an entire block, the system packs the data into a block.

(2) In order to get the most out of the blockchain network, the metadata of the security threat information is stored in the block, while the complete raw data is stored in the RSU, to ensure the traceability and tamper-resistance of the information. The block header contains the hash value and timestamp of the previous block, while the hash of the block body is obtained from the entire data. Additionally, the information in the block header can be used for data queries. After the block is constructed, the master node broadcasts it to the whole network. The system then waits for the consensus process to complete.

(3) The PBFT [50] algorithm has low consensus delay and high efficiency, which can meet the needs of real-time processing and large-scale communication. Therefore, PBFT is adopted for block consensus. The specific consensus process is as follows: Firstly, the master node broadcasts the data block to the whole network. The consensus nodes then verify the validity of the block and broadcast their verification results and signatures to each other in a distributed manner. Secondly, after each node receives the verification results, it compares them with those of other nodes and feeds back to the master node the verification result, comparison result, signature and received verification results. Finally, if two-thirds of the nodes agree on the data block, the block is stored on the blockchain.
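The three storing steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the prototype's implementation: SHA-256 and JSON serialization are assumed for hashing, `block_size` is a hypothetical record count, the header is modeled as previous-block hash, timestamp, and body hash, and the consensus step only models the final two-thirds agreement check rather than the full PBFT message exchange.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def sha256_hex(payload: str) -> str:
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Block:
    prev_hash: str          # hash of the previous block header
    timestamp: float
    body_hash: str          # hash computed over all metadata in the body
    metadata: List[dict]

    def header_hash(self) -> str:
        return sha256_hex(f"{self.prev_hash}|{self.timestamp}|{self.body_hash}")


def body_is_intact(block: Block) -> bool:
    """Example validator: recompute the body hash and compare."""
    return block.body_hash == sha256_hex(json.dumps(block.metadata, sort_keys=True))


@dataclass
class EdgeChain:
    block_size: int = 3     # hypothetical number of metadata records per block
    chain: List[Block] = field(default_factory=list)
    record_pool: List[dict] = field(default_factory=list)

    def add_metadata(self, record: dict) -> Optional[Block]:
        """Step (1): pool records in arrival order; pack a block once the pool is full."""
        self.record_pool.append(record)
        if len(self.record_pool) < self.block_size:
            return None
        records, self.record_pool = self.record_pool, []
        # Step (2): only metadata enters the block; raw data stays on the RSU.
        body_hash = sha256_hex(json.dumps(records, sort_keys=True))
        prev_hash = self.chain[-1].header_hash() if self.chain else "0" * 64
        return Block(prev_hash, time.time(), body_hash, records)

    def commit(self, block: Block,
               validators: List[Callable[[Block], bool]]) -> bool:
        """Step (3): store the block only if at least two-thirds of the nodes agree."""
        votes = sum(1 for validate in validators if validate(block))
        if validators and 3 * votes >= 2 * len(validators):
            self.chain.append(block)
            return True
        return False
```

Each entry in `validators` stands for one consensus node's verification result; a block rejected by more than one-third of them is never appended to the chain.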
Otherwise, the RSU sends data requests to the cloud data center. The cloud data center searches for the required metadata in the global data management system. If the metadata exists, the cloud data center uses it to find the RSU that stores the source data. The cloud data center then obtains the security threat information from that RSU, and returns the information to the requesting RSU. On the contrary, if the cloud data center does not find the metadata, the RSU acquires the security threat information of the smart terminal itself. The detailed query process is shown in Fig. 2, as follows:

[Fig. 2: flowchart of the query process, from "Input Request" through the local data cache and local metadata lookup, the smart-contract-triggered generation of the optimal service composition, the global data processing submodule and lookup of source data in other RSUs, and the fallback "Collect requested information of security threat", to "Return the view to the requester".]

(1) The local data service process submodule parses the search request, which contains the basic informa[...]

[...] information in the local region, it will send a query request to the cloud data center. Furthermore, the search process of security threat information in the cloud data center is the same as the above process.

(7) If neither the local data management system nor the global data management system stores security threat information of the smart terminal, the RSU collects security threat information through the local data management system to evaluate the risks of the smart terminal.

3.3. Design issues

When utilizing DV technology to manage multi-source heterogeneous data in HBSAA, there are two basic factors that affect the performance of data query, which are discussed in the following.

[Figure fragment: Data source 1; Target service; E; F]
[...] if the relationship is used as a query index, only the storage nodes that meet the index requirements will be traversed to find the requested data, which can significantly improve the data query rate. In summary, in order to improve the query performance of HBSAA, it is of great significance to find a new metadata storage method to establish relationships among data.

3.3.2. The impact of data service composition optimization on query performance

A query service may consist of multiple target sub-services, hence how to choose target sub-services to form the optimal service composition is another important factor that affects the query performance of HBSAA. As shown in Fig. 4, service E is composed of target sub-services A, B, and C. In the subsequent query of service F, even if some target sub-services are the same as the target sub-services in service E, the service composition must be reconstructed, which results in a lower query rate of HBSAA. In conclusion, to implement query optimization, it is vital to design a service composition optimization strategy based on cache services.

4. A metadata association model based on property graph for security threat information

In this section, aiming at the effect of metadata storage on query rate, which is described in Section 3, the proposed method PGMAM is introduced.

[...] In the PGMAM, the metadata of security threat information is depicted by nodes. Consequently, the different metadata can be defined as a set of nodes [...]

$D$: represents metadata sets of various security [...] including the storage location, type, and so on.

$S$: represents a piece of device metadata. [...]

[...] Therefore, the properties can be represented as $A_s = \{a_{s1}, a_{s2}, \cdots, a_{si}\}$.

(3) The Relationship definition

In the PGMAM, a relationship between different data is described by an edge. Consequently, all relationships can be defined as a set of edges $E$: $E = \{e_{ux}, e_{a_{ti}t}\}$, where $e_{ux}$ represents the relationship between metadata $u$ and metadata $x$ ($u, x \in M$, $u \neq x$), and $e_{a_{ti}t}$ represents the relationship between metadata $t$ and the property $a_{ti}$. The relationships among different data are defined as $e_{DO}$, $e_{SR}$, $e_{a_{Si}S}$, $e_{RC}$, $e_{a_{Ri}R}$, $e_{a_{Ci}C}$, which are denoted as follows:

$e_{DO}$: represents an inclusion relationship between metadata $D$ of different source data sets and metadata $O$. In particular, $O$ is one of device metadata, CPE metadata, and vulnerability metadata. For example, $e_{DS}$ is an inclusion relationship between metadata $D$ and device metadata $S$.

$e_{SR}$: represents a mapping relationship between device metadata $S$ and CPE metadata $R$.

$e_{a_{Si}S}$: represents a relationship between device metadata $S$ and property $a_{Si}$.

$e_{RC}$: represents an inclusion relationship between CPE metadata $R$ and vulnerability metadata $C$.

$e_{a_{Ri}R}$: represents a relationship between CPE metadata $R$ and property $a_{Ri}$.

$e_{a_{Ci}C}$: represents a relationship between vulnerability metadata $C$ and property $a_{Ci}$.

4.2. The PGMAM construction

[Figure: example PGMAM property graph. Node labels: IoV, Servers, Camera, GPS, Bluetooth, Audio, CPE 1, CPE 2, CVE 1, CVE 2. Property labels: Device ID, Brand, Software Name, CVSS, Pre-condition, Pos-condition, Description. Edge labels: $e_{DS}$, $e_{SR}$, $e_{RC}$, $e_{a_{Si}S}$, $e_{a_{Ri}R}$, $e_{a_{Ci}C}$.]
is CPE metadata associated with various device meta- 5. Data service composition optimization strategy
data, and the property of R contains software name. based on KNACO
C is vulnerability metadata, which is included in each
CPE metadata, and the properties of C have CVSS, In this section, the solution for the service compo-
pre-condition, pos-condition, description, etc. In ad- sition optimization issue is discussed, which proposed
dition, device metadata S and vulnerability metadata in the section 3.
C are mapped through CPE metadata R. Furthermore,
the implication of edges is the same as above. 5.1. Related definitions
In practice, the graph database Neo4j is deployed to Data service is an encapsulation of data resources,
store the PGMAM. Specifically, each node and edge in which can prevent users from accessing physical data
PGMAM correspond to the node and edge in Neo4j, sources directly. In this paper, a data service has mul-
and the metadata property is stored in the form of key- tiple attributes, which are defined as follows.
value pairs. The detailed metadata storage flow is as (1) AS : represents complete and indivisible data
follows: service, called atomic data service. This paper de-
fines atomic data service as a four-tuple: AS = <id, label, properties, relationships>, where id represents the identifier of AS, and different data services have different ids; the label represents the atomic data service type; the properties denote the set of properties contained in the atomic data service; the relationships represent the inclusion relationships between data services and atomic data services.
(2) CS: represents a data service that is cached after data consumers access it, named cache service. In this study, a cache service is regarded as a new data service, hence a cache service can be represented as a four-tuple: CS = <id, label, properties, relationships>. Furthermore, the elements of CS have the same meaning as those of AS.
(3) DS: represents a relationship between two data services. Specifically, there are two relationships in this paper. One is a parallel relationship, which is the link between two or more atomic services. The other is an inclusion relationship, which means that one data service is a subset of another data service.

(1) An input metadata set is defined as T = {t1, t2, ···, tZ} (Z ≥ 1 and is an integer), and each metadata tq (1 ≤ q ≤ Z) corresponds to a property set Atq = {Atq1, Atq2, ···, Atqn} (n ≥ 1 and is an integer). Moreover, metadata tq has an edge set Etq = {etq jb, etq atqn} (jb ∈ M).
(2) If metadata tq is found to exist in Neo4j, the data storage is terminated. Otherwise, metadata tq and property set Atq are stored in Neo4j.
(3) To build a relationship between metadata tq and metadata jb (1 ≤ b ≤ S), it is necessary to determine whether metadata jb exists. If metadata jb exists, the edge etq jb between metadata tq and metadata jb is stored in Neo4j. Otherwise, the storage of the relationship is terminated.
(4) To establish a relationship between metadata tq and property atqn, it is necessary to determine whether property atqn exists. If property atqn exists, the edge etq atqn between metadata tq and property atqn is stored in Neo4j. Otherwise, the property atqn is created, and then edge etq atqn is constructed.
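The metadata storage steps (2)-(4) can be sketched as follows. This is a minimal in-memory illustration, not the authors' Neo4j implementation: the `GraphStore` class, the `store_metadata` function, the edge tags, and the example metadata names are all hypothetical, chosen so that the logic runs without a database server.

```python
from dataclasses import dataclass, field


@dataclass
class GraphStore:
    """Hypothetical in-memory stand-in for the Neo4j store (assumption)."""
    metadata: set = field(default_factory=set)
    properties: set = field(default_factory=set)
    edges: set = field(default_factory=set)


def store_metadata(store, t_q, props, related):
    """Store metadata t_q; return False if it already exists (step 2)."""
    # Step (2): terminate if t_q is already stored, otherwise store it.
    if t_q in store.metadata:
        return False
    store.metadata.add(t_q)
    # Step (3): link t_q to metadata j_b only when j_b already exists.
    for j_b in related:
        if j_b in store.metadata and j_b != t_q:
            store.edges.add(("meta", t_q, j_b))
    # Step (4): link t_q to each property, creating the property if missing.
    for a in props:
        if a not in store.properties:
            store.properties.add(a)
        store.edges.add(("prop", t_q, a))
    return True


# Illustrative data only; these names do not come from the paper.
store = GraphStore()
store_metadata(store, "vulnerability", ["cve_id", "cvss_score"], [])
store_metadata(store, "attack_template", ["cve_id"], ["vulnerability", "device"])
```

In this sketch the edge to the non-existent metadata `device` is skipped, mirroring the termination in step (3), while a missing property is created before its edge is stored, mirroring step (4).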
V = {v1, v2, ···, vm} (m ≥ 0 and is an integer) is a node set in G, and nodes can be divided into three categories:
(1) The starting point corresponds to an input request.
(2) The end point is a target data service.
(3) The remaining nodes represent CS or AS.
E′ = {e′1, e′2, ···, e′m} is an edge set in graph G, and edges can be divided into two types:
(1) An edge that begins at the starting node and ends at any data service, which represents a data service that can be accessed directly according to an input request.
(2) An edge that represents the relationship between data services.
W = {w1, w2, ···, wm} is a weight set in G, and W represents the cost of accessing a data service. In particular, when calculating the access cost of cache services, the access cost of atomic services does not
To select the data service composition with the lowest access cost, we convert the problem of choosing an optimal data service composition into the problem of finding the smallest Steiner tree [54]. Furthermore, the smallest Steiner tree meets the following conditions:
• A target atomic service means all atomic services that have a direct parallel relationship with the target data service. A Steiner tree is a directed tree, which takes the input request node as the root node and the target atomic service nodes as leaf nodes.
• The actual weight of the edges contained in the Steiner tree is the lowest. Specifically, if an edge contains data service nodes, only the weight of edge in parallel with nodes is calculated.
of ants. The more critical a node is, the more ants will pass by. In this study, the criticality of a node is determined by the number of AS that have an inclusion relationship with CS.
(2) The transition probability
Target AS nodes are regarded as leaf nodes, denoted as vs (s = 1, 2, ..., n), and k ants are placed on each leaf node. Meanwhile, the input request node is considered the root node, denoted as v∗. During each iteration, each ant generates a branch from vs to v∗. Therefore, k trees are formed after one iteration, and the minimum-weight tree among the k trees is selected. Moreover, pheromones are updated after each iteration, and then the next iteration is initiated, until the preset number of iterations is reached.
During the nth iteration, the εth ant Ans(ε) from the sth target atomic data service node is currently at position g, and the transition probability to the next position is given by Eq. (1), whose value is 0 whenever its first case does not apply.
The variables are defined as follows:
τgh(n): the pheromone of edge (g, h) at the nth iteration; ηgh is the attraction of node h to node g.
bsp(ε): the branch through which ant Ans(ε) passes; Bg represents the set of candidate nodes of g.
α: the correlation between the transition probability and τgh(n).
β: the correlation between the transition probability and ηgh.
The values of α and β are constants, which are defined during algorithm initialization.
Normally, ηgh = 1/wgh, where wgh is the weight of edge (g, h). In particular, this paper corrects the transition probability by modifying ηgh so that key nodes are selected preferentially. Moreover, the set
\tau_{gh}(n+1) =
\begin{cases}
(1-\rho)\,\tau_{gh}(n) + \rho\,\Delta\tau_{gh}, & (g,h) \in t^{*}_{F_n} \\
(1-\rho)\,\tau_{gh}(n), & \text{else}
\end{cases}
\qquad (3)
where ρ (0 < ρ < 1) is the pheromone volatility, and ∆τgh is the pheromone increment, which is represented as 1/|t∗Fn|. Specially, t∗Fn is an optimal tree generated by the nth iteration, and |t∗Fn| represents the number of arcs contained in tree t∗Fn. The pseudocode
formula (2);
7: T = T ∪ tFs−1(ε), where tFs−1(ε) represents the partial tree generated by the εth ant from the first s − 1 source node;
8: end if
9: Calculate weight CtF(ε) of tree T;
10: Correct the actual weight C′tF(ε) = CtF(ε) − Σ_{(l,µ)∈E′, l∈Q} wlµ
11: Sort the weights C′tF(ε) of the trees generated by
15: to the requester.
16: return Min(C′tF(ε)), Min(T).
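The pheromone update of Eq. (3) and a transition rule built from τgh, ηgh, α, and β can be sketched as follows. The update follows Eq. (3) as stated. The transition rule is the standard ant colony form (probability proportional to τ^α · η^β over the candidate set Bg), which is an assumption here and does not include the paper's key-node correction of ηgh. The function names and the toy graph are hypothetical.

```python
def transition_probabilities(g, candidates, tau, eta, alpha, beta):
    """Standard ACO rule (assumed): p(g, h) is proportional to
    tau[(g, h)]**alpha * eta[(g, h)]**beta for each candidate h in B_g."""
    scores = {
        h: (tau[(g, h)] ** alpha) * (eta[(g, h)] ** beta) for h in candidates
    }
    total = sum(scores.values())
    if total == 0:
        return {h: 0.0 for h in candidates}
    return {h: score / total for h, score in scores.items()}


def update_pheromone(tau, best_tree_edges, rho):
    """Eq. (3): evaporate every edge by (1 - rho); edges of the best tree
    additionally gain rho * delta, with delta = 1 / |best tree|.
    Assumes best_tree_edges is non-empty."""
    delta = 1.0 / len(best_tree_edges)
    return {
        edge: (1 - rho) * value + (rho * delta if edge in best_tree_edges else 0.0)
        for edge, value in tau.items()
    }


# Toy example (illustrative values only).
weights = {("g", "a"): 2.0, ("g", "b"): 4.0}
eta = {edge: 1.0 / w for edge, w in weights.items()}  # eta_gh = 1 / w_gh
tau = {edge: 1.0 for edge in weights}

probs = transition_probabilities("g", ["a", "b"], tau, eta, alpha=1, beta=1)
new_tau = update_pheromone(tau, {("g", "a")}, rho=0.5)
```

With equal pheromone on both edges, the cheaper edge (g, a) receives twice the probability of (g, b); after the update, only the edge on the best tree keeps its pheromone level while the other evaporates.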
Fig. 7: The visualization of metadata query results
Fig. 8: The query page of attack template information
Fig. 9 shows the risk evaluation of adAS, in which the CVSS score of each vulnerability in adAS is displayed in a table. Moreover, according to the CVSS scoring standard, the risk level of a vulnerability is defined as high, medium, or low. To allow data consumers to understand the potential threat level of adAS intuitively, the vulnerability risk distribution of adAS is shown in a statistical chart.
6.2. Performance evaluation
6.2.1. The query performance analysis of PGMAM
To verify the query performance of PGMAM, we compare the query time of PGMAM, the tree model, and MySQL. There are two types of query: the single-node query and the complex-node query. In the single-node query case, the data service just requires obtain-
The results reveal that the query time of our PGMAM is lower than that of the tree model. In particular, when the amount of data reaches 10 million, the query time of PGMAM is about 31% lower than that of the tree model. The PGMAM model queries single-node data by property partition index, which improves the speed of query response because the query response time depends only on the amount of data in the partition.
Fig. 10: Comparison of single-node query time under different data volumes (x-axis: amount of data, tens of thousands; y-axis: query time, ms; series: PGMAM, Tree Model)
(2) The query performance of complex node
Fig. 11 shows the complex-node query time of PGMAM compared with the tree model and MySQL, respectively. Fig. 11(a) shows the query time of PGMAM and the tree model under different data volumes. As shown in Fig. 11(a), our PGMAM is superior to the tree model in terms of query time, especially when the data volume exceeds 8 million. Moreover, when the amount of data reaches 10 million, the query time of PGMAM is about 21% lower than that of the tree model. For the tree model, it is necessary to traverse level by level when querying data, and the response time increases with the data volume, because the metadata of different data sources in the tree model is only connected by fixed nodes, and there is no relationship between the metadata. The query time of PGMAM and MySQL under different data volumes is shown in Fig. 11(b), which demonstrates that the query time
Fig. 11: Comparison of complex-node query time under different data volumes. (a) The PGMAM and tree model; (b) The PGMAM and MySQL (x-axis: amount of data, tens of thousands; y-axis: query time, ms)
Fig. 12 compares the query time of PGMAM, the tree model, and MySQL under different numbers of target query nodes. In particular, the number of nodes on the horizontal axis represents the amount of all metadata involved in the query. As can be seen from the figure, the metadata query time in MySQL is much longer than that of both PGMAM and the tree model. The query performance of PGMAM is always superior to that of the tree model, especially when the number of target nodes exceeds 50.
Fig. 12: Comparison of query time under different target nodes (x-axis: Number of Nodes)
be generated by simulation. In general, it takes less time to access the cache service than to access the atomic service. To this end, the cost of accessing a cache service is set as a random number uniformly distributed between 1 and 10. The cost of accessing
the proportion of cache service nodes in a data service composition graph, three parameters are defined.
• The proportion of cache service nodes: γ = m∗/n∗, where m∗ is the number of cache service nodes in the data service composition graph, and n∗ is the total number of nodes in the data service composition graph. Furthermore, to ensure the effectiveness of the simulation, the value of γ must be kept consistent.
• The average criticality of cache nodes: λ′ = (Σ_{m∗} λi)/m∗, where λi is the criticality of the ith cache service node.
• Relative actual weight (RAW): the difference between the actual weight and the average of actual weights of a Steiner tree.
The smaller the RAW, the smaller the weight of the generated Steiner tree, and the better the algorithm performs at selecting the optimal data service composition.
Fig. 13: The RAW comparison of Steiner tree when λ′ = 3 (x-axis: Number of Nodes)
The implication of Fig. 14 is similar to that of Fig. 13, except that the average criticality λ′ is 5. Fig. 14 also demonstrates that the RAW of a Steiner tree calculated by KNACO is always smaller than that of the ant colony optimization. As the number of nodes increases, the RAW of KNACO and ant colony optimization both increase or decrease within a certain range, but the RAW of KNACO is always less than that of the ant colony optimization. Furthermore, by comparing the RAW in Fig. 13 and Fig. 14, it can be inferred that as the average criticality λ′ increases, the performance of KNACO becomes more prominent.
Fig. 14 (y-axis: Relative actual weight (RAW))
and has better query performance than the tree model and MySQL. Moreover, KNACO is better than an ant colony algorithm in selecting the optimal data service composition. In the future, HBSAA will be applied in various scenarios such as private networks and mobile crowd-aware networks to assess the risks of various devices and ensure network security.
Acknowledgements
This work was supported in part by the Science and Technology Project Program of Sichuan under Grant 2022YFG0022; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJZD-K202000602; in part by the General Program of Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX1021; and in part by the Chongqing Natural Science Foundation of China under Grant cstc2020jcyj-msxmX0343.
References
[1] J. Contreras-Castillo, S. Zeadally, J. A. Guerrero-Ibañez, Internet of vehicles: architecture, protocols, and security, IEEE Internet of Things Journal 5 (5) (2017) 3701–3709.
[2] S. Malik, W. Sun, Analysis and simulation of cyber attacks against connected and autonomous vehicles, in: 2020 Inter-
[7] K. Qu, L. Meng, Y. Yang, A dynamic replica strategy based on markov model for hadoop distributed file system (hdfs), in: 2016 4th International Conference on Cloud Computing and Intelligence Systems (CCIS), IEEE, 2016, pp. 337–342.
[8] L. Mendiboure, M. A. Chalouf, F. Krief, Survey on blockchain-based applications in internet of vehicles, Computers & Electrical Engineering 84 (2020) 106646.
[9] K. Fan, Q. Pan, K. Zhang, Y. Bai, S. Sun, H. Li, Y. Yang, A secure and verifiable data sharing scheme based on blockchain in vehicular social networks, IEEE Transactions on Vehicular Technology 69 (6) (2020) 5826–5835.
[10] A. Sadiq, N. Javaid, O. Samuel, A. Khalid, N. Haider, M. Imran, Efficient data trading and storage in internet of vehicles using consortium blockchain, in: 2020 International Wireless Communications and Mobile Computing (IWCMC), IEEE, 2020, pp. 2143–2148.
[11] X.-j. LIU, Y.-d. YIN, W. CHEN, Y.-j. XIA, J.-l. XU, L.-d. HAN, Secure data sharing scheme in internet of vehicles based on blockchain, Journal of ZheJiang University (Engineering Science) 55 (5) (2021) 957–965.
[12] Y. Liu, M. Xiao, S. Chen, F. Bai, J. Pan, D. Zhang, An intelligent edge-chain-enabled access control mechanism for iov, IEEE Internet of Things Journal 8 (15) (2021) 12231–12241.
[13] X. Liu, X. Yu, X. Ma, H. Kuang, A method to improve the fresh data query efficiency of blockchain, in: 2020 12th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), IEEE, 2020, pp. 823–827.
[14] O. V. Sawant, Combating dirty data using data virtualization, in: 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), IEEE, 2019, pp. 1–5.
[15] X. Luo, X. Gao, Z. Tan, J. Liu, X. Yang, G. Chen, D2-tree: A distributed double-layer namespace tree partition scheme for metadata management in large-scale storage systems, in: 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), IEEE, 2018, pp. 110–119.
[16] M. Y. Jung, J. W. Jang, Data management and searching system and method to provide increased security for iot platform, in: 2017 International conference on information and communication technology convergence (ICTC), IEEE, 2017, pp. 873–878.
[17] T. G. Sarath, Centralized server based atm security system with statistical vulnerability prediction capability, in: 2017 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), IEEE, 2017, pp. 61–66.
[18] M. ur Rahman, V. Deep, S. Multhalli, Centralized vulnerability database for organization specific automated vulnerabilities discovery and supervision, in: 2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS), IEEE, 2016, pp. 1–5.
[19] D. Zhang, Y. Liu, L. Dai, A. K. Bashir, A. Nallanathan, B. Shim, Performance analysis of fd-noma-based decentralized v2x systems, IEEE Transactions on Communications 67 (7) (2019) 5024–5036.
[20] M. Walkowski, M. Krakowiak, J. Oko, S. Sujecki, Distributed analysis tool for vulnerability prioritization in corporate networks, in: 2020 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), IEEE, 2020, pp. 1–6.
[21] R. Kothari, B. Jakheliya, V. Sawant, Implementation of a distributed p2p storage network, in: 2020 IEEE International Conference for Innovation in Technology (INOCON), IEEE, 2020, pp. 1–7.
[22] J. Wu, M. Dong, K. Ota, J. Li, Z. Guan, Big data analysis-based secure cluster management for optimized control plane in software-defined networks, IEEE Transactions on Network and Service Management 15 (1) (2018) 27–38.
[23] Z. Ma, J. Zhang, Y. Guo, Y. Liu, X. Liu, W. He, An efficient decentralized key management mechanism for vanet with blockchain, IEEE Transactions on Vehicular Technology 69 (6) (2020) 5836–5849.
[24] Y. Yao, X. Chang, J. Mišić, V. B. Mišić, L. Li, Bla: Blockchain-assisted lightweight anonymous authentication for distributed vehicular fog services, IEEE Internet of Things Journal 6 (2) (2019) 3775–3784.
[25] T. Jiang, H. Fang, H. Wang, Blockchain-based internet of vehicles: Distributed network architecture and performance analysis, IEEE Internet of Things Journal 6 (3) (2018) 4640–4649.
[26] J. Pan, J. Wang, A. Hester, I. Alqerm, Y. Liu, Y. Zhao, Edgechain: An edge-iot framework and prototype based on blockchain and smart contracts, IEEE Internet of Things Journal 6 (3) (2018) 4719–4732.
[27] L. Cui, Z. Chen, S. Yang, Z. Ming, Q. Li, Y. Zhou, S. Chen, Q. Lu, A blockchain-based containerized edge computing platform for the internet of vehicles, IEEE Internet of Things Journal 8 (4) (2020) 2395–2408.
[28] P. K. Lahiri, D. Das, W. Mansoor, S. Banerjee, P. Chatterjee, A trustworthy blockchain based framework for impregnable iov in edge computing, in: 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), IEEE, 2020, pp. 26–31.
[29] A. Gusenkov, N. Bukharaev, E. Birialtsev, On ontology based data integration: problems and solutions, in: Journal of Physics: Conference Series, IOP Publishing, 2019, p. 012059.
[30] A. K. Akanbi, M. Masinde, Semantic interoperability middleware architecture for heterogeneous environmental data sources, in: 2018 IST-Africa Week Conference (IST-Africa), IEEE, 2018, pp. 1–10.
[31] D. Alvarez-Coello, J. M. Gómez, Ontology-based integration of vehicle-related data, in: 2021 IEEE 15th International Conference on Semantic Computing (ICSC), IEEE, 2021, pp. 437–442.
[32] F. Ekaputra, M. Sabou, E. Serral Asensio, E. Kiesling, S. Biffl, Ontology-based data integration in multi-disciplinary engineering environments: A review, Open Journal of Information Systems 4 (1) (2017) 1–26.
[33] E. Martínez, D. M. Toma, S. Jirka, J. Del Río, Middleware for plug and play integration of heterogeneous sensor resources into the sensor web, Sensors 17 (12) (2017) 2923.
[34] A. Bogdanov, A. Degtyarev, N. Shchegoleva, V. Khvatov, V. Korkhov, Evolving principles of big data virtualization, in: International Conference on Computational Science and Its Applications, Springer, 2020, pp. 67–81.
[35] A. Aleyasen, M. A. Soliman, L. Antova, F. M. Waas, M. Winslett, High-throughput adaptive data virtualization via context-aware query routing, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 1709–1718.
[36] T. A. Manoj Muniswamaiah, C. Tappert, Data virtualization for decision making in big data, International Journal of Software Engineering & Applications 10 (5) (2019) 45–53.
[37] M. Gottlieb, M. Shraideh, I. Fuhrmann, M. Böhm, H. Krcmar, Critical success factors for data virtualization: A literature review, The ISC International Journal of Information Security 11 (3) (2019) 131–137.
[38] Z. Zi-ye, L. Yu-long, H. Bei, Multi-source data integration method based on data virtualization technology, Computer and Modernization (11) (2019) 18–22.
[39] Y. Hua, X. Liu, Semantic-aware metadata organization for exact-matching queries, in: Searchable Storage in Cloud Computing, Springer, 2019, pp. 67–97.
[40] J. Fan, J. Yan, Y. Ma, L. Wang, Big data integration in remote sensing across a distributed metadata-based spatial infrastructure, Remote Sensing 10 (1) (2017) 7.
[41] M. Khani Dehnoi, S. Araban, Automatic qos-aware web services composition based on set-cover problem, International Journal of Nonlinear Analysis and Applications 12 (1) (2021) 87–109.
[42] Y. Li, J. Hu, Z. Wu, C. Liu, F. Peng, Y. Zhang, Research on qos service composition based on coevolutionary genetic algorithm, Soft Computing 22 (23) (2018) 7865–7874.
[43] H. Elmaghraoui, L. Benhlima, D. Chiadmi, Dynamic web service composition using and/or directed graph, in: 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), IEEE, 2017, pp. 1–8.
[44] C. Wang, X. Zhang, D. Chu, Research on service composi-
mation and Telecommunication Systems (CITS), IEEE, 2020, pp. 1–4.
[50] M. Castro, B. Liskov, et al., Practical byzantine fault tolerance, in: Proceedings of the third symposium on Operating systems design and implementation, USENIX Association, 1999, pp. 173–186.
[51] S. Kato, Y. Inagaki, M. Aoyama, A structural analysis method of oss development community evolution based on a semantic graph model, in: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), IEEE, 2018, pp. 292–297.
[52] E. Neshati, A. A. P. Kazem, Qos-based cloud manufacturing service composition using ant colony optimization algorithm,
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: