Identifying Topology of Low Voltage Distribution PDF

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO.

5, SEPTEMBER 2018 5113

Identifying Topology of Low Voltage Distribution


Networks Based on Smart Meter Data
Satya Jayadev Pappu, Student Member, IEEE, Nirav Bhatt, Member, IEEE,
Ramkrishna Pasumarthy, Member, IEEE, and Aravind Rajeswaran

Abstract—In a power distribution network, the network The topology of an LV distribution network gives the con-
topology information is essential for an efficient operation. This nectivity among its numerous assets such as feeders, distribu-
network connectivity information is often not available at the tion transformers, distributors and consumers. The information
low voltage (LV) level due to uninformed changes that happen
from time to time. In this paper, we propose a novel data- of the underlying network topology is useful for efficient inte-
driven approach to identify the underlying network topology gration of renewable energy sources and efficient management
for LV distribution networks including the load phase connec- of outages in distribution networks [4], [5]. Further, for a
tivity from time series of energy measurements. The proposed reliable state estimation in a distribution network, accurate
method involves the application of principal component anal- information of the network topology is essential [6].
ysis and its graph-theoretic interpretation to infer the steady
state network topology from smart meter energy measurements. Some of the LV consumers operate on single-phase and they
The method is demonstrated through simulation on randomly draw power from one of the three phases of a distribution
generated networks and also on IEEE recognized Roy Billinton transformer. The phase connectivity of those consumers also
distribution test system. forms a part of the network topology information. This infor-
Index Terms—Phase identification, distribution network topol- mation is important in maintaining load and voltage balances
ogy, smart meters, principal component analysis, graph theory. in the three phases of the distribution transformers and the
distribution feeders. Unbalanced loads on transformers and
feeders lead to higher copper losses and voltage drop, and
I. I NTRODUCTION
consequently affect the life of the assets [7].
HE COMPLEXITY of power distribution networks is
T increasing day by day with advancements in technol-
ogy and addition of new and sophisticated components to
The network topology information might not be accurately
available at all time because of changes that take place due
to network reconfiguration, repairs, maintenance and load bal-
the power grid. The concentration of power systems con- ancing [7], [8]. Moreover, the consumers might have a facility
trol and monitoring has traditionally been at the generation, to switch between phases when a phase trips, and, thus chang-
transmission and high voltage distribution levels. A need for ing the topology. Often, the network operators are not aware
advancement in control at the Low Voltage (LV) distribution of such changes in the topology [9].
level arose with the advent of active distribution networks A number of attempts were made to solve the problems
with intermittent Distributed Energy Resources (DER) such of phase identification, and topology identification. Smart grid
as solar, wind energy etc., and plug-in devices such as electric technologies have further intensified the search for new meth-
vehicles. Research is actively pursued in the areas of con- ods of inferring connectivity. The methods for topology iden-
trol and automation of distribution networks, and in many tification can be classified into two categories: (i) Hardware
cases, it is assumed that network topology information is based methods, and (ii) software based methods.
available [1]–[3]. Hardware based methods include microprocessor based
phase identification system and signal injection device
Manuscript received September 1, 2016; revised December 22, 2016;
accepted February 13, 2017. Date of publication March 9, 2017; designed for phase measurement [10], [11]. However, the addi-
date of current version August 21, 2018. The work of S. J. Pappu tional hardware and staff required for these devices to work,
was supported by the Data Science Initiative of IIT Madras under makes these options costly.
Grant CSE1415831RFTPBRAV. The work of N. Bhatt was supported
by the Department of Science and Technology, India, through INSPIRE Software based methods have become popular with the
Faculty Fellowship Award IFA12-ENG-34. Paper no. TSG-01191-2016. advent of Advanced Metering Infrastructure (AMI) such as
(Corresponding author: Satya Jayadev Pappu.) smart meters and Phasor Measurement Units (PMU). These
S. J. Pappu and R. Pasumarthy are with the Department of Electrical
Engineering, Indian Institute of Technology Madras, Chennai 600036, India devices are installed at important nodal points and they gener-
(e-mail: ee15d202@smail.iitm.ac.in; ramkrishna@ee.iitm.ac.in). ate large amount of data at regular time intervals which can be
N. Bhatt is with the Department of Chemical Engineering, Indian collected and analysed at centralised data centres. In the liter-
Institute of Technology Madras, Chennai 600036, India (e-mail:
niravbhatt@iitm.ac.in). ature, researchers have proposed methods for analysing these
A. Rajeswaran is with the Department of CSE, University of Washington, data for topology identification.
Seattle, WA 98195 USA (e-mail: aravraj@cs.washington.edu). There were some software based methods which were
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. presented prior to the development of AMI. One is a search
Digital Object Identifier 10.1109/TSG.2017.2680542 algorithm to determine phase information using power flow
1949-3053 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
5114 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 5, SEPTEMBER 2018

TABLE I
measurements and load data [12]. Its drawback is that N OTATIONS
it ignores noise and uncertainty in data. Another method
proposed real–time monitoring of changes in the underlying
network topology based on the status of circuit breakers [13].
The latest methods include optimization based approach
to infer phase connectivity and also network topology from
time series of power measurements [14], [15]. The authors
proposed Mixed Integer Programming (MIP) based solution
which is computationally intensive to solve without relaxing
the constraints. In [16], a technique to identify the phases
based on cross-correlation method using the time series of
voltage measurements, is presented. Reference [17] proposed
a linear regression based algorithm for phase identification
which considers the correlation between consumer voltage and
substation voltage. It requires the Geographical Information
System (GIS) model which may not always be available. In
another method for phase identification, data obtained from
micro synchrophasor measurement units (μPMU) is anal-
ysed [18]. In [6] and [19], methods were proposed to infer
topology from time series of measurements from PMUs.
In [20], an algorithm to identify the network topology using
voltage correlation analysis is presented. In [21], a hypoth-
esis testing based technique to the topology identification is
proposed using the signals generated by the PLC network laid
in a smart grid. paper, we will use the method of model identification using
In this work, we propose a novel method for identifying the PCA from measurements and, we will revisit it next.
underlying steady state network topology from smart meter 1) Model Identification Using PCA: Let zm ( j) be defined
energy measurements. Our approach integrates PCA and its as a sample of n variables measured at the jth time instance,
graph-theoretic interpretation (as shown in [22]) to identify as follows:
the network topology from time series of energy measure-  T
1 ( j), z2 ( j), . . . , zn ( j)
zm ( j) = zm m m
ments, based on the principle of energy conservation. To factor (1)
in the high penetration of DERs, we show how our formu- Generally, the measured values are corrupted by random noise
lation takes care of the presence of DERs behind energy leading to an error in the samples. The vector of measured
meters. We also consider the presence of technical losses, ran- variables can be written as:
dom errors in measurements and errors due to imperfect time
synchronization of smart meters. zm ( j) = zt ( j) + e( j) (2)
As a preliminary, in our prior work [23], we introduced
where zt ( j) is the vector of true values of the variables at
this approach for phase identification. In this paper, we eluci-
the jth time instance and e( j) is the vector of errors due to
date our method in detail and extend it to solve the problem
noise. It is assumed that the error is normally independent
of network topology identification. We present the algorithms
and identically distributed (i.i.d.) as follows:
explicitly and corroborate them through simulations on ran-  
domly generated networks and also on Roy Billinton test e( j) ∼ N 0, σe2 I (3)
system. A simple example is also presented for better under-
standing. We also perform time complexity analysis of the where σe2 is an error variance, N indicates the Gaussian dis-
algorithm. tribution, and I is the (n × n) identity matrix. The n variables
The rest of the paper is organised as follows. Section II are linearly related by the following model:
revisits some necessary preliminaries. The mathematical for-
C zt ( j) = 0 (4)
mulation of the problem is given in Section III. The proposed
solution and algorithms are presented in Section IV. Finally, where C is a (p × n)-dimensional constraint matrix, with p
the simulation results with time complexity analysis and being the number of linear relationships. From measurement
conclusions are provided in Sections V and VI, respectively. vectors available at N time instants, an (n × N)-dimensional Z
matrix can be constructed by stacking zm ( j), j = 1, 2, . . . , N
vectors. Eq. (4) indicates that the noise-free data lies in an
II. P RELIMINARIES
(n − p)-dimensional subspace orthogonal to the p-dimensional
A. Principal Component Analysis (PCA) subspace spanned by the rows of the C. The objective of model
PCA is one of the widely used tools of multivariate data identification using PCA is to estimate the (n−p)-dimensional
analysis with many applications. PCA has been applied to true data subspace and the p-dimensional constrained sub-
identify a model in the presence of noise [24], [25]. In this space, given the data matrix Z.
PAPPU et al.: IDENTIFYING TOPOLOGY OF LV DISTRIBUTION NETWORKS BASED ON SMART METER DATA 5115

In PCA, these subspaces are obtained from the eigenvectors


of the covariance matrix Sz = ZZT . The subspaces are iden-
tified such that the sum of the squared difference between
the measured values and denoised estimates of the values
of the variables is minimised [25]. The eigenvectors of the
covariance matrix can be determined using Singular Value
Decomposition (SVD) of Z as follows:
SVD(Z) = U1 S1 VT1 + U2 S2 VT2 (5)
where U1 is the set of orthonormal eigenvectors corresponding
to the (n − p) largest eigenvalues of Sz while U2 is the orthog- Fig. 1. A Forest with three tree components.
onal eigenvectors corresponding to the smallest p eigenvalues
of Sz . S1 and S2 are diagonal matrices with the singular values
of Z. It has been shown that SR (UT2 ) ∼ SR (C), where SR (.) where E is the error matrix, and Zt is the data matrix hav-
indicates the subspace spanned by the rows of (.) matrix [26]. ing the true values. The covariance matrix of the transformed
Then, UT2 satisfies the following relationship: matrix is:
UT2 z = 0 (6)
Szs = Zs ZTs (12)
Hence, UT2 gives the constraint matrix and it is to be observed
that the constraint matrix suffers from the rotational ambiguity:
It is shown that by applying PCA on Zs , we get an estimate of
UT2 z = Ĉz = QĈz = 0 (7) the constraint matrix C pertaining to the transformed data, on
which inverse transformation is applied to get constraint matrix
where Q is a non-singular matrix. Hence, the estimated con- corresponding to original data [26]. We apply PCA to Zs and
straint matrix Ĉ is not unique and may not be the one which get an estimate of the constraint matrix and the regression
has direct physical interpretation. matrix corresponding to the original data as follows:
A regression model can also be obtained by partitioning the
variables into a set of dependent variables zd having dimen-
sion (nd = p) and independent variables zi (ni = n − p). The SVD(Zs ) = U1s S1s VT1s + U2s S2s VT2s (13)
columns of Ĉ corresponding to the zd and zi can also be par- Ĉ = UT2s L−1 (14)
titioned as follows: Ĉ = [Ĉd , Ĉi ], where Ĉd and Ĉi are the  −1
(nd × nd )− and (nd × ni )−dimensional matrices, respectively. R̂ = − Ĉd Ĉi . (15)
Then, from Eq. (7) we obtain
Ĉd zd + Ĉi zi = 0. (8) B. Graph Theory Overview
Since U2d is of full rank, we can express Eq. (8) in terms In this section, certain concepts of algebraic graph theory
of the estimated regression matrix relating the dependent and pertaining to this work are revised.
independent variables as follows: 1) Graph and Sub-Graph: A graph G = (NG , NE ) contains
 −1 a set of nodes (NG ) and edges (NE ), whose connectivity rep-
zd = − Ĉd Ĉi zi = R̂zi , (9) resents a network of physical or abstract elements. A graph
is said to be directed if its edges are directed from one node
where R̂ is the (nd × ni )–dimensional regression matrix. The to another. S is a sub-graph of G with set of nodes NS ⊂ NG
regression matrix R̂ is proven to be unique [26]. and set of edges ES ⊂ EG such that ES contains all edges with
The estimate of subspace of C is not optimal in maxi- both end points in NS .
mum likelihood sense when the assumption of i.i.d. error in A graph is said to be connected if there exists a path between
Eq. (3) does not hold, e( j) ∼ N (0,  e ). In such cases, two every pair of its nodes, otherwise the graph is disconnected.
approaches proposed in [25] can be used to estimate C. Next, The connected sub-graphs of a disconnected graph are referred
we will describe briefly one of the approaches when the error to as its components.
covariance matrix  e is known. For details of both approaches, 2) Tree: A tree is a graph with no circuits, and any two
refer [25]. The approach transforms the data matrix by scal- nodes in a tree are connected by a unique path. A directed
ing it with Cholesky factor of the error covariance matrix. tree is a directed graph with no circuits. A disconnected graph
Cholesky decomposition of  e is given by: with its components as trees is called a forest. Fig. 1 shows a
directed forest with three tree components. In a directed graph,
 e = LLT (10) a parent node is a node which has an edge directed to another
node(s) called the child node(s). In Fig. 1, the nodes 4, 5 and 6
where L is the (n × n)–dimensional lower triangular matrix.
are child nodes to the parent node 1.
The noisy data matrix is transformed into Zs as follows:
3) Incidence Matrix: The incidence matrix (A) of a graph,
Zs = L−1 Z = L−1 Zt + L−1 E (11) G = (NG , NE ), describes the incidence of edges on nodes and
5116 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 5, SEPTEMBER 2018

an element in A is defined as follows for a directed graph:




⎨+1, if edge j enters node i
Aij = −1, if edge j leaves node i (16)


0, if edge j is not incident on i
It is of dimension n × e where n = |NG | and e = |NE |.
Proposition 1: A directed graph (or a directed forest) can
be uniquely constructed from an incidence matrix, provided
there are no self loops.
Proof: The proof of the Proposition is similar to the proof
of [27, Th. 8, Ch. 3]. By definition of incidence matrix A, the
Fig. 2. Tree representation of network topology.
number of nodes and edges of the graph are known through
the number of rows and columns of A.
We now have to show that the edge connectivity between
of such an assignment is that it also allows us to determine
the nodes can be uniquely determined from the elements of A.
the phase identity of single-phase consumers.1
Each column of A corresponding to an edge, has only one +1
element and only one −1 element which uniquely point to the
nodes where that particular edge begins and ends. Thus, this B. Energy Measurements
information gives the connectivity of all the edges without Since we assume that smart meters are installed at all
any ambiguity. Since there are no self loops which cannot the nodal points in the network, energy measurements in
be represented in the incidence matrix of a directed graph, watt-hour (Wh) are obtained from the meters over for reg-
there is no loss of information making the graph reconstruction ular time intervals, generally, fifteen or thirty minutes. These
complete and unique. measurements are collected at a centralised location. The mea-
surements are stacked together to form a data matrix, Z, as
C. Types of Distribution Networks follows:
Based on the topology of the networks, the distribution  
Z = zij (n×N) (17)
networks are classified as:
i Radial distribution networks: In this configuration, each
i ( j), is the energy measure-
where zij , henceforth denoted as zm
of the feeders and distributors is fed by a single source. ment corresponding to the ith node in the jth time interval, n
Hence, this type of network is characterized by existence is the number of nodes in the network and N is the number
of a unique path from the source (substation) to each of of measurements captured per node.
the consumers.
ii Ring main distribution networks: In this configuration,
the feeders and distributors may be connected to multiple C. Energy Conservation
sources for higher reliability of power supply. Hence, In this section, the concept of energy conservation will
multiple sources are available for feeding a load, and be illustrated using an example. Consider a graph of a
there may be multiple paths between such sources and power network having eight energy meters (denoted as nodes
loads. However, during steady network operation, the 1, 2, . . . , 8), connected through seven power lines (denoted as
tie-switches and circuit breakers are configured such that edges a, b, . . . , g) as shown in Fig. 3. The incoming energy
only one source feeds a load, and an electrically active is captured by an energy meter at each of the nodes.
path between them is unique. Hence, the active steady The principle of conservation of energy implies that the
state network can still be considered to be radial. sum of energies of incoming lines is equal to sum of energies
of outgoing lines at any node. Assuming noise-free readings,
III. P ROBLEM F ORMULATION the meter readings at the nodes 1, 2, . . . , 8 in Fig. 3 can be
related via the following equations, by applying the principle
A. Distribution Network As a Tree of conservation of energy, for all times j = 1, . . . , N:
The topology of a distribution network can be considered to
be the connectivity between the meters installed at the substa- zt1 ( j) = zt2 ( j) + zt3 ( j) (18)
tion, feeders, transformers, and consumer mains. A graph can zt2 ( j) = zt4 ( j) + zt5 ( j) (19)
be constructed by assigning nodes to each of the meters and the
zt3 ( j) = zt6 ( j) + zt7 ( j) + zt8 ( j). (20)
connections between them can be represented as edges. Since
the paths from the substation to each of the consumers are Note that the parent node (meter) reading is equal to the sum
unique as described in Section II-C, the graph of a distribution of its child nodes (meters) readings in the graph of a dis-
network is a tree as shown in Fig. 2. For the meters of 3-phase tributed network due to the energy conservation. This principle
loads and 3-phase transformers, we assign three separate nodes
corresponding to each of the three phases, in order to maintain 1 In this paper, the words ‘meters’, ‘nodes’ and ‘variables’ are interchange-
consistency with the single-phase loads. The main advantage able. Also the words ‘samples’ and ‘readings’ are interchangeable.
PAPPU et al.: IDENTIFYING TOPOLOGY OF LV DISTRIBUTION NETWORKS BASED ON SMART METER DATA 5117

Fig. 3. A distribution network with eight energy meters connected through


seven power lines.
Fig. 4. A network with DERs.

leads a set of linear equations between the nodal readings


described by: 1) Technical Losses: The technical losses include the con-
stant losses such as iron losses, dielectric losses, and the

zk ( j) = zi ( j), ∀k ∈ K, i ∈ Ik (21) variable losses such as copper losses. The latter vary with
i
the load in the network and also depend on the length of the
lines. These losses introduce an offset in the energy conser-
where K is the set of all parent nodes in the graph and Ik is vation equations written on the nodal readings. Let λ( j) be
the set of child nodes to the parent node k. a vector of technical losses reflected in the dependent nodes’
measurements, in the jth time interval. They can be modelled as
D. Network With Distributed Energy Resources (DERs) Gaussian with a non-zero mean, and heteroscedastic variance
as follows:
The penetration of DERs in distribution networks is increas-
ing gradually, and it is imperative to consider their presence λ( j) ∼ N (μλ ,  λ ) (22)
in the network. The DERs are generally installed behind con- where μλ is the vector of means and  λ is the covariance
sumer meters, and they supply power to the consumers or matrix. The mean captures the sum of the constant losses and
feed-in power to the grid when supply exceeds demand locally. the average of the variable losses while the variance captures
This feed-in of excess supply is compensated by the utilities the change in the variable losses. Since there are no correla-
to the consumers through the concept of net-metering [28]. tions between losses in the different lines,  λ is a diagonal
With the net metering, the meter outputs is the net imbalance matrix with no covariance elements.
of local demand and supply. 2) Random Errors in Meter Readings: The latest ANSI
The advantage of dealing with energy measurements and standard for electricity meters stipulates that electricity meters
energy conservation is that they are unaffected by the net must be of 0.2 or 0.5 accuracy class [29]. This indicates that
metering. For example, consider a transformer phase, say the meter reading can be in the range of ± 0.2% and ± 0.5% of
Phase P, feeding three consumers, say a, b, and c, each true values for 0.2 and 0.5 accuracy class meters, respectively.
having a DER connected, as shown in Fig. 4. The energy This error can also be modelled to be Gaussian with each vari-
meters of each of the consumers measure the net energy con- able having a different error variance. The distribution of the
sumed. Suppose in jth time interval, the net consumption of error vector due to random errors ( j) in the readings during
Consumer ‘a’ is negative while that of Consumers ‘b’ and the jth time interval is given by:
‘c’ is positive. For example, zta ( j) = −20 units, ztb ( j) = 50
units and ztc ( j) = 100 units. Energy fed in through meter ‘a’ ( j) ∼ N (0,   ) (23)
is consumed by Consumers ‘b’ and ‘c’, while the rest of the where   is a diagonal error–covariance matrix due to uncor-
demand is satisfied by Phase ‘P’. Thus, in an ideal scenario, related errors.
ztP ( j) = 130 units. It can be observed that even in such a case 3) Clock Synchronization Errors (CSE): Though the clocks
energy conservation holds, i.e., ztP ( j) = zta ( j) + ztb ( j) + ztc ( j). of all the meters are assumed to be synchronized, the synchro-
Thus, the formulation inherently takes care of DERs in the nism may not be perfect leading to time intervals of energy
network. Hence, the methods proposed in the next sections can measurements to be varying. The variation is generally in the
also be applied to the network with DERs, which becomes evi- order of milliseconds to few seconds. The error introduced
dent in Section IV. However, inferring the presence of DERs by this variation is modelled to be a zero-mean Gaussian
in the underlying network is not within the scope of this work. distribution as follows:
δ( j) ∼ N (0,  δ ) (24)
E. Losses and Errors
where  δ is an n-dimensional diagonal matrix.
In practice, we have to account for technical losses and
With random errors and clock synchronization errors, the
sources of noise in the measurements due to random errors
measurements at the jth interval can be written as:
and clock synchronization errors in the smart meter readings.
These losses and errors are modelled next. zm ( j) = zt ( j) + ( j) + δ( j). (25)
5118 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 5, SEPTEMBER 2018

It is assumed that the cross-correlation between the different with principle of energy conservation. It can be verified that
components of error is negligible and we get, the regression matrix R which regresses the dependent vari-
ables on the independent variables is in fact the matrix Ad
( j) + δ( j) ∼ N (0,  e ),  e =   +  δ . (26)
with a negative sign. The uniqueness of R makes it compara-
ble element-wise to −Ad and hence, the connectivity of the
IV. P HASE AND T OPOLOGY I DENTIFICATION underlying graph can be inferred from R.
Throughout this section, the following assumptions are In the presence of technical losses, Eq. (7) transforms into
made: C zt = μλ . Also, PCA assumes the errors in measurements due
1) There is no theft of electricity and there are no un- to noise, to be i.i.d. Hence, to estimate C through PCA, μλ ,
metered loads in the network.  λ ,   and  δ are to be estimated from data and Z needs to
2) The topology of the underlying network remains unal- be pre-processed. The elements of μλ are estimated from the
tered while the N measurements are captured. mean of the total technical loss over all samples. Let P and C
be the sets of rows of Z corresponding to phase nodes and
A. Phase Identification consumer nodes, respectively. The mean is calculated from
In the phase identification problem, we can distinguish two the difference in the summation of phase readings and the
kinds of nodes: (i) three nodes corresponding to the three summation of consumer readings, as follows:
phases of a transformer, (ii) consumer nodes. Since each con-  
N 
  m
sumer node is connected to only one of the phases, the graph zm
k ( j) − z i ( j)
j=1 k i
turns out to be a forest with three trees. Each tree has a parent μ̂t = , ∀ k ∈ P, i ∈ C. (28)
node representing a phase and a number of child nodes repre- N
senting consumers. Then, the problem of phase identification The elements of μλ , corresponding to each phase, are esti-
is to determine which child nodes are descendants of which mated as fractions of μ̂t proportional to the mean of respective
parent node. phase readings. The element of μλ corresponding to the kth
Let us consider the forest shown in Fig. 1. In this forest, the phase is estimated as:
phase meters are parent nodes, and the consumer meters are
child nodes. Then, the incidence matrix (A) for this network 
N
k ( j)
zm
is given by: j=1
μ̂λ (k) = μ̂t , ∀ k ∈ P. (29)
a b c d e f g h i 
N
⎛ ⎞ k ( j)
zm
P1 −1 −1 −1 0 0 0 0 0 0 k j=1
P2⎜
⎜ 0 0 0 −1 −1 −1 0 0 0 ⎟

P3⎜ 0 0 0 0 0 0 −1 −1 −1 ⎟ The pre-processing step includes separation of μλ from each
⎜ ⎟
C1⎜ 0 ⎟
1 0 0 0 0 0 0 0 of the samples to ensure zero offset in the model, as follows:
⎜ ⎟
C2⎜
⎜ 0 1 0 0 0 0 0 0 0 ⎟
⎟ z̃( j) = zm ( j) − μλ , ∀ j = 1, . . . , N. (30)
C3⎜
⎜ 0 0 1 0 0 0 0 0 0 ⎟

C4⎜
⎜ 0 0 0 1 0 0 0 0 0 ⎟
⎟  λ has only three variance elements corresponding to the three
C5⎜
⎜ 0 0 0 0 1 0 0 0 0 ⎟
⎟ phases. They are estimated from the variance in the total tech-
C6⎜
⎜ 0 0 0 0 0 1 0 0 0 ⎟
⎟ nical loss in fractions of variances of respective phase readings.
C7⎜
⎜ 0 0 0 0 0 0 1 0 0 ⎟
⎟ The variance in total technical loss (denoted by lt ), is estimated
C8⎝ 0 0 0 0 0 0 0 1 0 ⎠ as:
C9 0 0 0 0 0 0 0 0 1  2
N   m
The sub-matrix corresponding to the first three rows (related zm
k ( j) − z i ( j) − μ̂t
j=1 k i
to the parent nodes) of the incidence matrix A provides the Var[lt ] = , ∀ k ∈ P, i ∈ C.
edge connectivity of the given network. Indeed, inferring this N
(31)
sub-matrix of A from measurements is sufficient to obtain the
edge connectivity of the graph. This connectivity is unique The diagonal element of  λ corresponding to the kth phase is
according to Proposition 1. In general, the incidence matrix estimated as:
can be split as:
  Var[zk ]
ˆ λ (k) = Var[lt ] 
 , ∀ k ∈ P, (32)
Ad Var[zk ]
A= (27)
Ai k

where Ad are the rows corresponding to parent nodes and Ai where Var[zk ] is the variance in N readings corresponding to
are the rows corresponding to child nodes. kth phase. The diagonal elements of   are estimated based on
Due to the nature of phase and consumer measurements, the the accuracy class of the meters. Let α be the accuracy class
parent node variables can be taken as the dependent variables of a meter and as a result, the random errors in all the meter
and the child node variables as the independent variables. With readings lie within α percentage of the reading. As nearly
this notation, the regression matrix, R, given by PCA on mea- all values of a Gaussian distribution lie within three times
surements, relates the parent and child nodes in accordance its standard deviation, we estimate α percentage of the mean
PAPPU et al.: IDENTIFYING TOPOLOGY OF LV DISTRIBUTION NETWORKS BASED ON SMART METER DATA 5119

Algorithm 1 Phase Identification


1: Start with N samples of measurements stacked in Z
2: Estimate μλ as per Eq. (29)
3: Subtract μλ from all columns of Z as per Eq. (30) to get

4: Estimate  λ ,   and  δ as per Eqs. (31) to (34)
5: Calculate  e using Eq. (26) and add variance elements of
 λ to corresponding elements of  e
6: Compute Ĉ by applying PCA on Z̃ following Eqs. (10)
to (15)
7: Calculate R̂ as per Eq. (9)
Fig. 5. Layer-Wise Tree representation of Network Topology.
8: Round off R̂ to truncate deviations due to noise and
numerical residues. In each column, the element closest
to 1 is rounded to 1 and rest 0. Algorithm 2 Topology Identification
9: Infer phase connectivity from R̂ 1: Start with N samples of measurements stacked in Z and
10: End let l = 1.
2: while l ≤ nl do
T
3: Let Z∗ = [zTi ] ∀i ∈ Nl+1 ∪ Nl
of the readings as three times the standard deviation of ran- 4: Estimate μλ from Z∗ as per Eq. (29)
dom errors. The diagonal element of   corresponding to ith 5: Subtract μλ from all columns of Z∗ as per Eq. (30) to
variable is estimated as: get Z̃∗
 2 6: Estimate  λ ,   and  δ from Z∗ as per Eqs. (31) to
α z¯i (34)
ˆ
 (i) = ∀i ∈ P ∪ C, (33)
3 × 100 7: Calculate  e using Eq. (26) and add variance elements
of  λ to corresponding elements of  e
where z¯i is the mean of the ith variable. The standard deviation 8: Compute Ĉ by applying PCA on Z̃∗ following
in the error due to imperfect time synchronization is taken as Eqs. (10) to (15)
the deviation in the reading caused by one second change in 9: Calculate R̂ as per Eq. (9)
the time interval. For each variable, this deviation is estimated 10: Round off R̂ to truncate deviations due to noise and
from the mean of its readings. Hence, the diagonal element of numerical residues.
 δ corresponding to the ith variable is estimated as: 11: Let R̂l = R̂ and l = l + 1
 
z¯i 2 12: Infer the topology from R̂1 , ..., R̂L−1 .
ˆ
δ (i) = ∀i ∈ P ∪ C, (34)
60T 13: End

where T is the time interval of a reading in minutes.


Now,  e is calculated following Eq. (26). Let Z̃ be the data
V. S IMULATION R ESULTS
matrix after the pre-processing step of offset separation. PCA
is applied on Z̃ as described in Section II-A1 to estimate R The proposed algorithms are demonstrated through simu-
and the phase connectivity is inferred. Algorithm 1 describes lations on noisy data sets. The simulations are conducted on
the method for phase identification. MATLAB R 2014a.

B. Topology Identification A. Phase Identification


The solution to the phase identification problem can be First, we illustrate the proposed algorithm on a simple exam-
extended to the topology identification problem by visualiz- ple. Consider a phase connectivity network with twelve nodes
ing the tree structures in a layered manner. The nodes of the as shown in Fig. 6. The network has nine consumers connected
tree can be separated into layers with each layer having meters to different phases of a transformer. We generated nine sample
(nodes) operating at known voltage level, as indicated in Fig. 5. readings including losses and errors for all the meters in this
Any set of two successive layers, when visualised separately, network. The measurement matrix Z is given in Table II, with
appears as a forest of directed trees. each column being a sample vector. Steps 2 to 5 are applied
Now, the problem is reduced to finding connectivity of a to Z to compute the constraint matrix A.2
forest of directed trees, which is similar to the phase iden- The corresponding regression matrix R is estimated to be:
tification problem. By inferring the connectivity between all
⎛ ⎞
possible successive layers, the complete network topology can 0.97 1.16 1.21 0.13 −0.04 0.04 −0.13 −0.29 −0.05
be identified. ⎝ −0.1 0.07 0.11 1.12 0.83 1.08 −0.13 −0.25 −0.05⎠
Let the layers be numbered from bottom to top as shown −0.11 0.07 0.03 0.13 −0.46 0.05 0.96 0.82 1.01
in Fig. 5. Let nl be the number of layers and Nl be the set of
nodes present in layer l. Let zTi be the ith row of data matrix 2 Note that A can be computed with the rotational ambiguity as mentioned
Z. The following is the algorithm to topology identification. in Section II-A1. Hence, the numerical values have not been given here.
5120 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 5, SEPTEMBER 2018

TABLE II
M EASUREMENTS FOR THE N ETWORK IN F IG . 6

Fig. 7. No. of nodes Vs Simulation time.

TABLE III
R ESULTS OF C OMPARATIVE S IMULATIONS

Fig. 6. An example graph for phase identification.


6) To account for synchronization errors, 15 minute time
interval is assumed and Gaussian error is added, with
It can be observed that the network connectivity can be standard deviation equal to the deviation in reading
inferred from rounded R̂ matrix. R̂ is rounded off to get: caused by one second change in the interval.
The algorithm is then applied to hundred data sets, with
2 3 4 6 7 8 10 11 12
⎛ ⎞ different values of N as multiples of c, where c is the number
1 1 1 1 0 0 0 0 0 0 of consumer nodes (Windows 10, Intel i5-4200U 1.64 Ghz
5⎝ 0 0 0 1 1 1 0 0 0⎠ processor, 6 GB RAM). The time taken to arrive at the solution
9 0 0 0 0 0 0 1 1 1 against the number of nodes for different number of readings
Next, we demonstrate the algorithm on the larger networks. is plotted as shown in Fig. 7.
The network is built using random number generators in It can be observed from the Fig. 7 that the time taken
MATLAB, R as follows:
for phase identification is in the order of milli-seconds while
1) The number of consumers connected per phase are an alternate method in [14], which uses same type of data,
chosen randomly (uniformly) between 75 and 100. presents time taken to be in the order tens of seconds.
2) The N readings for each of the consumer meters Assuming that the computational power used in both the cases
are sampled from one of the three uniform distribu- to be of same order, our method clearly outperforms the
tions, with ranges (0 − 100), (0 − 300) and (0 − 500), method proposed in [14], in terms of time.
to account for consumers with different ranges of To compare the identification capability of our algorithm
loads. with that proposed in [14], phase identification of 10 ran-
3) Now, the N readings for each of the three phase meters domly generated networks with upto 200 consumer nodes,
are determined by summation of the meter readings of was performed using both the algorithms and the success rates
consumers connected to them, respectively. are reported in Table III. The success rate is the percentage
4) The relative distances of the consumers from the trans- of networks that are exactly identified out of total number
former are assigned randomly, from a set of numbers. of networks on which the algorithms are tested. It can be
The product of these distances with respective consumer observed that our method performs better on this front as well.
readings is taken and scaled to the range (5 − 10). As
the technical losses depend on consumer loads and their B. Topology Identification
distances from the transformer, these scaled products are The proposed algorithm for Topology Identification is tested
taken as the percentages over the consumer readings by simulating data for the Bus 2 of Roy Billinton distribution
to calculate losses. The losses are added to the phase test system [30], which has 2004 nodes as per our formulation.
readings appropriately. The simulation is conducted as follows:
5) The random errors are introduced by assuming 0.5 1) The N readings for each of the consumer meters were
accuracy class meters. sampled from a uniform distribution with mean and
PAPPU et al.: IDENTIFYING TOPOLOGY OF LV DISTRIBUTION NETWORKS BASED ON SMART METER DATA 5121

TABLE IV
T OPOLOGY I DENTIFICATION : S IMULATION R ESULTS complexity is O(N 2 n) which shows that we proposed a poly-
nomial time algorithm. Hence, our algorithm is fast and scales
up well to large number of nodes.
As an illustration, we also verified that a polynomial of
degree three approximately fits the points plotted with the
number of nodes n on X-axis and simulation time t on the
Y-axis, for N = 5c. The fitted curve is presented at Fig. 8.

VI. C ONCLUSION
In this paper, we proposed a novel data-driven approach
for inferring the phase connectivity and network topology
of an LV distribution network. The proposed approach uses
PCA and its graph theoretic interpretation to infer the topol-
ogy from energy measurements. The proposed algorithms have
been corroborated by simulation of random networks and also
by simulating Roy Billinton distribution test system. Due to
the presence of losses and noise in the measurements, accurate
topology identification requires more than N = c measure-
ments per node in most of the cases. Further, the problem
can be solved in the polynomial time with time complexity
O(N 2 n), and hence, the solution can be transferred to practice
Fig. 8. Polynomial fit on no. of nodes Vs simulation time. in a straightforward manner.
In the future, we propose to use this technique for solv-
ing the problems of detecting changes in the topology, loss
estimation, and detecting non-technical losses such as power
maximum equal to the average and peak loads of the theft. We also propose to extend this approach for inferring
consumers, as mentioned in [30]. the underlying network for missing data scenario.
2) The relative distances of the consumers from their source
transformer and that of the transformers from their ACKNOWLEDGMENT
source feeders were randomly assigned.
3) The transformer and feeder meter readings, at each of The authors would like to thank Prof S. Narasimhan of IIT
the N time intervals, are then determined by appropriate Madras for their valuable inputs.
summation of consumer meter readings.
4) The noise in the samples due to technical losses, random R EFERENCES
errors and time synchronization errors are added in a [1] P. J. Dirkman, “Enhancing utility outage management system
similar way, as described in Section V-A. performance,” Schneider Electric White Paper, 2014.
[2] J. Fan and S. Borlase, “The evolution of distribution,” IEEE Power
The above simulation is repeated 10 times with different num- Energy Mag., vol. 7, no. 2, pp. 63–68, Mar./Apr. 2009.
ber of readings, N. The success rate and the average time [3] W. H. Kersting, Distribution System Modeling and Analysis, 2nd ed.
taken for the algorithm to arrive at the solution, are shown in Boca Raton, FL, USA: CRC Press, 2007.
[4] C. Lueken, P. M. Carvalho, and J. Apt, “Distribution grid reconfiguration
Table IV. reduces power losses and helps integrate renewables,” Energy Policy,
vol. 48, pp. 260–273, Sep. 2012.
[5] F. Melo et al., “Distribution automation on LV and MV using distributed
intelligence,” in Proc. IEEE 22nd Int. Conf. Exhibit. Electricity Distrib.,
C. Time Complexity of the Algorithms Stockholm, Sweden, 2013, pp. 1–4.
[6] G. Cavraro, “Modeling, control and identification of a smart grid,” Ph.D.
Time complexity of an algorithm is a formal measure of dissertation, Dept. Inf. Eng., Univ. at Padua, Padua, Italy, 2015.
how fast the algorithm runs and how well it scales up with the [7] D. K. Chembe, “Reduction of power losses using phase load balancing
increase in number of inputs. In the proposed algorithms, the method in power networks,” in Proc. World Congr. Eng. Comput. Sci.,
San Francisco, CA, USA, 2009, pp. 1–6.
inputs are the number of nodes n and the number of samples [8] D. Das, “A fuzzy multiobjective approach for network reconfigura-
per node N, which increase with the size of the network. Thus, tion of distribution systems,” IEEE Trans. Power Del., vol. 21, no. 1,
the running time of our algorithms depend on the values of n pp. 202–209, Jan. 2006.
[9] J. Huang, V. Gupta, and Y.-F. Huang, “Electric grid state estimators
and N. for distribution systems with microgrids,” in Proc. Annu. Conf. Inf. Sci.
Time complexity is generally expressed through the Syst. (CISS), Princeton, NJ, USA, 2012, pp. 1–6.
O-notation which gives an asymptotic bound on the expected [10] C. S. Chen, T. T. Ku, and C. H. Lin, “Design of phase identification
system to support three-phase loading balance of distribution feeders,”
value of the running time of the algorithm. The complexity in Proc. Ind. Commercial Power Syst. Tech. Conf. (I&CPS), Baltimore,
can be determined by finding the most time expensive step of MD, USA, 2011, pp. 1–8.
the algorithm. In both the algorithms we proposed, the step [11] Z. Shen et al., “Three-phase AC system impedance measurement
unit (IMU) using chirp signal injection,” in Proc. 28th Annu. IEEE
of PCA application on data, which involves singular valued Appl. Power Electron. Conf. Expo. (APEC), Long Beach, CA, USA,
decomposition of Z, turns out to be most expensive. Its time 2013, pp. 2666–2673.
5122 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 5, SEPTEMBER 2018

[12] M. Dilek, R. P. Broadwater, and R. Sequin, “Phase prediction in distribu- Satya Jayadev Pappu received the bachelor’s
tion systems,” in Proc. IEEE Power Eng. Soc. Win. Meeting, New York, degree in electrical and electronics engineering from
NY, USA, 2002, pp. 985–990. Gayatri Vidya Parished College of Engineering. He
[13] M. Kezunovic, “Monitoring of power system topology in real-time,” in is currently pursuing the Ph.D. degree with the
Proc. 39th Hawaii Int. Conf. Syst. Sci., 2006, p. 244b. Department of Electrical Engineering, IIT Madras.
[14] V. Arya et al., “Phase identification in smart grids,” in Proc. IEEE Int. He is affiliated with the ILDS and Systems and
Conf. Smart Grid Commun., Brussels, Belgium, 2011, pp. 25–30. Control Groups, IIT Madras. His research interests
[15] V. Arya, T. S. Jayram, S. Pal, and S. Kalyanaraman, “Inferring con- include analysis, optimization and control of smart
nectivity model from meter measurements in distribution networks,” in electrical grids, and applying tools from machine
Proc. 4th Int. Conf. Future Energy Syst., Berkeley, CA, USA, 2013, learning.
pp. 173–182.
[16] H. Pezeshki and P. J. Wolfs, “Consumer phase identification in a three
phase unbalanced LV distribution network,” in Proc. IEEE PES Innov.
Smart Grid Technol. Europe, Berlin, Germany, 2012, pp. 1–7.
[17] T. A. Short, “Advanced metering for phase identification, transformer
identification, and secondary modeling,” IEEE Trans. Smart Grid, vol. 4, Nirav Bhatt received the master’s degree in chem-
no. 2, pp. 651–658, Jun. 2013. ical engineering from IIT Madras, and Docteur es
[18] M. H. F. Wen, R. Arghandeh, A. von Meier, K. Poolla, and Science (D.Sc.) degree from EPFL, Switzerland.
V. O. K. Li, “Phase identification in distribution networks with micro- He is currently INSPIRE Fellow with Systems and
synchrophasors,” in Proc. IEEE Power Energy Soc. Gen. Meeting, Control Group, IIT Madras. His research interests
Denver, CO, USA, 2015, pp. 1–5. include modeling and identification of network
[19] S. V. Wiel, R. Bent, E. Casleton, and E. Lawrence, “Identification of systems and machine learning and data analysis of
topology changes in power grids using phasor measurements,” Appl. engineering and financial systems.
Stochastic Models Bus. Ind., vol. 30, no. 6, pp. 740–752, 2014.
[20] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, and L. Schenato,
“Identification of power distribution network topology via voltage cor-
relation analysis,” in Proc. 52nd IEEE Conf. Decis. Control, Florence,
Italy, 2013, pp. 1659–1664.
[21] T. Erseghe, S. Tomasin, and A. Vigato, “Topology estimation for
smart micro grids via powerline communications,” IEEE Trans. Signal
Process., vol. 61, no. 13, pp. 3368–3377, Jul. 2013. Ramkrishna Pasumarthy received the Ph.D. degree
[22] A. Rajeswaran and S. Narasimhan, “Network topology identification in systems and control from the University of
using PCA and its graph theoretic interpretations,” arXiv preprint Twente, The Netherlands, in 2006. He is cur-
arXiv:1506.00438, 2015. rently an Assistant Professor with the Department
[23] P. S. Jayadev, A. Rajeswaran, N. P. Bhatt, and R. Pasumarthy, “A novel of Electrical Engineering, IIT Madras, India. His
approach for phase identification in smart grids using graph theory and research interests lie in the area of modeling and
principal component analysis,” in Proc. Amer. Control Conf., Boston, control of physical systems, infinite dimensional
MA, USA, 2016, pp. 5026–5031. systems, and control of computing systems.
[24] I. T. Jolliffe, Principal Component Analysis, 2nd ed. New York, NY,
USA: Springer-Verlag, 2002.
[25] S. Narasimhan and S. L. Shah, “Model identification and error covari-
ance matrix estimation from noisy data using PCA,” Control Eng. Pract.,
vol. 16, no. 1, pp. 146–155, 2008.
[26] S. Narasimhan and N. P. Bhatt, “Deconstructing principal component
analysis using a data reconciliation perspective,” Comput. Chem. Eng.,
vol. 77, pp. 74–84, Jun. 2015.
Aravind Rajeswaran is currently pursuing the
[27] B. Andrásfai, Graph Theory: Flows, Matrices. Budapest, Hungary:
Ph.D. degree with the University of Washington
Akadémiai Kiadó, 1991.
Seattle, affiliated with the CSE Department. His cur-
[28] P. Andreas, K. George, and H. Ioannis, “A review of net metering mech-
rent research is on intelligent control of robots and
anism for electricity renewable energy sources,” J. Energy Environ.,
animated characters using reinforcement learning
vol. 4, no. 6, pp. 975–1002, 2013.
and trajectory optimization. He was a recipient of the
[29] American National Standard for Electricity Meters, ANSI
Best Undergraduate Thesis Award from IIT Madras
Standard c12.20-2010, pp. 1–11, 2010.
for his work on optimization and graph algorithms
[30] R. N. Allan, R. Billinton, I. Sjarief, L. Goel, and K. S. So, “A reliability
in the context of computational sustainability.
test system for educational purposes-basic distribution system data and
results,” IEEE Trans. Power Syst., vol. 6, no. 2, pp. 813–820, May 1991.

You might also like