Learning Graphs From Data
A Signal Representation Perspective
The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis, and visualization of structured data. When a natural choice of the graph is not readily available from the data sets, it is thus desirable to infer or learn a graph topology from the data. In this article, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP-based graph-inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine-learning algorithms for learning graphs from data.

Digital Object Identifier 10.1109/MSP.2018.2887284
Date of publication: 26 April 2019

Introduction
Modern data analysis and processing tasks typically involve large sets of structured data, where the structure carries critical information about the nature of the data. One can find numerous examples of such data sets in a wide diversity of application domains, including transportation networks, social networks, computer networks, and brain networks. Typically, graphs are used as mathematical tools to describe the structure of such data. They provide a flexible way of representing the relationship between data entities. In the past decade, numerous signal processing and machine-learning algorithms have been introduced for analyzing structured data on a priori known graphs [1]–[3]. However, there are often settings where the graph is not readily available, and the structure of the data has to be estimated to permit the effective representation, processing, analysis, or visualization of the data. In this case, a crucial task is to infer a graph topology that describes the characteristics of the data observations, hence capturing the underlying relationship between these entities.

Consider an example in brain signal analysis: suppose we are given blood-oxygen-level-dependent (BOLD) signals, i.e., time series extracted from functional magnetic resonance imaging data that reflect the activities of different regions of the brain. An area of significant interest in neuroscience is the inference of functional connectivity, i.e., to capture the relationship between brain regions that correlate or synchronize given a certain condition of a patient, which may help reveal underpinnings of some neurodegenerative diseases (see Figure 1). This leads to the problem of inferring a graph structure, given the multivariate BOLD time series data.

Formally, the problem of graph learning is the following: given M observations on N variables or data entities represented in a data matrix X \in \mathbb{R}^{N \times M}, and given some prior knowledge (e.g., distribution, data model, and so on) about
the data, we would like to build or infer the relationship between these variables in the form of a graph G. As a result, each column of the data matrix X becomes a graph signal defined on the node set of the estimated graph, and the observations can be represented as X = F(G), where F represents a certain generative process or function on the graph.

The graph-learning problem is an important one because 1) a graph may capture the actual geometry of structured data, which is essential to efficient processing, analysis, and visualization; 2) learning the relationship between data entities benefits numerous application domains, such as understanding functional connectivity between brain regions or behavioral influence between a group of people; and 3) the inferred graph can help to predict future data evolution.

Generally speaking, inferring graph topologies from observations is an ill-posed problem, and there are many ways of associating a topology with the observed data samples. Some of the most straightforward methods include computing the sample correlation or using a similarity function, e.g., a Gaussian radial basis function kernel, to quantify the similarity between data samples. These methods are based purely on observations without any explicit prior on or model of the data; hence, they may be sensitive to noise, and tuning their hyperparameters can be difficult. A meaningful data model or accurate prior may, however, guide the graph-inference process and lead to a graph topology that better reveals the intrinsic relationship among the data entities. Therefore, a main challenge with this problem is to define such a model for the generative process or function F, so that it captures the relationship between the observed data X and the learned graph topology G. Typically, such models correspond to specific criteria for describing or estimating structures between the data samples, e.g., models that put a smoothness assumption on the data, or that represent an information diffusion process on the graph.

Historically, there have been two general approaches for learning graphs from data: one based on statistical models and one based on physically motivated models. From the statistical perspective, F(G) is modeled as a function that draws realizations from a probability distribution over the variables, which is determined by the structure of G. One prominent example is found in probabilistic graphical models [5], where the graph structure encodes the conditional independence relationship among the random variables that are represented by the vertices. Therefore, learning the graph structure is equivalent to learning a factorization of a joint probability distribution of these random variables. Typical application domains include inferring interactions between genes using gene expression profiles, and the relationship between politicians given their voting behaviors [6].

For physically motivated models, F(G) is defined based on the assumption of an underlying physical phenomenon or process on the graph. One popular process is network diffusion or cascades [7]–[10], where F(G) dictates the diffusion behavior on G, which leads to the observations X, possibly at different time steps. In this case, the problem is equivalent to learning a graph structure on which the generative process of the observed signals may be explained. Practical applications include understanding information flowing over a network of online media sources [7] or observing epidemics spreading over a network of human interactions [11], given the state of exposure or infection at certain time steps.

The fast-growing field of GSP [3], [12] offers a new perspective on the problem of graph learning. In this setting, the columns of the observation matrix X are explicitly considered as signals that are defined on the vertex set of a weighted graph G. The learning problem can then be cast as one of learning a graph G such that F(G) makes certain properties or characteristics of the observations X explicit, e.g., smoothness with respect to G or sparsity in a basis related to G. This signal representation perspective is particularly interesting because it puts a strong and explicit emphasis on the relationship between the signal representation and the graph topology, where F(G) often comes with an interpretation of frequency-domain analysis or filtering operation of signals on the graph. For example, it is common to adopt the eigenvectors of the graph Laplacian matrix associated with G as a surrogate for the Fourier basis for signals supported on G [3], [13]; we go deeper into the details of this view in the "Graph Learning: A Signal Representation Perspective" section.

FIGURE 1. Inferring functional connectivity between different regions of the brain. (a) BOLD time series data recorded in different regions of the brain and (b) a functional connectivity graph where the vertices represent the brain regions and the edges (with thicker lines indicating heavier weights) represent the strength of functional connections between these regions. (This figure was adapted from [4, Fig. 1].)
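As a concrete baseline, the kernel-based construction mentioned above (a Gaussian radial basis function on pairwise distances) can be sketched in a few lines of NumPy. The function name, the treatment of rows as data entities, and the choice of σ are ours, for illustration only:

```python
import numpy as np

def rbf_similarity_graph(X, sigma=1.0):
    """Build a similarity graph from a data matrix X (one row per entity).

    Edge weights are given by a Gaussian radial basis function of the
    pairwise Euclidean distances between the rows of X.
    """
    # Pairwise squared distances between the rows of X.
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)              # guard against tiny negative values
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian RBF weights in (0, 1]
    np.fill_diagonal(W, 0.0)              # no self-loops
    return W
```

As the article notes, such purely observation-driven constructions are sensitive to noise and to the choice of the bandwidth σ, which is what motivates the model-based methods surveyed next.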
p(\mathbf{x}; \Theta) = \frac{1}{Z(\Theta)} \exp\Big( \sum_{v_i \in V} \theta_{ii} x_i^2 + \sum_{(v_i, v_j) \in E} \theta_{ij} x_i x_j \Big),   (2)

where \theta_{ij} represents the ijth entry of \Theta, and Z(\Theta) is a normalization constant.

Pairwise MRFs consist of two main classes: 1) Gaussian graphical models, i.e., Gaussian MRFs (GMRFs), in which the variables are continuous, and 2) discrete MRFs, in which the variables are discrete. In the case of a (zero-mean) GMRF, the joint probability can be written as

p(\mathbf{x}; \Theta) = \frac{|\Theta|^{1/2}}{(2\pi)^{N/2}} \exp\Big( -\frac{1}{2} \mathbf{x}^T \Theta \mathbf{x} \Big),   (3)

FIGURE 2. A broad categorization of different approaches to the problem of graph learning: statistical models (joint distribution), physically motivated models (network diffusion), and GSP models (signal representation).
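As a quick sanity check of (3), the log-density of a zero-mean GMRF can be evaluated directly from the precision matrix. This is a minimal sketch; the function name is ours, and numerical concerns beyond using `slogdet` are not addressed:

```python
import numpy as np

def gmrf_log_density(x, Theta):
    """Log of the zero-mean GMRF density in (3):
    p(x; Theta) = |Theta|^{1/2} / (2*pi)^{N/2} * exp(-x^T Theta x / 2).
    """
    N = len(x)
    sign, logdet = np.linalg.slogdet(Theta)  # stable log-determinant
    assert sign > 0, "Theta must be positive definite"
    return 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi) - 0.5 * x @ Theta @ x
```

Samples from this model can be drawn with `np.random.multivariate_normal(np.zeros(N), np.linalg.inv(Theta))`, since the covariance is the inverse of the precision matrix.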
FIGURE 3. (a) A ground-truth precision matrix \Theta, (b) an observation matrix X drawn from a multivariate Gaussian distribution with \Theta, (c) the sample covariance \widehat{\Sigma}, and (d) the inverse of the sample covariance, \widehat{\Sigma}^{-1}.
where \Theta is the inverse covariance or precision matrix. In this context, learning the graph structure boils down to learning the matrix \Theta, which encodes pairwise conditional independence between the variables. It is typical to assume, or take as a prior, that \Theta is sparse because 1) real-world interactions are typically local and 2) the sparsity assumption makes learning computationally more tractable. We now review some key developments in learning Gaussian and discrete MRFs.

To learn GMRFs, one of the first approaches is suggested in [18], where the author has proposed learning \Theta by sequentially pruning the smallest elements in the inverse of the sample covariance matrix \widehat{\Sigma} = (1/(M-1)) X X^T (see Figure 3). Although it is based on a simple and effective rule, this method does not perform well when the sample covariance is not a good approximation of the "true" covariance, often due to a small number of samples. In fact, this method cannot even be applied when the sample size is smaller than the number of variables, in which case the sample covariance matrix is not invertible.

Since a graph is a representation of a pairwise relationship, it is clear that learning a graph is equivalent to learning a neighborhood for each vertex, i.e., the other vertices to which it is connected. In this case, it is natural to assume that the observation at a particular vertex may be represented by observations at the neighboring vertices. Based on this assumption, the authors in [14] have proposed approximating the observation at each variable as a sparse linear combination of the observations at other variables. For a variable x_1, for instance, this approximation leads to a Lasso regression problem [19] of the form

\operatorname*{minimize}_{\beta_1} \; \| \mathbf{X}_1 - \mathbf{X}_{\setminus 1} \beta_1 \|_2^2 + \lambda \| \beta_1 \|_1,   (4)

where \mathbf{X}_1 and \mathbf{X}_{\setminus 1} represent the observations on the variable x_1 (i.e., the transpose of the first row of X) and the rest of the variables, respectively, and \beta_1 \in \mathbb{R}^{N-1} is a vector of coefficients for x_1 [see Figure 4(a) and (b)]. In (4), the first term can be interpreted as a local log-likelihood of \beta_1, and the L1 penalty is added to enforce its sparsity, with a regularization parameter \lambda balancing the two terms. The same procedure is then repeated for all of the variables (or vertices). Finally, the connection between a pair of vertices, v_i and v_j, is given by \theta_{ij} = \theta_{ji} = \max(|\beta_{ij}|, |\beta_{ji}|). This neighborhood selection approach using the Lasso is intuitive with certain theoretical guarantees; however, due to the Lasso formulation, it does not work well with observations that are not independent and identically distributed (i.i.d.).

Instead of per-node neighborhood selection, the works in [6], [15], and [20] have proposed a popular method for estimating an inverse covariance or precision matrix at once, which is based on maximum-likelihood estimation. Specifically, the so-called graphical Lasso method aims to solve the following problem:

\operatorname*{maximize}_{\Theta} \; \log \det \Theta - \operatorname{tr}(\widehat{\Sigma} \Theta) - \rho \| \Theta \|_1,   (5)
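The neighborhood selection step in (4) can be sketched with a plain iterative soft-thresholding (ISTA) solver standing in for a dedicated Lasso package. This is a toy illustration, not the solver of [14]: observations are assumed to be stored as rows, and the iteration count and step size are arbitrary choices of ours:

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Solve min_beta ||y - A beta||_2^2 + lam * ||beta||_1 by ISTA
    (gradient step on the quadratic term, then soft-thresholding)."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ beta - y)           # gradient of ||y - A beta||^2
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

def neighborhood_selection(X, lam):
    """Per-vertex Lasso regressions as in (4), symmetrized by the max rule
    theta_ij = theta_ji = max(|beta_ij|, |beta_ji|).
    Here X is M x N: one observation per row, one variable per column."""
    M, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        others = [j for j in range(N) if j != i]
        B[i, others] = lasso_ista(X[:, others], X[:, i], lam)
    return np.maximum(np.abs(B), np.abs(B).T)       # weighted adjacency estimate
```

The returned matrix plays the role of the pattern of \Theta; in practice one would use a tuned Lasso solver and cross-validate \lambda.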
FIGURE 4. (a) Learning graphical models by neighborhood selection, (b) neighborhood selection via the Lasso regression for GMRFs, and (c) neighborhood selection via logistic regression for discrete MRFs.
FIGURE 6. (a) A smooth signal on the graph with Q(L) = 1 and (b) its GFT [36] coefficients in the graph spectral domain. The signal forms a smooth representation on the graph as its values vary slowly along the edges of the graph, and it mainly consists of low-frequency components in the graph spectral domain. (c) A less smooth signal on the graph with Q(L) = 5 and (d) its GFT [36] coefficients in the graph spectral domain. A different choice of the graph leads to a different representation of the same signal. [In (b) and (d), the horizontal axis is the eigenvalue index, ordered from low to high frequency, and the vertical axis is the magnitude of the GFT coefficient.]
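The notion of smoothness in Figure 6 can be reproduced on a toy graph: the Laplacian quadratic form x^T L x and the GFT coefficients U^T x both indicate how fast a signal varies along the edges. The 4-vertex path graph below is our own toy example, not the 9-vertex graph of the figure:

```python
import numpy as np

# A path graph on 4 vertices (arbitrary toy example).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W        # combinatorial graph Laplacian

# Graph Fourier basis: eigenvectors of L, ordered from low to high frequency.
eigvals, U = np.linalg.eigh(L)

x_smooth = U[:, 0]                    # constant vector: the "DC" component
x_rough = U[:, -1]                    # highest-frequency eigenvector

# The quadratic form Q(L) = x^T L x measures variation of x over the edges.
print(x_smooth @ L @ x_smooth)        # ~0: smooth signal
print(x_rough @ L @ x_rough)          # largest eigenvalue: rough signal

# GFT coefficients x_hat = U^T x; a smooth signal concentrates its
# energy in the low-frequency coefficients.
x_hat = U.T @ x_smooth
```

The same signal decomposed on a different graph yields different GFT coefficients, which is exactly the dependence on topology that the caption points out.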
FIGURE 7. Diffusion processes on the graph defined by (a) a heat diffusion kernel and (b) a graph shift operator.
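A heat-diffusion kernel such as the one in Figure 7(a) can be sketched through the Laplacian eigendecomposition, since e^{-tau L} = U e^{-tau Lambda} U^T for a symmetric L. The graph and the diffusion time below are arbitrary toy choices:

```python
import numpy as np

def heat_kernel(L, tau):
    """Heat-diffusion operator e^{-tau L}, computed via the
    eigendecomposition of the symmetric graph Laplacian L."""
    eigvals, U = np.linalg.eigh(L)
    return U @ np.diag(np.exp(-tau * eigvals)) @ U.T

# Toy example: diffuse a unit impulse from vertex 0 of a 3-vertex path graph.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
x0 = np.array([1.0, 0.0, 0.0])        # impulse at vertex 0
x_tau = heat_kernel(L, 0.5) @ x0      # heat spreads along the edges
# Because L @ ones == 0, e^{-tau L} preserves the total "mass" of the signal.
```

As tau grows, the diffused signal spreads further from its source vertex, which is the localized behavior exploited by the heat-diffusion models discussed below.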
\mathbf{x} = b_0 \prod_{k=1}^{\infty} (I - b_k S)\, \mathbf{c} = \sum_{k=0}^{\infty} a_k S^k \mathbf{c},   (21)

for some sets of parameters \{a_k\} and \{b_k\}. The latter implies that there exists an underlying diffusion process in the graph operator S, which can be the adjacency matrix, the Laplacian, or a variation thereof, that produces the signal x from the input signal c. By assuming a finite polynomial degree K, the generative signal model becomes

\mathbf{x} = F(G)\, \mathbf{c} = \sum_{k=0}^{K} a_k S^k \mathbf{c},   (22)

where the connectivity matrix of G is captured through the graph operator S. Usually, c is assumed to be a zero-mean graph signal with the covariance matrix \Sigma_c = E[\mathbf{c}\mathbf{c}^T]. In addition, if c is white and \Sigma_c = I, (21) is equivalent to assuming that the graph process x is stationary in S. This assumption of stationarity is important for estimating the eigenvectors of the graph operator. Indeed, since the graph operator S is often a real and symmetric matrix, its eigenvectors are also eigenvectors of the covariance matrix \Sigma_x. As a matter of fact,

\Sigma_x = E[\mathbf{x}\mathbf{x}^T] = E\Big[ \Big( \sum_{k=0}^{K} a_k S^k \Big) \mathbf{c}\, \mathbf{c}^T \Big( \sum_{k=0}^{K} a_k S^k \Big)^T \Big] = \Big( \sum_{k=0}^{K} a_k S^k \Big) \Big( \sum_{k=0}^{K} a_k S^k \Big)^T = \chi \Big( \sum_{k=0}^{K} a_k \Lambda^k \Big)^2 \chi^T,   (23)

where we have used the assumption that \Sigma_c = I and the eigendecomposition S = \chi \Lambda \chi^T. Given a sufficient number of graph signals, the eigenvectors of the graph operator S can therefore be approximated by the eigenvectors of the empirical covariance matrix of the observations. To recover S, the second step of the process would then be to learn its eigenvalues.

The authors in [46] have followed the aforementioned reasoning and modeled the diffusion process by powers of the normalized Laplacian matrix. More precisely, they propose an algorithm for characterizing and then computing a set of admissible diffusion matrices, which defines a polytope. In general, this polytope corresponds to a continuum of graphs that are all consistent with the observations. To obtain a particular solution, an additional criterion is required. Two such criteria are one that encourages the resulting graph to be sparse and another that encourages the recovered graph to be simple (i.e., a graph in which no vertex has a connection to itself, hence an adjacency matrix with only zeros along the diagonal).

Similarly, in [45], after obtaining the eigenvectors of a graph shift operator, the graph-learning problem is equivalent to learning its eigenvalues under the constraints that the shift operator obeys some desired properties, such as sparsity. The optimization problem of [45] can be written as

\operatorname*{minimize}_{S, \Lambda} \; f(S, \Lambda), \quad \text{subject to} \quad S = \chi \Lambda \chi^T, \; S \in \mathcal{S},   (24)

where f(\cdot) is a convex function applied on S that imposes the desired properties of S, e.g., sparsity via an entry-wise L1-norm, and \mathcal{S} is the constraint set of S being a valid graph operator, e.g., nonnegativity of the edge weights. The stationarity assumption is further relaxed in [48]. However, all of these approaches are based on the assumption that the sample covariance of the observed data and the graph operator have the same set of eigenvectors. Thus, their performance depends on the accuracy of the eigenvectors obtained from the sample covariance of the data, which can be difficult to guarantee, especially when the number of data samples is small relative to the number of vertices in the graph.

Given the limitation in estimating the eigenvectors of the graph operator from the sample covariance, the work of [49] has proposed a different approach. They have formulated the problem of graph learning as a graph system identification problem where, by assuming that the observed signals are outputs of a system with a graph-based filter given certain input, the
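The two-step reasoning above (eigenvectors from the covariance, then the eigenvalues) can be checked numerically: under the stationarity assumption, the covariance in (23) is diagonalized by the same basis as S. The sketch below uses the analytic covariance H H^T in place of an empirical estimate, and the filter coefficients are arbitrary choices of ours:

```python
import numpy as np

# Graph shift operator: Laplacian of a path graph on 4 vertices
# (its eigenvalues are distinct, so the eigenvectors are well defined).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = np.diag(W.sum(axis=1)) - W

# Diffusion filter H = sum_k a_k S^k of degree K = 2, as in (22).
a = [1.0, 0.5, 0.25]
H = sum(ak * np.linalg.matrix_power(S, k) for k, ak in enumerate(a))

# With white input c (Sigma_c = I), (23) gives Sigma_x = H H^T; in practice
# this covariance would be estimated from the observed graph signals.
Sigma_x = H @ H.T

# The eigenvectors of Sigma_x diagonalize S, so they can serve as the
# eigenvector estimate in the first step of the spectral-templates approach.
_, U = np.linalg.eigh(Sigma_x)
D = U.T @ S @ U
off_diag = D - np.diag(np.diag(D))
print(np.max(np.abs(off_diag)))   # ~0: U also diagonalizes S
```

With a finite number of samples, the empirical covariance only approximates Sigma_x, which is precisely the source of error the text attributes to these methods.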
FIGURE 8. (a) A graph signal. (b)–(e) Its decomposition in four localized simple components. Each component is a heat-diffusion process (e^{-\tau_s L}) at time \tau_s that has started from different network nodes. The size and the color of each ball indicate the value of the signal at each vertex of the graph [47].

FIGURE 9. A graph signal x at time step t is modeled as a linear combination of its observations in the past T time steps and a random noise process n[t].
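The model in the caption of Figure 9 can be sketched as a vector-autoregressive process whose lag matrices are polynomials of the adjacency matrix W. This is our own toy instantiation: the polynomial form P_tau(W), the coefficients, and the noise level are placeholders, not those of the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Adjacency of a small path graph (weights are arbitrary, for illustration).
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

def P(tau, W):
    """A degree-1 polynomial of W per time lag; the 0.2/0.1 coefficients are
    placeholders, chosen small enough to keep the process stable."""
    return 0.2 * np.eye(W.shape[0]) + (0.1 / tau) * W

T = 2                                  # number of past time steps in the model
x = [rng.standard_normal(3) for _ in range(T)]
for t in range(T, 100):
    # x[t] = sum_tau P_tau(W) x[t - tau] + n[t], as sketched in Figure 9.
    new = sum(P(tau, W) @ x[t - tau] for tau in range(1, T + 1))
    new += 0.05 * rng.standard_normal(3)   # noise process n[t]
    x.append(new)

X = np.stack(x)                        # 100 x 3 matrix of graph signals
```

Fitting such a model in reverse, i.e., estimating the lag polynomials (and hence W) from the observed time series, is the causal graph-learning task addressed by the works cited around this figure.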
FIGURE 10. Inferring a graph for image coding. (a) The graph learned on a random patch of the image Teddy using [69]. (b) A comparison between the GFT coefficients of the image signal on the learned graph and the four nearest-neighbor grid graph. The coefficients are ordered decreasingly by log magnitude. (c) The GFT coefficients of the graph weights. [In (b), the horizontal axis indexes the significant GFT coefficients; in (c), it is the eigenvalue index.]
[41] H. E. Egilmez, E. Pavez, and A. Ortega, "Graph learning from data under structural and Laplacian constraints," IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 825–841, 2017.
[42] S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero, "Learning sparse graphs under smoothness prior," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2017, pp. 6508–6512.
[43] F. Chung, "The heat kernel as the pagerank of a graph," Proc. Nat. Academy Sci., vol. 104, no. 50, pp. 19,735–19,740, 2007.
[44] H. Ma, H. Yang, M. R. Lyu, and I. King, "Mining social networks using heat diffusion processes for marketing candidates selection," in Proc. 17th ACM Conf. Information and Knowledge Management, 2008, pp. 233–242.
[45] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, "Network topology inference from spectral templates," IEEE Trans. Signal Inform. Process. Netw., vol. 3, no. 3, pp. 467–483, 2017.
[46] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat, "Characterization and inference of graph diffusion processes from observations of stationary signals," IEEE Trans. Signal Inform. Process. Netw., vol. 4, no. 3, pp. 481–496, 2018.
[47] D. Thanou, X. Dong, D. Kressner, and P. Frossard, "Learning heat diffusion graphs," IEEE Trans. Signal Inform. Process. Netw., vol. 3, no. 3, pp. 484–499, 2017.
[48] R. Shafipour, S. Segarra, A. G. Marques, and G. Mateos, Identifying the topology of undirected networks from diffused non-stationary graph signals. 2018. [Online]. Available: arXiv:1801.03862
[49] H. E. Egilmez, E. Pavez, and A. Ortega, "Graph learning from filtered signals: Graph system and diffusion kernel identification," IEEE Trans. Signal Inform. Process. Netw., to be published.
[50] H. P. Maretic, D. Thanou, and P. Frossard, "Graph learning under sparsity priors," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2017, pp. 6523–6527.
[51] R. Rubinstein, A. M. Bruckstein, and M. Elad, "Dictionaries for sparse representation modeling," Proc. IEEE, vol. 98, no. 6, pp. 1045–1057, 2010.
[52] I. Tosic and P. Frossard, "Dictionary learning," IEEE Signal Process. Mag., vol. 28, no. 2, pp. 27–38, 2011.
[53] K. J. Friston, "Functional and effective connectivity in neuroimaging: A synthesis," Human Brain Mapping, vol. 2, no. 1–2, pp. 56–78, 1994.
[54] Y. Shen, B. Baingana, and G. B. Giannakis, Nonlinear structural vector autoregressive models for inferring effective brain network connectivity. 2016. [Online]. Available: arXiv:1610.06551
[55] J. Mei and J. M. F. Moura, "Signal processing on graphs: Causal modeling of unstructured data," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 2077–2092, 2017.
[56] J. Songsiri and L. Vandenberghe, "Topology selection in graphical models of autoregressive processes," J. Mach. Learn. Res., vol. 11, pp. 2671–2705, 2010. [Online]. Available: https://dl.acm.org/citation.cfm?id=1953020
[57] A. Bolstad, B. D. V. Veen, and R. Nowak, "Causal network inference via group sparse regularization," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2628–2641, 2011.
[58] A. Roebroeck, E. Formisano, and R. Goebel, "Mapping directed influence over the brain using Granger causality and fMRI," NeuroImage, vol. 25, no. 1, pp. 230–242, 2005.
[59] R. Goebel, A. Roebroeck, D.-S. Kim, and E. Formisano, "Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping," Magn. Resonance Imag., vol. 21, no. 10, pp. 1251–1261, 2003.
[60] D. W. Kaplan, Structural Equation Modeling: Foundations and Extensions, 2nd ed. Newbury Park, CA: Sage, 2009.
[61] A. McIntosh and F. Gonzalez-Lima, "Structural equation modeling and its application to network analysis in functional brain imaging," Hum. Brain Mapp., vol. 2, no. 1–2, pp. 2–22, 1994.
[62] B. Baingana and G. B. Giannakis, "Tracking switched dynamic network topologies from information cascades," IEEE Trans. Signal Process., vol. 65, no. 4, pp. 985–997, 2017.
[63] P. A. Traganitis, Y. Shen, and G. B. Giannakis, "Network topology inference via elastic net structural equation models," in Proc. European Signal Processing Conf., 2017, pp. 146–150.
[66] G. Cheung, E. Magli, Y. Tanaka, and M. Ng, "Graph spectral image processing," Proc. IEEE, vol. 106, no. 5, pp. 907–930, 2018.
[67] W. Hu, G. Cheung, A. Ortega, and O. C. Au, "Multi-resolution graph Fourier transform for compression of piecewise smooth images," IEEE Trans. Image Process., vol. 24, no. 1, pp. 419–433, 2015.
[68] I. Rotondo, G. Cheung, A. Ortega, and H. E. Egilmez, "Designing sparse graphs via structure tensor for block transform coding of images," in Proc. Asia-Pacific Signal and Information Processing Assoc. Annu. Summit and Conf., 2015, pp. 571–574.
[69] G. Fracastoro, D. Thanou, and P. Frossard, Graph-based transform coding with application to image compression. 2017. [Online]. Available: arXiv:1712.06393
[70] V. Kalofolias, A. Loukas, D. Thanou, and P. Frossard, "Learning time varying graphs," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2017, pp. 2826–2830.
[71] K.-S. Lu and A. Ortega, "A graph Laplacian matrix learning method for fast implementation of graph Fourier transform," in Proc. IEEE Int. Conf. Image Processing, 2017, pp. 1677–1681.
[72] W. Huang, T. A. W. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. V. D. Ville, A graph signal processing perspective on functional brain imaging. 2018. [Online]. Available: arXiv:1710.01135
[73] R. Liu, H. Nejati, and N.-M. Cheung, Joint estimation of low-rank components and connectivity graph in high-dimensional graph signals: Application to brain imaging. 2018. [Online]. Available: arXiv:1801.02303
[74] I. Jablonski, "Graph signal processing in applications to sensor networks, smart grids, and smart cities," IEEE Sensors J., vol. 17, no. 23, pp. 7659–7666, 2017.
[75] A. Gadde, A. Anis, and A. Ortega, "Active semi-supervised learning using sampling theory for graph signals," in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2014, pp. 492–501.
[76] S. Vlaski, H. Maretic, R. Nassif, P. Frossard, and A. H. Sayed, "Online graph learning from sequential data," in Proc. IEEE Data Science Workshop, 2018, pp. 190–194.
[77] H. N. Mhaskar, "A unified framework for harmonic analysis of functions on directed graphs and changing data," Appl. Comput. Harmon. Anal., vol. 44, no. 3, pp. 611–644, 2018.
[78] B. Girault, A. Ortega, and S. Narayanan, "Irregularity-aware graph Fourier transforms," IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5746–5761, 2018.
[79] S. Sardellitti, S. Barbarossa, and P. D. Lorenzo, "On the graph Fourier transform for directed graphs," IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 796–811, 2017.
[80] R. Shafipour, A. Khodabakhsh, G. Mateos, and E. Nikolova, A directed graph Fourier transform with spread frequency components. 2018. [Online]. Available: arXiv:1804.03000
[81] A. D. Col, P. Valdivia, F. Petronetto, F. Dias, C. T. Silva, and L. G. Nonato, "Wavelet-based visual analysis of dynamic networks," IEEE Trans. Vis. Comput. Graph., vol. 24, no. 8, pp. 2456–2469, 2018.
[82] E. Pavez, H. E. Egilmez, and A. Ortega, "Learning graphs with monotone topology properties and multiple connected components," IEEE Trans. Signal Process., vol. 66, no. 9, pp. 2399–2413, 2018.
[83] M. Sundin, A. Venkitaraman, M. Jansson, and S. Chatterjee, "A connectedness constraint for learning sparse graphs," in Proc. European Signal Processing Conf., 2017, pp. 151–155.
[84] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov, "Clustering on multilayer graphs via subspace analysis on Grassmann manifolds," IEEE Trans. Signal Process., vol. 62, no. 4, pp. 905–918, 2014.
[85] S. Segarra, G. Mateos, A. G. Marques, and A. Ribeiro, "Blind identification of graph filters," IEEE Trans. Signal Process., vol. 65, no. 5, pp. 1146–1159, 2017.
[86] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo, "Graph topology inference based on transform learning," in Proc. IEEE Global Conf. Signal and Information Processing, 2016, pp. 356–360.
[87] Y. Yankelevsky and M. Elad, "Dual graph regularized dictionary learning," IEEE Trans. Signal Inform. Process. Netw., vol. 2, no. 4, pp. 611–624, 2016.
[88] D. Hallac, Y. Park, S. Boyd, and J. Leskovec, "Network inference via the time-varying graphical Lasso," in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2017, pp. 205–213.