Association for Women in Mathematics Series

Ellen Gasparovic
Carlotta Domeniconi
Editors

Research in Data Science
Volume 17
Series Editor
Kristin Lauter
Microsoft Research
Redmond, Washington, USA
Editors

Ellen Gasparovic
Department of Mathematics
Union College
Schenectady, NY, USA

Carlotta Domeniconi
Department of Computer Science
George Mason University
Fairfax, VA, USA
Mathematics Subject Classification (2010): 62-07, 68P05, 68T05, 68P20, 62H30, 91C20, 68U10, 65D18,
62H35, 68W27, 68U05, 05E45, 55Q07
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The first Women in Data Science and Mathematics (WiSDM) Research Collabora-
tion Workshop was held on July 17–21, 2017, at the Institute for Computational and
Experimental Research in Mathematics (ICERM) in Providence, Rhode Island. In
addition to generous support from ICERM, the workshop was partially supported
by the Association for Women in Mathematics (AWM) ADVANCE grant funded by
the National Science Foundation. Additional support for some participant travel was
provided by the Center for Discrete Mathematics and Theoretical Computer Science
(DIMACS) in association with its Special Focus on Information Sharing and
Dynamic Data Analysis. The workshop was co-sponsored by Brown University’s
Data Science Initiative.
The 52 participants of the workshop included women from mathematics (theo-
retical and applied), statistics, computer science, and electrical and computer engi-
neering. They represented women at all career stages and in diverse career paths,
including faculty, postdoctoral fellows, graduate students, industrial scientists, and
government scientists. Based on their research interests and backgrounds, each
participant was assigned to one of six working groups headed by leading researchers
in the field of data science. In addition to intense research time, the week’s schedule
featured introductory talks from the group leaders, a panel discussion, an invited
lecture from Brown University biologist Sohini Ramachandran, and presentations
of the week’s work from each of the working groups.
This single-blind peer-reviewed volume is the proceedings of the first WiSDM
workshop and features accepted submissions from several of the working groups as
well as additional papers solicited from the wider community. Topics range from
the more theoretical to the more applied and computational.
Project Descriptions
The first group's project, led by Julie Mitchell, aimed to build and
optimize predictive models for molecular data using a range of machine learning
and informatics techniques applied to data generated from past molecular modeling
projects. Prior models had been built with around 100 experimental data points,
while other biomolecular models utilized over 50,000 data points. Hence, participants
focused on the applicability of various machine learning methods to data sets of
different sizes.
Under the guidance of Linda Ness, the second group considered the rep-
resentation of data as multi-scale features and measures. Recently, multi-scale
representation theorems from harmonic analysis and geometric measure theory have
been exploited to compute canonical multi-scale representations of data samples
for supervised and unsupervised learning, statistical fusion and construction of
confidence measures, and data visualization. The goal of this research collaboration
was threefold: to assess the applicability of multi-scale representation approaches
to various types of data, to introduce the approach to statistical researchers who
may be interested in statistical fusion and confidence measures, and to develop and
apply new multi-scale methods for representation of data as measures characterizing
mathematical properties of the data.
The members of the third group, led by Giseon Heo, looked at inferential models
founded in statistical and topological learning applied to pediatric obstructive sleep
apnea (OSA) data sets. OSA is a form of sleep-disordered breathing characterized
by recurrent episodes of partial or complete airway obstruction during sleep and
is prevalent in 1–5% of school-aged children. Chronic diseases such as OSA
are multifactorial disorders, necessitating different types of data to capture the
complex system. The team focused on analyses for time series signals from
polysomnography, survey questionnaires, and upper airway shapes. The participants
sought to develop a statistical and topological learning model that could accurately
predict OSA severity.
The goal of the fourth group’s project, under the direction of Deanna Needell,
was to apply stochastic signal processing for high-dimensional data. One mathematical
method that has gained a lot of recent attention in the ever-evolving field
of data science is the use of sparsity and stochastic designs. Sparsity captures the
idea that high-dimensional signals often contain a very small amount of intrinsic
information. Often, through randomized designs, signals can be captured using a
very small number of measurements. On the recovery side, stochastic methods can
accurately estimate signals from those measurements in the underdetermined set-
ting, as well as solve large-scale systems in the highly overdetermined setting. Par-
ticipants selected applications of interest, designed stochastic algorithms for those
frameworks, and ran experiments on synthetic data from those application areas.
The fifth working group, led by Carlotta Domeniconi, studied the hubness
phenomenon in high-dimensional spaces. Although data can easily contain tens of
thousands of features, data often have an intrinsic dimensionality that is embedded
within the full-dimensional space. Hubness causes certain data examples to appear
more often than others as neighbors of points, thus generating a skewed distribution
of nearest neighbor counts. The participants investigated the relationship between
the hubness phenomenon and the intrinsic dimensionality of data, with the ultimate
goal of recovering the subspaces data lie within. The findings could enable effective
subspace clustering of data, as well as outlier identification.
Finally, under the guidance of Emina Soljanin, the participants in the sixth
working group focused on codes for data storage with queues for data access. Large
volumes of data, which are being collected for the purpose of knowledge extraction,
have to be reliably, efficiently, and securely stored. Retrieval of large data files from
storage has to be fast (and often anonymous and private). This project was concerned
with big data storage and access, and its relevant mathematical disciplines include
algebraic coding and queueing theory. The members of the group looked at coding
and queueing problems in the era of big data, two interwoven and indispensable
aspects of big data storage and access.
Contributed Papers
In the first paper by Durgin et al., the authors develop a stochastic approach, based
on the sparse randomized Kaczmarz (SRK) algorithm, to perform support recovery
of corrupted and jointly sparse multiple measurement vectors (MMV). In the MMV
setting, one has access to multiple vectors, or signals, that are assumed to share a
support set. However, the measurement vectors are corrupted, meaning that each
measurement vector may have additional non-zeros that are not truly part of the
shared support the authors aim to recover. The authors also adapt the SRK algorithm
to the online setting where the measurements are streaming continuously.
Mani et al. address the challenges of learning with high-dimensional data,
focusing on the hubness phenomenon. The authors identify and discuss new geo-
metric relationships between hubness, data density, and data distance distribution.
The findings shed light on the role of hubness in the discovery of the intrinsic
dimensionality of data and thus in the design of effective methods to recover the
subspaces in which the data lie.
The first Women in Data Science and Mathematics workshop was a great success
thanks in large part to generous funding and support from ICERM, the National
Science Foundation (NSF-HRD 1500481) and the AWM (“Career Advancement for
Women Through Research-Focused Networks”), DIMACS, and Brown University’s
Data Science Initiative. The AWM and NSF have provided funds for a follow-
up research minisymposium during the 2019 SIAM Conference on Computational
Science and Engineering. The organizers of WiSDM, the editors, and the authors
in this volume are all tremendously grateful for the support for these unique
opportunities for collaboration and dissemination.
The editors would like to thank the chapter authors as well as the reviewers
who gave valuable feedback and suggestions to the authors. We would also like
to heartily thank the AWM and Springer for the opportunity to create this volume.
We look forward to many more WiSDM research collaboration workshops in the
future and to continuing to build the WiSDM research network.
Natalie Durgin, Rachel Grotheer, Chenxi Huang, Shuang Li, Anna Ma,
Deanna Needell, and Jing Qin

Abstract While single measurement vector (SMV) models have been widely
studied in signal processing, there is a surging interest in addressing the multiple
measurement vectors (MMV) problem. In the MMV setting, more than one
measurement vector is available and the multiple signals to be recovered share
some commonalities such as a common support. Applications in which MMV is
a naturally occurring phenomenon include online streaming, medical imaging, and
video recovery. This work presents a stochastic iterative algorithm for the support
recovery of jointly sparse corrupted MMV. We present a variant of the sparse
randomized Kaczmarz algorithm for corrupted MMV and compare our proposed
method with an existing Kaczmarz type algorithm for MMV problems. We also
showcase the usefulness of our approach in the online (streaming) setting and
provide empirical evidence that suggests the robustness of the proposed method
to the number of corruptions and the distribution from which the corruptions are
drawn.

N. Durgin
Spiceworks, Austin, TX, USA

R. Grotheer
Goucher College, Baltimore, MD, USA
e-mail: rachel.grotheer@goucher.edu

C. Huang
Yale University, New Haven, CT, USA

S. Li
Colorado School of Mines, Golden, CO, USA

A. Ma
Claremont Graduate University, Claremont, CA, USA
e-mail: anna.ma@cgu.edu

D. Needell
University of California, Los Angeles, CA, USA

J. Qin
Montana State University, Bozeman, MT, USA
1 Introduction
In recent years, there has been a drastic increase in the amount of available data.
This so-called “data deluge” has created a demand for fast, iterative algorithms that
can be used to process large-scale data. Stochastic iterative algorithms, such as the
randomized Kaczmarz or stochastic gradient descent algorithms, have become an
increasingly popular option for processing large-scale data [3, 10]. These methods
recover a signal X ∈ ℝ^n given a vector of measurements Y ∈ ℝ^m and a
measurement matrix Φ ∈ ℝ^{m×n} such that

Y = ΦX, (1)

without accessing the full measurement matrix in a single iteration. We refer to (1)
as a single measurement vector (SMV) model. In the multiple measurement vector
(MMV) setting, one may have thousands of measurement vectors Y^{(·,j)} pouring in
over time. Each measurement vector corresponds to a signal X^{(·,j)}, where signals
typically share a common property such as sparsity, smoothness, etc. For simplicity,
let Y = [Y^{(·,1)} · · · Y^{(·,J)}] ∈ ℝ^{m×J} and X = [X^{(·,1)} · · · X^{(·,J)}] ∈ ℝ^{n×J}. Since high-dimensional
data is typically sparse in nature, a commonality of particular interest
is joint sparsity, i.e., when signals share the same support. The support of a vector v is
defined to be the set of indices for which v is nonzero, i.e., supp(v) = {i : v_i ≠ 0}.
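The definitions of support and joint sparsity above are easy to make concrete. The snippet below is our own toy illustration, not code from the chapter; the helper name `support` is our choice:

```python
import numpy as np

def support(v, tol=0.0):
    """Return the support of v: the set of indices i where |v_i| exceeds tol."""
    return set(np.flatnonzero(np.abs(v) > tol))

# Two jointly sparse signals sharing the support {1, 3}.
x1 = np.array([0.0, 2.0, 0.0, -1.5, 0.0])
x2 = np.array([0.0, -0.3, 0.0, 4.0, 0.0])
assert support(x1) == support(x2) == {1, 3}
```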
Many algorithms have been developed for the MMV setting, especially in
applications such as line spectral estimation [13, 20] and modal analysis [14].
The authors in these works extend the previous SMV-based algorithms as well as
theoretical analysis in [4, 8, 19] to the MMV case. The theoretical bound in [14] also
indicates that MMV settings could make the problem of compressed signal recovery
much easier than in the SMV setting. In particular, the number of measurements
needed for perfect recovery in each signal decreases as the number of signals
increases, reducing the sample complexity per signal.
As a motivating example, consider diffuse optical tomography (DOT) where
the goal is to find small areas of high contrast corresponding to the location of
cancerous cells [2]. Since cancerous cells have a much larger absorption coefficient
than healthy cells, a two-dimensional medical image can be interpreted as a sparse
signal where each entry of the signal represents the absorption coefficient of a given
pixel and the nonzero entries correspond to tumor locations. In a hyperspectral DOT
setting, hundreds of different wavelengths are used to acquire a variety of images
of the same tissue, allowing practitioners to obtain a more accurate location of
tumors [11]. The hyperspectral imaging process results in a jointly sparse MMV,
Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse. . .
where each wavelength produces a different image (or signal), and the joint support
across all images indicates the locations of cancerous cells.
Although signals may share a common support, it is improbable for them to be
perfectly accurate. Since sensing mechanisms are not impervious to error, signals
can contain corruptions. Other sources of corruption in signal processing include
spikes in power supply, defective hardware, and adversarial agents [12]. Going
back to the hyperspectral imaging example, “corruptions” in each signal may be
caused by noncancerous cells that absorb more light at a given wavelength than
their neighbors. For example, if a cell contains an anomalous amount of melanin,
then it absorbs more light at shorter wavelengths in the visible spectrum (i.e., violet
or blue light) compared to a typical noncancerous cell [5, 15]. This produces a large
nonzero absorption coefficient in the location of a healthy cell, i.e., a corruption.
Corrupt entries erroneously indicate the presence of cancerous cells in a location
with healthy cells.
Corruptions cause support recovery algorithms such as the MMV sparse random-
ized Kaczmarz (MMV-SRK) algorithm, which we describe in detail in Sect. 2, to fail
due to the algorithmic dependence on the row norms of the signal approximation
to estimate the support [1]. Thus, large corruptions in a signal with comparatively
small entries may erroneously be included in the support estimate given by these
algorithms. In the corrupt MMV setting, the availability of multiple measurement
vectors becomes vital to the estimate of the true support. Clearly, if only a single
measurement vector is available, there would be no way to distinguish a corrupt
nonzero entry without any additional assumptions on the signal or corruption.
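The failure mode just described is simple to see numerically: a single large corrupt entry dominates the row norms on which a row-norm-based joint support estimate relies. This is a toy illustration with made-up numbers, not data from the chapter:

```python
import numpy as np

# Joint support {1, 3}; entries on the support are modest in magnitude.
X = np.zeros((8, 4))
X[1, :] = 0.5
X[3, :] = 0.4
X[6, 0] = 7.0   # one large corruption (cf. the N(7, 1) corruptions in the experiments)

# A row-norm-based estimate of a size-2 joint support, as MMV-SRK uses:
row_norms = np.linalg.norm(X, axis=1)
estimate = set(np.argsort(row_norms)[-2:])

# The corrupt row displaces a true support row from the estimate.
assert 6 in estimate
assert 3 not in estimate
```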
Corrupt measurement signals have been studied in the context of the SMV model.
In [18] and [12], additive noise in the measurement scheme is assumed to be sparse.
Both works focus on the compressive sensing setting where m ≪ n.
The primary objective of this work is to design an algorithm for recovering the
support of jointly sparse, corrupt signals in the large-scale setting. We propose
a new online algorithm called sparse randomized Kaczmarz for corrupted MMV
(cMMV-SRK) for support recovery. Note that the proposed algorithm can recover
the signals with high accuracy based on our experiments, but we mainly focus on
support recovery in this work. Our experiments show that the proposed algorithm
outperforms the previously proposed Kaczmarz type algorithm in recovering the
joint support from MMV when the signals are corrupted.
with X^{(·,j)} ∈ ℝ^n. We assume that the data is large-scale, meaning we cannot access
all of Φ at once (m and/or n is too large) and must only operate on one row of Φ
at a time. We allow the system to be overdetermined (m > n) or underdetermined
(m < n) and assume the X^{(·,j)}'s are jointly sparse such that supp(X^{(·,j)}) = S and
|S| = k. For an n-dimensional vector X^{(·,j)}, let X^{(·,j)}|_s return X^{(·,j)} with zeros in
the n − s smallest (in magnitude) entries. We also assume that each column of X
contains one or more corruptions, where a corruption is a nonzero entry occurring
outside the joint support. In other words, instead of supp(X^{(·,j)}) ⊂ S, the joint
support set, the support of X^{(·,j)} is

supp(X^{(·,j)}) = S ∪ C_j,

where C_j is the "corrupt index set." Note that the C_j are not necessarily the same
for every j. In this work, our goal is to recover the joint support S from the given
linear measurements Y.
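The signal model above can be simulated in a few lines. This sketch is ours; the sizes are arbitrary, and we plant one corruption per column drawn from N(7, 1), matching the experiments described later:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, J, k = 40, 60, 8, 4                           # underdetermined: m < n
S = np.sort(rng.choice(n, size=k, replace=False))   # shared joint support

X = np.zeros((n, J))
for j in range(J):
    X[S, j] = rng.normal(size=k)                    # entries on the joint support
    # one corrupt entry outside the joint support (the set C_j, here a singleton)
    C_j = rng.choice(np.setdiff1d(np.arange(n), S))
    X[C_j, j] = rng.normal(7, 1)

Phi = rng.normal(size=(m, n))
Y = Phi @ X                                         # the given linear measurements

# Every column's support is the joint support S plus its corrupt index.
for j in range(J):
    assert set(S) <= set(np.flatnonzero(X[:, j]))
```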
The remainder of this manuscript is organized in the following way. Section 2
discusses the sparse randomized Kaczmarz method and the MMV-SRK algorithm.
Section 3 provides a discussion on how corruptions can negatively impact the
performance of MMV-SRK. Section 3 also presents our method, cMMV-SRK, a
variant of SRK which works in the corrupted signal setting. Numerical experiments
using this method are presented in Sect. 4 and we conclude with a summary of our
contributions and future directions in Sect. 5.
S^c_t is the complement set of S^t
7: Set a = w · Φ^{(i,·)}    (a ∈ ℝ^n is the weighted row of Φ)
8: Update X^t = X^{t−1} + ((Y_i − aX^{t−1}) / ‖a‖₂²) aᵀ
9: end for
10: return X^τ
11: end procedure
size k̂. In this variant, the algorithm runs for a specified number of iterations (up to
τ). However, any stopping criterion one would use for an iterative algorithm, e.g.,
terminating after the residual meets a certain threshold or after the updates become
sufficiently small, can be used. Algorithm 1 also differs from the algorithm originally
presented by Mansour and Yilmaz [16] in that at every iteration the support estimate
has size k̂, instead of starting at size n and shrinking to k̂. We find that these
modifications do not significantly affect the behavior of SRK.
Algorithm 1 has been shown empirically to find the solution to overdetermined,
consistent (i.e., a solution exists) linear systems but there are no theoretical results
supporting this. One can make a few observations about the behavior of SRK
for support recovery. Concerning the support size estimate k̂, it is clear that if
k̂ < k, then the probability that the true support is contained in the support of the
approximation is 0, i.e., P(S ⊂ supp(X^τ)) = 0. Additionally, if k̂ = n, then
P(S ⊂ supp(X^τ)) = 1. Regarding the choice of weighting, as t → ∞, 1/√t → 0,
so that row elements inside the support estimate contribute the most to the
approximation. If one has a weighting function that decreases too rapidly, the true
support may not be captured in S^t, causing the algorithm to fail.
Although Algorithm 1 and the following algorithms require the Frobenius norm
of the matrix, ‖Φ‖²_F, for row selection, practically speaking row selection can be
done uniformly at random to avoid using the full measurement matrix in a single
iteration. Indeed, it is advantageous to select the rows at random to avoid introducing
bias from rows with larger norms.
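As a rough sketch of the procedure just described (uniform row selection, a size-k̂ support estimate from the largest-magnitude entries, and 1/√t weighting off the estimate), under our own choices of sizes and iteration count, not the authors' reference implementation:

```python
import numpy as np

def srk(Phi, y, k_hat, tau=5000, seed=0):
    """Sketch of sparse randomized Kaczmarz (SRK) in the spirit of Algorithm 1.

    At step t, the k_hat largest-magnitude entries of the iterate form the support
    estimate; entries outside it are down-weighted by 1/sqrt(t); rows are drawn
    uniformly at random, so the full matrix is never touched in one iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = Phi.shape
    x = np.zeros(n)
    for t in range(1, tau + 1):
        i = rng.integers(m)                        # uniform row selection
        S_t = np.argsort(np.abs(x))[-k_hat:]       # support estimate of size k_hat
        w = np.full(n, 1.0 / np.sqrt(t))           # damp entries off the estimate
        w[S_t] = 1.0
        a = w * Phi[i]                             # weighted row of Phi
        x = x + (y[i] - a @ x) / (a @ a) * a       # Kaczmarz projection step
    return x

# Sanity check on a small consistent overdetermined system.
rng = np.random.default_rng(1)
m, n, k = 100, 20, 3
x_true = np.zeros(n)
x_true[[2, 7, 11]] = [3.0, -2.0, 4.0]
Phi = rng.normal(size=(m, n))
y = Phi @ x_true
x_hat = srk(Phi, y, k_hat=k)
assert set(np.argsort(np.abs(x_hat))[-k:]) == {2, 7, 11}
```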
S^c_t is the complement set of S^t
7: Set a = w · Φ^{(i,·)}    (a is the weighted row of Φ)
8: for j = 1, . . . , J do
9: Update X^t_{(·,j)} = X^{t−1}_{(·,j)} + ((Y_{(i,j)} − aX^{t−1}_{(·,j)}) / ‖a‖₂²) aᵀ
10: end for
11: Update X^t = [X^t_{(·,1)} | . . . | X^t_{(·,J)}]
12: end for
13: return X^t
14: end procedure
3 Main Results
4 Experiments
Fig. 1 Comparing cMMV-SRK and MMV-SRK for support recovery when there is a single
corrupt entry per signal whose magnitude is drawn from N(7, 1). (a) Φ ∼ N(0, 1). (b)
Φ ∼ Unif([0, 1])
Fig. 2 Comparing cMMV-SRK and MMV-SRK for support recovery when there is a single
corrupt entry per signal whose magnitude is drawn from N(0, 1). (a) Φ ∼ N(0, 1). (b)
Φ ∼ Unif([0, 1])
Fig. 3 Investigating the robustness of cMMV-SRK and MMV-SRK when a random number of
(multiple) corruptions are introduced. Here, a signal can have between 1 and 3 corruptions, and
the magnitudes of the corruptions are drawn from N(7, 1). (a) X matrix with 1–3 corruptions
per signal. (b) Performance of Algorithm 3 with multiple corruptions
In the next two experiments, we test the robustness of our proposed algorithm
against multiple corruptions. In Fig. 3, we allow for each signal to have multiple
corruptions. For each signal, i.e., column of X, we uniformly at random select an
integer from 1 to 3 to be the number of corruptions. The value of the corruptions
is drawn i.i.d. from N (7, 1) and an example of the resulting matrix can be seen in
Fig. 3a. The performance of the methods can be seen in Fig. 3b. We note that the
results of this experiment are very similar to those of the experiment in Fig. 1 since
the corruptions are drawn from the same distribution. Again, due to the use of row
norms, in the presence of multiple corruptions Algorithm 2 gives a less accurate
joint support estimate, recovering no more than about 15% of the support.
Fig. 4 Investigating the robustness of cMMV-SRK in a simulated online setting with a random
number of (multiple) corruptions. Here, a signal can have between 1 and 3 corruptions whose
magnitudes are drawn from N (7, 1) and we consider the over- and underdetermined linear system
settings. (a) Overdetermined linear system (m > n). (b) Underdetermined linear system (m < n)
5 Conclusion
Acknowledgements The initial research for this effort was conducted at the Research Collabora-
tion Workshop for Women in Data Science and Mathematics, July 17–21, held at ICERM. Funding
for the workshop was provided by ICERM, AWM, and DIMACS (NSF Grant No. CCF-1144502).
SL was supported by NSF CAREER Grant No. CCF-1149225. DN was partially supported by
the Alfred P. Sloan Foundation, NSF CAREER #1348721, and NSF BIGDATA #1740325. JQ was
supported by NSF DMS-1818374.
References
11. F. Larusson, S. Fantini, E.L. Miller, Hyperspectral image reconstruction for diffuse optical
tomography. Biomed. Opt. Express 2(4), 946–965 (2011)
12. J.N. Laska, M.A. Davenport, R.G. Baraniuk, Exact signal recovery from sparsely corrupted
measurements through the pursuit of justice, in Asilomar Conference on Signals, Systems, and
Computers (IEEE, Piscataway, 2009), pp. 1556–1560
13. Y. Li, Y. Chi, Off-the-grid line spectrum denoising and estimation with multiple measurement
vectors. IEEE Trans. Signal Process. 64(5), 1257–1269 (2016)
14. S. Li, D. Yang, G. Tang, M.B. Wakin, Atomic norm minimization for modal analysis from
random and compressed samples. IEEE Trans. Signal Process. 66(7), 1817–1831 (2018)
15. G. Lu, B. Fei, Medical hyperspectral imaging: a review. J. Biomed. Opt. 19(1), 010901 (2014)
16. H. Mansour, O. Yilmaz, A fast randomized Kaczmarz algorithm for sparse solutions of
consistent linear systems (2013). arXiv preprint arXiv:1305.3803
17. T. Strohmer, R. Vershynin, Comments on the randomized Kaczmarz method. J. Fourier Anal.
Appl. 15(4), 437–440 (2009)
18. C. Studer, P. Kuppinger, G. Pope, H. Bolcskei, Recovery of sparsely corrupted signals. IEEE
Trans. Inf. Theory 58(5), 3115–3130 (2012)
19. G. Tang, B.N. Bhaskar, P. Shah, B. Recht, Compressed sensing off the grid. IEEE Trans. Inf.
Theory 59(11), 7465–7490 (2013)
20. Z. Yang, L. Xie, Exact joint sparse frequency recovery via optimization methods. IEEE Trans.
Signal Process. 64(19), 5145–5157 (2014)
The Hubness Phenomenon
in High-Dimensional Spaces
1 Introduction
One of the key disciplines contributing to data science is machine learning, which
seeks to discover meaningful patterns or structure within the data. Key machine
learning paradigms are supervised and unsupervised learning. The first makes use of
data labels; the second does not. Most data in real-life scenarios do not have labels,
either because labels are unknown or are too costly to obtain. As such, unsupervised
learning plays an important role in data-driven research. In particular, a fundamental
unsupervised learning problem, extremely common in exploratory data mining, is
clustering. The goal in clustering is to discover groups in the data based on a notion
of similarity.
An issue related to clustering is the so-called curse of dimensionality [2].
Data with thousands of dimensions abound in fields and applications as diverse
as bioinformatics, security and intrusion detection, and information and image
retrieval. Clustering algorithms can handle data with low dimensionality, but as
the dimensionality of the data increases, these algorithms tend to break down.
This is because in high-dimensional spaces data become extremely sparse and
their distances become indistinguishable, a phenomenon also known as distance
concentration. As a consequence, reliable density estimation cannot be performed,
and this negatively affects any learning algorithm that computes distances (or
similarities) in the full-dimensional input space (also known as the embedding
space).
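The concentration effect is easy to observe numerically. The toy experiment below is our own (arbitrary sample size); it measures the relative spread of distances from random points to the origin and shows it shrinking as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(d, n_points=500):
    """(max - min) / min of the distances from n_points Gaussian points in R^d to the origin."""
    X = rng.normal(size=(n_points, d))
    dist = np.linalg.norm(X, axis=1)
    return (dist.max() - dist.min()) / dist.min()

# Distances concentrate as dimensionality grows: the relative spread collapses.
assert relative_contrast(1000) < relative_contrast(2)
```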
A common scenario with high-dimensional data is that several clusters may exist
in different subspaces comprised of different combinations of features. In many real-
world problems, points in a given region of the input space may cluster along a
given set of dimensions, while points located in another region may form a tight
group with respect to different dimensions. Each dimension could be relevant to at
least one of the clusters. Common global dimensionality reduction techniques are
unable to capture such local structure of the data. Thus, a proper feature selection
procedure should operate locally in the input space. Local feature selection allows
one to estimate to which degree features participate in the discovery of clusters. As
a result, many different subspace clustering methods have been proposed [10, 14,
17, 18].
Another aspect of high-dimensional spaces which has recently come to light
is the phenomenon of hubness. It is observed that the distribution of neighbor
occurrences becomes skewed in intrinsically high-dimensional data. This means that
there are a few data points with a very high neighbor occurrence count, and these
emerge as hubs. Though hubs and related power laws have been observed in other contexts
such as the Internet and protein–protein interactions, the phenomenon referred to
here is the concept of skewness of degree distributions in the k-nearest neighbor
topologies of high-dimensional data. More details on this phenomenon are given
in Sect. 2. Recent studies [19] have shown that hubness is an inherent property of
intrinsically high-dimensional data and could potentially be leveraged to improve
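The skewed k-occurrence distribution described in this section can be reproduced with a small simulation. This construction is ours; the sample sizes and the use of sample skewness as the statistic are illustrative choices:

```python
import numpy as np

def k_occurrence(X, k):
    """For each point, count how often it occurs among the other points' k nearest neighbors."""
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(D2, np.inf)                     # exclude self-neighbors
    counts = np.zeros(X.shape[0], dtype=int)
    for i in range(X.shape[0]):
        counts[np.argsort(D2[i])[:k]] += 1           # vote for i's k nearest neighbors
    return counts

def skewness(c):
    c = c.astype(float)
    return ((c - c.mean()) ** 3).mean() / c.std() ** 3

rng = np.random.default_rng(0)
low = k_occurrence(rng.normal(size=(300, 3)), k=10)    # intrinsically low-dimensional
high = k_occurrence(rng.normal(size=(300, 500)), k=10) # intrinsically high-dimensional

# The k-occurrence distribution becomes markedly more skewed in high dimensions:
# a few points (hubs) accumulate a disproportionate share of the neighbor votes.
assert skewness(high) > skewness(low)
```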
Another random document with
no related content on Scribd:
Beethoven entitled the next movement ‘a devout song of praise,
offered by a convalescent to God, in the Lydian mode.’ It probably
owes its origin to the fact that Beethoven was taken seriously ill while
at work on this and the B-flat major quartet. It seems likely that
before this illness he had other plans for the quartet, and that the
Danza tedesca before mentioned was to find a place in it.
The short march which follows calls for no comment. The final
allegro is introduced by recitative passages for the first violin, gaining
in passion, culminating in a dramatic run over the diminished
seventh chord which bears some resemblance to the opening of the
allegro of the first movement. There is a passing sigh before the last
movement begins, Allegro appassionato.
The theme itself is in the form of a dialogue between first and second
violins. It merges into the first variation without perceptible break in
the music. Here the theme is carried by the second violin, the first
filling the pauses with a descending figure. This clause of the theme
is then repeated by the viola, the 'cello taking the rôle of the first
violin. The second clause of the theme is similarly treated.
The remaining six variations are clearly set apart from each other by
changes in the time signature. There is a variation marked piu
mosso, really alla breve, which is a dialogue between first violin and
'cello, accompanied at first monotonously by the other two
instruments, later with more variety and animation. The next is an
andante moderato e lusinghiero, in which the theme is arranged as a
canon at the second, first between the two lower instruments, later
between the two higher. This leads to an adagio in 6/8 time, in which
the theme is broken up into passage work. The next and fifth
variation (allegretto, 2/4) is the most hidden of all. The notes of the
theme are separated and scattered here and there among the four
parts. But the sixth, an adagio in 9/4 time, is simpler. The seventh,
and last, is a sort of epilogue, a series of different statements of the
theme, at first hidden in triplet runs; then emerging after a long trill, in
its simplest form, in the key of C major; then in A major with an
elaborated accompaniment; in F major, simple again; and finally
brilliantly in A major.
The following Presto in E major, alla breve, is very long, but is none
the less symmetrical and regular in structure. It is in effect a scherzo
and trio. The scherzo is in the conventional two sections, both of
which are built upon the same subject. The second section is broken
by four measures (molto poco adagio!); and there is a false start of
the theme, following these, in G-sharp minor, suddenly broken by a
hold. This recalls the effect of the very opening of the movement, a
single measure, forte, by the 'cello, as if the instrument were starting
off boldly with the principal subject. But a full measure of silence
follows, giving the impression that the 'cello had been too precipitate.
The Trio section offers at first no change of key; but a new theme is
brought forward. Later the key changes to A major, and the rhythm is
broadened. A series of isolated pizzicato notes in the various
instruments prepares the return of the Scherzo (without repeats).
The Trio follows again; and there is a coda, growing more rapid, after
the Scherzo has been repeated for the second time.
After the C-sharp minor quartet, the last quartet—in F major, opus
135—appears outwardly simple. It shares with the first of the series
simplicity and regularity of form; and is, like the quartet in E-flat
major, calm and outspoken, rather than disturbed, gloomy, or
mysterious. It is the shortest of all the last quartets.
The first movement is in perfect sonata form. The first theme (viola)
has a gently questioning sound, which one may imagine mocked by
the first violin. The second theme, in C major, is light, almost in the
manner of Haydn. The movement builds itself logically out of the
opposition of these two motives, the one a little touched with
sadness and doubt, the other confidently gay. The Scherzo which
follows needs no analysis. Two themes, not very different in
character, are at the basis. The second is presented successively in
F, G, and A, climbing thus ever higher. The climax at which it arrives
is noteworthy. The first violin is almost acrobatic in the expression of
wild humor, over an accompaniment which for fifty measures
consists of the unvaried repetition of a single figure by the other
three instruments in unison. Following this fantastical scherzo there
is a short slow movement in D-flat major full of profound but not
tragic sentiment. The short theme, flowing and restrained, undergoes
four variations; the second in C-sharp minor, rather agitated in
character; the third in the tonic key, giving the melody to the 'cello;
and the fourth disguising the theme in short phrases (first violin). To
the last movement Beethoven gave the title, Der schwer gefasste
Entschluss (the resolution made with difficulty). Two motives which
occur in it are considered, the one as a question: Muss es sein?
(Must it be?), the other as the answer: Es muss sein (It must be).
The former is heard only in the introduction, and in the measures
before the third section of the movement. The latter is the chief
theme. Whether or not these phrases are related to external
circumstances in Beethoven’s life, the proper interpretation of them
is essentially psychological. The question represents doubt and
distrust of self. The answer to such misgivings is one of deeds, not
words, of strong-willed determination and vigorous action. Of such
the final movement of the last quartet is expressive. Such seems the
decision which Beethoven put into terms of music.
FOOTNOTES:
[70] The famous Schuppanzigh quartet met every Friday morning at the house of
Prince Lichnowsky. Ignaz Schuppanzigh (b. 1776) was leader. Lichnowsky himself
frequently played the second violin. Franz Weiss (b. 1788), the youngest member,
hardly more than a boy, played the viola. Later he became the most famous of the
viola players in Vienna. The 'cellist was Nikolaus Kraft (born 1778).
[71] Förster (1748-1823) forms an important link between Haydn and Beethoven.
[74] Only Schuppanzigh himself, and Weiss, the violist, remained of the original
four who first played Beethoven’s quartets opus 18 at the palace of Prince
Lichnowsky. The second violinist was now Karl Holz, and the 'cellist Joseph Linke.
CHAPTER XVII
THE STRING ENSEMBLE SINCE
BEETHOVEN
The general trend of development: Spohr, Cherubini, Schubert—
Mendelssohn, Schumann and Brahms, etc.—New developments:
César Franck, d’Indy, Chausson—The characteristics of the
Russian schools: Tschaikowsky, Borodine, Glazounoff and others
—Other national types: Grieg, Smetana, Dvořák—The three great
quartets since Schubert and what they represent; modern
quartets and the new quartet style: Debussy, Ravel, Schönberg—
Conclusion.
I
There is little history of the string quartet to record after the death of
Beethoven in 1827. It has undergone little or no change or
development in technique until nearly the present day. The last
quartets of Beethoven taxed the powers of the combined four
instruments to the uttermost. Such changes of form as are to be
noted in recent quartets are the adaptation of new ideas already put
to the test in music for pianoforte, orchestra, or stage. The
growth of so-called modern systems of harmony affected the string
quartet, but did not originate in it. A tendency towards richer or fuller
scoring, towards continued use of pizzicato or other special effects,
and a few touches of new virtuosity here and there, reflect the
general interest of the century in the orchestra and its possibilities of
tone-coloring. But it is in the main true that after a study of the last
quartets of Beethoven few subsequent quartets present new
difficulties; and that, excepting only a few, the many with which we
shall have to do are the expressions of the genius of various
musicians, most of whom were more successful in other forms, or
whose qualities have been made elsewhere and otherwise more
familiar.
Less perhaps than any other form will the string quartet endure by
the sole virtue of being well written for the instruments. Take, for
example, the thirty-four quartets of Ludwig Spohr. Spohr was during
the first half of the nineteenth century the most respected musician in
Germany. He was renowned as a leader and composer quite as
much as he was world-famous as a virtuoso. He was especially
skillful as a leader in quartet playing. He was among the first to bring
out the Beethoven quartets, opus 18, in Germany. He was under a
special engagement for three years to the rich amateur Tost in
Vienna to furnish chamber compositions. No composer ever
understood better the peculiar qualities of the string instruments;
none was ever more ambitious and at the same time more serious.
Yet excluding the violin concertos and an occasional performance of
his opera Jessonda, his music is already lost in the past. Together
with operas, masses, and symphonies, the quartets, quintets, and
quartet concertos, are rapidly being forgotten. The reason is that
Spohr was more conscientious than inspired. He stood in fear of the
commonplace. His melodies and harmonies are deliberately
chromatic, not spontaneous. Yet shy as he was of
commonplaceness in melody and harmony, he was insensitive to a
more serious commonplaceness.
But the point is that Spohr’s quartets have not lived. In neatness of
form and in treatment of the instruments they do not fall below the
greatest. They are in these respects superior to those of Schumann
for example. The weakness of them is the weakness of the man’s
whole gift for composition; and they represent no change in the art of
writing string quartets.
[Illustration: Ludwig Spohr.]
Another man whose quartets are theoretically as good as any is
Cherubini. Of the six, that in E-flat major, written in 1814, is still
occasionally heard.
On the other hand, Schubert, a man with less skill than either Spohr
or Cherubini, has written quartets which seem likely to prove
immortal. Fifteen are published in the complete Breitkopf and Härtel
edition of Schubert’s works. Of these the first eleven may be
considered preparatory to the last four. They show, however, what is
frequently ignored in considering the life and art of Schubert—an
unremitting effort on the part of the young composer to master the
principles of musical form.
II
We may say that Schubert applied himself to the composition of
string quartets with a special devotion and ultimately with great
success; that certain qualities of his genius were suited to an
expression in this form. Mendelssohn applied himself to all branches
of music with equal facility and with evidently little preference. Most
of his chamber music for strings alone, however, belongs to the early
half of his successful career. In the case of Mendelssohn, unlike
that of almost every other composer, this does not mean that the
quartets fail to express his fully-matured genius.
Mendelssohn never wrote anything better than the overture to
‘Midsummer Night’s Dream.’ This before he was twenty! But having
put his soul for once into a few quartets he passed on to other works.
There are six in all. The first, opus 12, is in E-flat major. The slow
introduction and the first allegro have all the well-known and now
often ridiculed marks of the ‘Songs Without Words’: short, regular
phrases; weak curves and feminine endings; commonplace
harmonies, monotonous repetitions, uninteresting accompaniment.
The second movement—a canzonetta—is interesting as
Mendelssohn could sometimes be in light pieces; but the andante
oozes honey again, and the final allegro is very long.
The first movement of the next quartet (in F major) likewise suggests
the quintet. The style is smoothly imitative and compact; and the
theme beginning in the fifty-seventh measure casts a shadow before.
The Andante quasi Variazioni is most carefully wrought, and is rich in
sentiment. The Scherzo which follows—in C minor—is syncopated
throughout. The final allegro suggests the last movement of the B-
flat major symphony, the joyous Spring symphony written not long
before.
The last quartet (in A) may rank with the finest of his compositions.
Whether or not in theory the style is pianistic, the effect is rich and
sonorous. The syncopations are sometimes baffling, especially in the
last movement; but on the whole this quartet presents the essence of
Schumann’s genius in most ingratiating and appealing form. The
structure is free, reminding one in some ways of the D minor
symphony. But there is no rambling. The whole work is intense.
There is an economy of mood and of thematic material. One phrase
dominates the first movement; the Assai agitato is a series of terse
variations. There is a sustained Adagio in D major; and then a
vigorous finale in free rondo form, the chief theme of which is
undoubtedly related to the chief theme of the first movement.
The first sextet, in B-flat major, has won more popular favor than
many other works by the same composer. The addition of two
instruments to the regular four brought with it the same sort of
problems which were mentioned in connection with Mozart’s
quintets: i.e., the avoidance of thickness in the scoring. The group of
six instruments is virtually a string orchestra; but the sextets of
Brahms are finely drawn, quite in the manner of a string quartet.
Especially in this first sextet have the various instruments a like
importance and independence.
The first theme of the first movement ('cello) is wholly melodious. The
second theme, regularly brought forward in F major, is yet another
melody, and again is announced by the violoncello. A passage of
twenty-eight measures, over a pedal point on C, follows. This closes
the first section. The development is, as might be expected, full of
intricacies. The return of the first theme is brilliantly prepared,
beginning with announcing phrases in the low registers, swelling to a
powerful and complete statement in which the two violins join. The
second movement is a theme and variations in D minor. The theme
is shared alternately by first viola and first violin. The variations are
brilliant and daring, suggesting not a little the pianoforte variations on
a theme of Paganini’s. There is a Scherzo and Trio. The main motive
of the Scherzo serves as an accompaniment figure in the Trio; and
the Trio is noteworthy for being entirely fortissimo. The last
movement is a Rondo.
In these sextets and in the three quartets, written many years later,
we have the classical model faithfully reproduced. The separate
parts are handled with unfailing polyphonic skill; there is the special
refinement of expression which, hard to define, is unmistakable in a
work that is properly a string quartet.
Opus 51, No. 1, is in C minor. The first theme is given out at once by
the first violin; a theme characteristic of Brahms, of long phrases and
a certain swinging power. Within the broadly curving line there are
impatient breaks; and the effect of the whole is one of restlessness
and agitation. This is especially noticeable when, after a contrasting
section, the theme is repeated by viola and 'cello under an agitated
accompaniment, and leads to sharp accents. There is no little
resemblance between this theme and Brahms’ treatment of it, and
the theme of the first movement of the C minor symphony,
completed not long before. There is throughout this movement the
rhythm, like the sweep of angry waves, which tosses in the first
movement of the symphony; an agitation which the second theme
(B-flat major, first violin) cannot calm, which only momentarily—as
just after the second theme, here, and in the third section of the
movement—is subdued.