(19) United States
(12) Patent Application Publication, Lee et al.
(10) Pub. No.: US 2022/0019856 A1
(43) Pub. Date: Jan. 20, 2022

(54) PREDICTING NEURAL NETWORK PERFORMANCE USING NEURAL NETWORK GAUSSIAN PROCESS

(71) Applicant: Google LLC, Mountain View, CA (US)

(72) Inventors: Jaehoon Lee, Palo Alto, CA (US); Daiyi Peng, Cupertino, CA (US); Yuan Cao, Mountain View, CA (US); Jascha Narain Sohl-Dickstein, San Francisco, CA (US); Daniel Sung-Joon Park, Sunnyvale, CA (US)

(21) Appl. No.: 17/377,142

(22) Filed: Jul. 15, 2021

Related U.S. Application Data

(60) Provisional application No. 63/052,045, filed on Jul. 15, 2020.

Publication Classification

(51) Int. Cl.: G06K 9/62 (2006.01); G06N 3/08 (2006.01)

(52) U.S. Cl.: CPC G06K 9/6262 (2013.01); G06N 3/08 (2013.01); G06K 9/623 (2013.01)

(57) ABSTRACT

A method for predicting performance of a neural network (NN) is described. The method includes receiving a training data set having a set of training samples; receiving a validation data set having a set of validation pairs; initializing (i) a validation-training kernel matrix representing similarities of the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set; generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix; performing the following operations at least once: generating predicted validation outputs for the validation inputs, and updating an accuracy score of the NN based on the predicted validation outputs and the validation outputs; and outputting the updated accuracy score as a final accuracy score representing performance of the NN.

[FIG. 1 (front page): block diagram of an example Neural Network System 100, showing the Training Dataset 102, Validation Dataset 104, Validation-training Kernel Matrix 103, Training-training Kernel Matrix 105, Parameter Values 108, Training Feature Vectors 110, Validation Feature Vectors 112, Kernel Matrices Generator 114, Updated Validation-training Kernel Matrix 116, Updated Training-training Kernel Matrix 118, Neural Network 120, Final Updated Validation-training Kernel Matrix 122, Final Updated Training-training Kernel Matrix 124, Predicted Validation Outputs 126, Accuracy Score 128, and Final Accuracy Score 130.]
[FIG. 2 (Sheet 2 of 3): flow diagram of the example process 200. Step 202: generate a regularization constant representing random noise applied to the training-training kernel matrix. Step 204: generate a regularized training-training kernel matrix using the regularization constant and the final updated training-training kernel matrix. Step 206: generate predicted validation outputs using the regularized training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set.]
[FIG. 3 (Sheet 3 of 3): flow diagram of the example process 300.
Step 302: Receive a training data set having a set of training samples.
Step 304: Receive a validation data set having a set of validation pairs.
Step 306: Initialize (i) a validation-training kernel matrix and (ii) a training-training kernel matrix.
Step 308: Generate a final updated validation-training kernel matrix and a final updated training-training kernel matrix.
Step 310: Sample a set of parameter values for the set of network parameters from an initialization distribution.
Step 312: Generate, from the training inputs, training feature vectors.
Step 314: Generate, from the validation inputs, validation feature vectors.
Step 316: Update the validation-training kernel matrix and the training-training kernel matrix.
Step 318: Generate predicted validation outputs for the validation inputs.
Step 320: Update an accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set.
Step 322: Output the updated accuracy score as a final accuracy score that represents performance of the neural network.]
PREDICTING NEURAL NETWORK PERFORMANCE USING NEURAL NETWORK GAUSSIAN PROCESS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a non-provisional of and claims priority to U.S. Provisional Patent Application No. 63/052,045, filed on Jul. 15, 2020, the entire contents of which are hereby incorporated by reference.

BACKGROUND

[0002] This specification relates to predicting the performance of a neural network.

[0003] Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.

SUMMARY

[0004] This specification describes a system, implemented as computer programs on one or more computers in one or more locations, that predicts an accuracy score representing the performance of a neural network without training the neural network. The neural network is one of multiple candidate neural networks for performing a neural network task that requires generating network outputs for network inputs. The system can use the accuracy score of the neural network to select a final neural network from the multiple candidate neural networks for performing the neural network task.

[0005] The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages.

[0006] Accurately predicting the performance of a neural network on a task is extremely computationally expensive, and the most reliable way of doing so is training the neural network to completion. In conventional neural architecture search methods, the performance of a large number of neural networks in the search space must be evaluated. The described techniques provide an inexpensive measure indicative of the final performance of a neural network without training the neural network (e.g., without using gradient-based learning), therefore reducing the computational cost of neural architecture search.

[0007] In particular, the described techniques compute a "Neural Network Gaussian Process (NNGP) validation accuracy" (or an "accuracy score") and use it as a computationally inexpensive quantity to predict the neural network's actual performance on a neural network task without training the neural network. This NNGP validation accuracy is obtained by computing an NNGP kernel of the neural network (by repeatedly reinitializing the parameters of the network based on an initialization distribution of the parameters) and performing Gaussian inference on a validation set using the constructed kernel. Compared to gradient-based measures, the NNGP validation accuracy has the following technical advantages:

[0008] It is orders of magnitude cheaper, in terms of computational resources (e.g., memory, processing power, etc.) and wall-clock processing time, than computing gradient-based performance predictions.

[0009] The overall ranking derived from NNGP validation accuracy is comparable to that from gradient-based performance predictions.

[0010] NNGP validation accuracy is better than very expensive gradient-based performance predictions at predicting whether a neural network has above-median performance.

[0011] The NNGP validation accuracy can be used as a signal complementary to gradient-based training, as NNGP performance is obtained without using any gradient-based training or past gradient-based training data. That means the NNGP validation accuracy can augment gradient-based performance measures to improve their predictive quality.

[0012] Further, by reducing the computational costs of evaluating neural network performance and of neural architecture search, the described techniques can lead to a reduction of the footprint, i.e., in terms of processing power and energy consumption, of deep learning research and applications.

[0013] The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1 shows an example neural network system.

[0015] FIG. 2 is a flow diagram of an example process for generating predicted validation outputs.

[0016] FIG. 3 is a flow diagram of an example process for predicting performance of a neural network.

[0017] Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

[0018] This specification describes a system implemented as computer programs on one or more computers in one or more locations that predicts an accuracy score that represents the performance of a neural network without training the neural network. The neural network is one of multiple candidate neural networks for performing a neural network task that requires generating network outputs for network inputs. The system can use the accuracy score of the neural network to select a final neural network from the multiple candidate neural networks for performing the neural network task.

[0019] The neural network can be configured to perform any kind of neural network task, i.e., can be configured to receive any kind of digital data input and to generate any kind of score, classification, or regression output based on the input.

[0020] In some cases, the neural network is configured to perform an image processing task, i.e., to receive an input image and to process the input image to generate a network output for the input image. For example, the task may be image classification, and the output generated by the neural network for a given image may be scores for each of a set of object categories, with each score representing an
estimated likelihood that the image contains an image of an object belonging to the category. As another example, the task can be image embedding generation, and the output generated by the neural network can be a numeric embedding of the input image. As yet another example, the task can be object detection, and the output generated by the neural network can identify locations in the input image at which particular types of objects are depicted. As yet another example, the task can be image segmentation, and the output generated by the neural network can assign each pixel of the input image to a category from a set of categories.

[0021] As another example, if the inputs to the neural network are Internet resources (e.g., web pages), documents, or portions of documents, or features extracted from Internet resources, documents, or portions of documents, the task can be to classify the resource or document, i.e., the output generated by the neural network for a given Internet resource, document, or portion of a document may be a score for each of a set of topics, with each score representing an estimated likelihood that the Internet resource, document, or document portion is about the topic.

[0022] As another example, if the inputs to the neural network are features of an impression context for a particular advertisement, the output generated by the neural network may be a score that represents an estimated likelihood that the particular advertisement will be clicked on.

[0023] As another example, if the inputs to the neural network are features of a personalized recommendation for a user, e.g., features characterizing the context for the recommendation, e.g., features characterizing previous actions taken by the user, the output generated by the neural network may be a score for each of a set of content items, with each score representing an estimated likelihood that the user will respond favorably to being recommended the content item.

[0024] As another example, if the input to the neural network is a sequence of text in one language, the output generated by the neural network may be a score for each of a set of pieces of text in another language, with each score representing an estimated likelihood that the piece of text in the other language is a proper translation of the input text into the other language. As another example, the task may be an audio processing task. For example, if the input to the neural network is a sequence representing a spoken utterance, the output generated by the neural network may be a score for each of a set of pieces of text, each score representing an estimated likelihood that the piece of text is the correct transcript for the utterance. As another example, if the input to the neural network is a sequence representing a spoken utterance, the output generated by the neural network can indicate whether a particular word or phrase ("hotword") was spoken in the utterance. As another example, if the input to the neural network is a sequence representing a spoken utterance, the output generated by the neural network can identify the natural language in which the utterance was spoken.

[0025] As another example, the task can be a natural language processing or understanding task, e.g., an entailment task, a paraphrase task, a textual similarity task, a sentiment task, a sentence completion task, a grammaticality task, and so on, that operates on a sequence of text in some natural language.

[0026] As another example, the task can be a text-to-speech task, where the input is text in a natural language or features of text in a natural language, and the network output is a spectrogram or other data defining audio of the text being spoken in the natural language.

[0027] As another example, the task can be a health prediction task, where the input is electronic health record data for a patient and the output is a prediction that is relevant to the future health of the patient, e.g., a predicted treatment that should be prescribed to the patient, the likelihood that an adverse health event will occur to the patient, or a predicted diagnosis for the patient.

[0028] As another example, the task can be an agent control task, where the input is an observation characterizing the state of an environment and the output defines an action to be performed by the agent in response to the observation. The agent can be, e.g., a real-world or simulated robot, a control system for an industrial facility, or a control system that controls a different kind of agent.

[0029] FIG. 1 shows an example neural network system. The system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

[0030] The system 100 is configured to predict performance of a neural network 120. In particular, the system 100 computes a "Neural Network Gaussian Process (NNGP) validation accuracy" (or an "accuracy score"), denoted as $A_{val}$, and uses it as a computationally inexpensive quantity to predict the neural network 120's actual performance on a neural network task without training the neural network 120.

[0031] To predict performance of the neural network 120 without training it, the system 100 receives a training data set 102 having a set of training samples. Each of the training samples has a training input $x^i$ and a corresponding training output $y^i$. The system 100 receives a validation data set 104 having a set of validation pairs. Each of the validation pairs has a validation input denoted as $x^a$ and a corresponding validation output denoted as $y^a$.

[0032] The neural network 120 can have any appropriate architecture that allows the network 120 to map an input $x$ into a feature vector $z$, followed by one or more output layers, e.g., a linear readout layer, to produce a predicted output for the input $x$.

[0033] For example, the neural network 120 can be denoted as $f(\cdot)$ and can have a set of network parameters $\theta$. The neural network 120 can be configured to map an input $x$ into a feature vector $z = f(x; \theta)$ of dimension $d$, followed by a linear readout layer with weight variance $\sigma_w^2$ producing predicted outputs $\hat{y}$ for the input $x$. Consider $n$ inputs $x^1, x^2, \ldots, x^n$. In the NNGP approximation, the distribution over possible outputs $\hat{y}_l^1, \ldots, \hat{y}_l^n$ for any label $l$ at initialization is jointly Gaussian:

$$(\hat{y}_l^1, \ldots, \hat{y}_l^n) \sim \mathcal{N}(0, \sigma_w^2 K), \qquad K_{ij} = K(x^i, x^j) = \mathbb{E}_\theta\!\left[\frac{1}{d} \sum_{k=1}^{d} z_k^i z_k^j\right], \tag{1}$$

where $l$ denotes an index of a classification label. For example, for a digit classification task, the index $l$ goes from 0 to 9. Thus, for a given input $x$, the neural network 120 generates ten output vectors $\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_9$.

[0034] In Eq. (1), $K(x^i, x^j)$ is a sample-sample second moment of $z$ averaged over random network initializations. More specifically, a sample-sample second moment means
that the indices $i, j$ are sample indices, and that the moment is second-order, i.e., quadratic in the values of $z$. Eq. (1) is used to compute an average over the $\theta$ variable and the $k$ variable. As mentioned above, $z$ is a $d$-dimensional vector (i.e., $k = 1, 2, \ldots, d$), which is a function of the input $x$ and the network parameters $\theta$. First, a sum over the $k$ index divided by the dimension $d$ is computed, i.e., the sum is

$$\frac{1}{d} \sum_{k=1}^{d} z_k^i z_k^j.$$

The obtained result is then averaged over the initialization distribution of the network parameters $\theta$. That means (and as described in more detail below) the network parameters $\theta$ are re-initialized $n_{ensemble}$ times (i.e., by sampling a set of parameter values for the set of network parameters from an initialization distribution $p(\theta)$ each time) and the obtained result is averaged over these $\theta$ values.
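Written out explicitly (an editorial restatement of the averaging just described, not language from the patent), the ensemble estimate of each kernel entry is

$$K_{ij} \approx \frac{1}{n_{ensemble}} \sum_{s=1}^{n_{ensemble}} \frac{1}{d} \sum_{k=1}^{d} z_k^i(\theta_s)\, z_k^j(\theta_s), \qquad \theta_s \sim p(\theta),$$

which is exactly the quantity accumulated by the per-element updates of paragraphs [0043] and [0045] below.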
[0035] The system 100 initializes a validation-training kernel matrix 103 that represents similarities of the validation inputs in the validation data set 104 and the training inputs in the training data set 102. The validation-training kernel matrix 103, denoted as $K^{vt}$, is a two-dimensional matrix having a number of rows equal to the number of validation pairs in the validation data set and a number of columns equal to the number of training samples in the training data set. In some implementations, the system 100 may initialize the validation-training kernel matrix 103 with a first zero matrix, i.e., $K^{vt} = 0_{N_{val} \times N_{train}}$, where $N_{val}$ is the number of validation pairs in the validation data set 104, and $N_{train}$ is the number of training samples in the training data set 102.

[0036] The system 100 initializes a training-training kernel matrix 105 that represents similarities across the training inputs within the training data set 102. The training-training kernel matrix 105, denoted as $K^{tt}$, is a two-dimensional square matrix in which the number of rows/columns equals the number of training samples in the training data set 102. The system 100 may initialize the training-training kernel matrix 105 with a second zero matrix. For example, the matrix 105 can be initialized as follows: $K^{tt} = 0_{N_{train} \times N_{train}}$.

[0037] The system 100 initializes an accuracy score 128 of the neural network 120. For example, the system 100 may initialize the accuracy score 128 as zero, i.e., $A_{val} = 0$. As another example, the system 100 may initialize the accuracy score 128 using a predetermined value.

[0038] The system 100 generates a final updated validation-training kernel matrix and a final updated training-training kernel matrix by performing $n_{ensemble}$ steps as follows.

[0039] At each of the $n_{ensemble}$ steps, the system 100 samples a set of parameter values 108 for the set of network parameters from an initialization distribution $p(\theta)$. The system 100 can receive the initialization distribution $p$ as part of the input to the system 100. The initialization distribution $p$ can be, for example, a uniform distribution or a Gaussian distribution.

[0040] The system 100 then generates, from the training inputs, training feature vectors 110 using the neural network 120 in accordance with the sampled parameter values of the network parameters. In particular, for each training input $i \in [0, N_{train})$, the system 100 generates a respective training feature vector $z_i = f(x^i; \theta)$ by processing the training input using the neural network, with the neural network having the sampled parameter values.

[0041] The system 100 generates, from the validation inputs, validation feature vectors 112 using the neural network 120 in accordance with the sampled parameter values 108 of the network parameters. In particular, for each validation input $a \in [0, N_{val})$, the system 100 generates a respective validation feature vector $z_a = f(x^a; \theta)$ by processing the validation input using the neural network, with the neural network having the sampled parameter values.

[0042] The system 100 uses a kernel matrices generator 114 to update the validation-training kernel matrix 103 and the training-training kernel matrix 105 using the validation feature vectors 112 and the training feature vectors 110.

[0043] In particular, the kernel matrices generator 114 updates the validation-training kernel matrix 103 using the validation feature vectors 112 and the training feature vectors 110. The kernel matrices generator 114 generates an update for each of a plurality of elements of the validation-training kernel matrix 103 based on a dot product of a respective validation feature vector and a respective training feature vector. For example, for each element $(a, i) \in [0, N_{val}) \times [0, N_{train})$, the generator 114 generates an update by adding

$$\frac{1}{n_{ensemble}\, d} \sum_{k=1}^{d} z_k^a z_k^i$$

to $K_{ai}^{vt}$. The generator 114 combines the updates for all elements of the matrix 103 to generate an updated validation-training kernel matrix 116 for each of the $n_{ensemble}$ steps.

[0044] The system 100 uses the kernel matrices generator 114 to update the training-training kernel matrix 105 based on the training feature vectors 110. The kernel matrices generator 114 generates an update for each of a plurality of elements of the training-training kernel matrix 105 based on a dot product of a respective first training feature vector and a respective second training feature vector.

[0045] In particular, the generator 114 updates the training-training kernel matrix 105 by, for each element $(i, j) \in [0, N_{train}) \times [0, N_{train})$, adding

$$\frac{1}{n_{ensemble}\, d} \sum_{k=1}^{d} z_k^i z_k^j$$

to $K_{ij}^{tt}$. The updates for all elements of the matrix 105 result in an updated training-training kernel matrix 118 for each of the $n_{ensemble}$ steps.
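As an illustration of the updates in paragraphs [0039]-[0045], here is a minimal NumPy sketch (an editorial example, not the patent's code). The callables `init_params` and `features`, standing in for sampling $\theta \sim p(\theta)$ and computing $z = f(x; \theta)$, are assumptions of this sketch:

```python
import numpy as np

def build_kernel_matrices(train_x, val_x, init_params, features,
                          n_ensemble=10, seed=0):
    """Accumulate the validation-training kernel K_vt (N_val x N_train) and the
    training-training kernel K_tt (N_train x N_train) over n_ensemble random
    re-initializations, mirroring paragraphs [0039]-[0045]."""
    rng = np.random.default_rng(seed)
    k_vt = k_tt = None
    for _ in range(n_ensemble):
        theta = init_params(rng)            # sample parameter values from p(theta)
        z_train = features(train_x, theta)  # (N_train, d) training feature vectors
        z_val = features(val_x, theta)      # (N_val, d) validation feature vectors
        d = z_train.shape[1]
        if k_vt is None:                    # zero initialization, as in [0035]-[0036]
            k_vt = np.zeros((z_val.shape[0], z_train.shape[0]))
            k_tt = np.zeros((z_train.shape[0], z_train.shape[0]))
        # element (a, i) gains z_val[a] . z_train[i] / (n_ensemble * d), per [0043]
        k_vt += z_val @ z_train.T / (n_ensemble * d)
        # element (i, j) gains z_train[i] . z_train[j] / (n_ensemble * d), per [0045]
        k_tt += z_train @ z_train.T / (n_ensemble * d)
    return k_vt, k_tt
```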
[0046] After performing the $n_{ensemble}$ steps above, the system 100 obtains a final updated validation-training kernel matrix 122 and a final updated training-training kernel matrix 124.

[0047] The system 100 generates predicted validation outputs 126, denoted as $\hat{y}^{val}$, for the validation inputs using the final updated training-training kernel matrix 124, the final updated validation-training kernel matrix 122, and the training outputs in the training data set 102. The process for generating the predicted validation outputs 126 is described in more detail below with reference to FIG. 2.

[0048] The system 100 updates the accuracy score 128 of the neural network 120 based on the predicted validation
outputs 126 and the validation outputs in the validation data set 104. As mentioned above, the accuracy score 128 could be initialized, for example, as zero.

[0049] In particular, to update the accuracy score 128, the system 100 generates a candidate updated accuracy score based on a number of correctly predicted validation outputs. For example, the candidate updated accuracy score can be generated as follows:

$$C_{val} = \sum_{a} \mathbf{1}\!\left[\arg\max_l \hat{y}_l^a = \arg\max_l y_l^a\right].$$

[0050] The system 100 then generates the updated accuracy score based on the current accuracy score and the candidate updated accuracy score, for example, by selecting the maximum of (i) the current accuracy score and (ii) the candidate updated accuracy score divided by the number of validation pairs in the validation data set 104 as follows:

$$A_{val} = \max(A_{val},\, C_{val}/N_{val}).$$

[0051] The system 100 can repeatedly generate new predicted validation outputs 126 and update the accuracy score 128 for each regularization constant $\epsilon \in [\epsilon_1, \ldots, \epsilon_r]$.
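For illustration (an editorial sketch, not the patent's code), the update in paragraphs [0049]-[0050] could look like this in NumPy, assuming the validation outputs are stored as one-hot or score vectors:

```python
import numpy as np

def update_accuracy_score(a_val, y_pred, y_val):
    """Candidate score C_val = number of correctly predicted validation outputs;
    the updated score is max(A_val, C_val / N_val), as in [0049]-[0050]."""
    correct = np.argmax(y_pred, axis=1) == np.argmax(y_val, axis=1)
    c_val = int(np.sum(correct))
    return max(a_val, c_val / y_val.shape[0])
```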
[0052] The system 100 outputs the updated accuracy score obtained after these iterations as a final accuracy score 130 that represents performance of the neural network 120.

[0053] In some implementations, the neural network 120 is one of multiple candidate neural networks generated during performance of a neural architecture search technique, which aims to find a neural network for performing a neural network task.

[0054] The system can use the final accuracy score of the neural network 120 to select a final neural network from the multiple candidate neural networks for performing the neural network task.

[0055] The final accuracy score, which is referred to as a NNGP validation accuracy, could be used in combination with other network performance evaluation techniques. For example, the NNGP validation accuracy can be used as a signal complementary to gradient-based training, as the NNGP performance is obtained without using any gradient-based training or past gradient-based training data. That means the NNGP validation accuracy can augment gradient-based performance measures to improve their predictive quality.

[0056] FIG. 2 is a flow diagram of an example process 200 for generating predicted validation outputs. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network system, e.g., the neural network system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 200.

[0057] The system generates a regularization constant representing random noise applied to the training-training kernel matrix (step 202). The regularization constant, denoted as $\epsilon$, can be selected from a set of $r$ constants $[\epsilon_1, \ldots, \epsilon_r]$.

[0058] The system generates a regularized training-training kernel matrix using the regularization constant and the final updated training-training kernel matrix (step 204).

[0059] For example, the regularized training-training kernel matrix can be computed as follows:

$$K^{tt} + \epsilon \lambda I_{N_{train}},$$

[0060] where $\lambda$ is determined based on the number of training samples in the training dataset 102 and on the training-training kernel matrix 105. For example, $\lambda$ can be computed by

$$\lambda = \frac{1}{N_{train}} \mathrm{Tr}(K^{tt}),$$

where $\mathrm{Tr}(K^{tt})$ is the trace of the training-training kernel matrix 105.

[0061] The system generates the predicted validation outputs using the regularized training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set (step 206). For example, the predicted validation outputs can be denoted as $\hat{y}^{val}$ and can be computed as follows:

$$\hat{y}_a^{val} = \sum_{i,j} K_{ai}^{vt} \left(K^{tt} + \epsilon \lambda I_{N_{train}}\right)^{-1}_{ij} y^j.$$
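A minimal NumPy sketch of steps 202-206 (an editorial illustration under the same assumptions as the earlier sketches; the argument names are not from the patent):

```python
import numpy as np

def predict_validation_outputs(k_vt, k_tt, y_train, eps):
    """Steps 202-206: regularize K_tt with eps * lambda * I, where
    lambda = Tr(K_tt) / N_train ([0060]), then perform Gaussian inference."""
    n_train = k_tt.shape[0]
    lam = np.trace(k_tt) / n_train                # lambda from [0060]
    k_reg = k_tt + eps * lam * np.eye(n_train)    # regularized kernel, step 204
    # step 206: y_hat = K_vt (K_tt + eps * lam * I)^{-1} y_train
    return k_vt @ np.linalg.solve(k_reg, y_train)
```

Solving the linear system rather than explicitly inverting the regularized kernel is a standard numerical choice; the result is the same matrix product as in the formula above.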
for generating predicted validation outputs. For conve 304 ) . The system initializes (i ) a validation - training kernel
nience , the process 200 will be described as being performed matrix representing similarities of the validation inputs in
by a system of one or more computers located in one or more the validation data set and the training inputs in the training
locations. For example, a neural network system , e.g. , the data set and ( ii ) a training -training kernel matrix represent
neural network system 100 of FIG . 1 , appropriately pro ing similarities across the training inputs within the training
grammed in accordance with this specification, can perform data set (step 306 ) . The validation - training kernel matrix is
the process 200 . a two - dimensional matrix having a number of rows equal to
[ 0057] The system generates a regularization constant the number of validation pairs in the validation data set and
representing random noise applied to the training - training a number of columns equal to the number of training
kernel matrix ( step 202 ) . The regularization constant, samples in the training data set . Initializing the validation
denoted as ? , can be selected from a set of r constants [ € 1 , training kernel matrix may include initializing the valida
.. , , ] tion - training kernel matrix with a first zero matrix . The
[ 0058 ] The system generates a regularized training - train training - training kernel matrix is a two - dimensional square
ing kernel matrix using the regularization constant and the matrix in which the number of rows equals to the number of
final updated training -training kernel matrix ( step 204 ) . training samples in the training data set. Initializing the
training -training kernel matrix comprises initializing the
[ 0059 ] For example , the regularized training - training ker training- training kernel matrix with aa second zero matrix.
nel matrix can be computed as follows: [ 0066 ] The system generates a final updated validation
K * + € 1NoNtrain training kernel matrix and a final updated training - training
kernel matrix (step 308). The generating includes, at each of N steps, performing steps 310-316 as follows.

[0067] The system samples a set of parameter values for the set of network parameters from an initialization distribution (step 310).

[0068] The system generates, from the training inputs, training feature vectors using the feature extraction neural network in accordance with the sampled parameter values of the network parameters (step 312).

[0069] The system generates, from the validation inputs, validation feature vectors using the feature extraction neural network in accordance with the sampled parameter values of the network parameters (step 314).

[0070] The system updates the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors (step 316). In particular, the system updates the validation-training kernel matrix using the validation feature vectors and the training feature vectors, and updates the training-training kernel matrix using the training feature vectors.

[0071] To update the validation-training kernel matrix, the system generates an update for each of a plurality of elements of the validation-training kernel matrix based on a dot product of a respective validation feature vector and a respective training feature vector.

[0072] To update the training-training kernel matrix, the system generates an update for each of a plurality of elements of the training-training kernel matrix based on a dot product of a respective first training feature vector and a respective second training feature vector. The system performs steps 318 and 320 below at least once.

[0073] The system generates predicted validation outputs for the validation inputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set (step 318). In particular, the system generates a regularization constant representing random noise applied to the training-training kernel matrix. The system generates a regularized training-training kernel matrix using the regularization constant and the final updated training-training kernel matrix. The system generates the predicted validation outputs using the regularized training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set.

[0074] The system updates an accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set (step 320). In particular, the system generates a candidate updated accuracy score based on a number of correctly predicted validation outputs, and generates the updated accuracy score by selecting the maximum of (i) the current accuracy score and (ii) the candidate updated accuracy score divided by the number of validation pairs in the validation data set.

[0075] The system outputs the updated accuracy score as a final accuracy score that represents performance of the neural network (step 322).
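Putting the pieces together, an end-to-end sketch of process 300 (an editorial illustration reusing the helper functions sketched above; `init_params`, `features`, and the example `eps_grid` values remain assumptions, not the patent's code):

```python
def nngp_validation_accuracy(train_x, y_train, val_x, y_val,
                             init_params, features,
                             n_ensemble=10, eps_grid=(1e-3, 1e-2, 1e-1, 1.0)):
    """End-to-end sketch of process 300 (steps 302-322): build the kernel
    matrices once, then refine the accuracy score over a grid of
    regularization constants (cf. [0051])."""
    k_vt, k_tt = build_kernel_matrices(train_x, val_x, init_params,
                                       features, n_ensemble)
    a_val = 0.0  # accuracy score initialized as zero, as in [0037]
    for eps in eps_grid:  # repeat inference for each regularization constant
        y_pred = predict_validation_outputs(k_vt, k_tt, y_train, eps)
        a_val = update_accuracy_score(a_val, y_pred, y_val)
    return a_val  # final accuracy score, step 322
```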
[0076] In some implementations, the neural network is one of a plurality of candidate neural networks for performing a neural network task that requires generating network outputs for network inputs. In these implementations, the system can use the final accuracy score of the neural network to select a final neural network from the plurality of candidate neural networks for performing the neural network task.

[0077] This specification uses the term "configured" in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

[0078] Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

[0079] The term "data processing apparatus" refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

[0080] A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple
computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

[0081] In this specification, the term "database" is used broadly to refer to any collection of data: the data does not need to be structured in any particular way, or structured at all, and it can be stored on storage devices in one or more locations. Thus, for example, the index database can include multiple collections of data, each of which may be organized and accessed differently.

[0082] Similarly, in this specification the term "engine" is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. Generally, an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.

[0083] The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

[0084] Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

[0085] Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

[0086] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.

[0087] Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.

[0088] Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework, a Microsoft Cognitive Toolkit framework, an Apache Singa framework, or an Apache MXNet framework.

[0089] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

[0090] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.

[0091] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination
can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

[0092] Similarly, while operations are depicted in the drawings and recited in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

[0093] Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

What is claimed is:
1. A method for predicting performance of a neural network, wherein the neural network has a set of network parameters, the method comprising:
  receiving a training data set having a set of training samples, each of the training samples having a training input and a corresponding training output;
  receiving a validation data set having a set of validation pairs, each of the validation pairs having a validation input and a corresponding validation output;
  initializing (i) a validation-training kernel matrix representing similarities of the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set;
  generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix, the generating comprising, at each of N steps:
    sampling a set of parameter values for the set of network parameters from an initialization distribution,
    generating, from the training inputs, training feature vectors using the neural network in accordance with the sampled parameter values of the network parameters,
    generating, from the validation inputs, validation feature vectors using the neural network in accordance with the sampled parameter values of the network parameters, and
    updating the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors;
  performing the following operations at least once:
    generating predicted validation outputs for the validation inputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set, and
    updating an accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set; and
  outputting the updated accuracy score as a final accuracy score that represents performance of the neural network.
2. The method of claim 1, wherein the neural network is configured to receive a given input and to process the given input to generate a corresponding feature vector having a dimension d for the given input, and wherein the neural network comprises an output layer configured to process the corresponding feature vector to generate a corresponding output for the given input.
3. The method of claim 2, wherein the neural network is configured to generate the corresponding output based on a probability distribution of possible outputs by selecting a possible output having the largest probability as the corresponding output.
4. The method of claim 1, wherein the validation-training kernel matrix is a two-dimensional matrix having a number of rows equal to the number of validation pairs in the validation data set and a number of columns equal to the number of training samples in the training data set.
5. The method of claim 1, wherein initializing the validation-training kernel matrix comprises initializing the validation-training kernel matrix with a first zero matrix.
6. The method of claim 1, wherein the training-training kernel matrix is a two-dimensional square matrix in which the number of rows equals the number of training samples in the training data set.
7. The method of claim 1, wherein initializing the training-training kernel matrix comprises initializing the training-training kernel matrix with a second zero matrix.
8. The method of claim 1, wherein generating, from the training inputs and the validation inputs, the training feature vectors and the validation feature vectors comprises:
  generating, for each training input, a respective training feature vector using the neural network in accordance with the sampled values of the network parameters, and
  generating, for each validation input, a respective validation feature vector using the neural network in accordance with the sampled values of the network parameters.
9. The method of claim 1, wherein updating the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors comprises:
  updating the validation-training kernel matrix using the validation feature vectors and the training feature vectors, and
  updating the training-training kernel matrix using the training feature vectors.
10. The method of claim 9, wherein updating the validation-training kernel matrix using the validation feature vectors and the training feature vectors comprises:
  generating an update for each of a plurality of elements of the validation-training kernel matrix based on a dot product of a respective validation feature vector and a respective training feature vector.
11. The method of claim 9, wherein updating the training-training kernel matrix using the training feature vectors comprises:
  generating an update for each of a plurality of elements of the training-training kernel matrix based on a dot product of a respective first training feature vector and a respective second training feature vector.
12. The method of claim 1, wherein generating the predicted validation outputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set comprises:
  generating a regularization constant representing random noise applied to the training-training kernel matrix,
  generating a regularized training-training kernel matrix using the regularization constant and the final updated training-training kernel matrix, and
  generating the predicted validation outputs using the regularized training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set.
13. The method of claim 1, wherein updating the accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set comprises:
  generating a candidate updated accuracy score based on a number of correctly predicted validation outputs, and
  generating the updated accuracy score by selecting the maximum of the current accuracy score and the candidate updated accuracy score divided by the number of validation pairs in the validation data set.
14. The method of claim 1, wherein the neural network is one of a plurality of candidate neural networks for performing a neural network task that requires generating network outputs for network inputs, and wherein the method further comprises:
  using the final accuracy score of the neural network to select a final neural network from the plurality of candidate neural networks for performing the neural network task.
15. One or more non-transitory computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for predicting performance of a neural network, wherein the neural network has a set of network parameters, the operations comprising:
  receiving a training data set having a set of training samples, each of the training samples having a training input and a corresponding training output;
  receiving a validation data set having a set of validation pairs, each of the validation pairs having a validation input and a corresponding validation output;
  initializing (i) a validation-training kernel matrix representing similarities of the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set;
  generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix, the generating comprising, at each of N steps:
    sampling a set of parameter values for the set of network parameters from an initialization distribution,
    generating, from the training inputs, training feature vectors using the neural network in accordance with the sampled parameter values of the network parameters,
    generating, from the validation inputs, validation feature vectors using the neural network in accordance with the sampled parameter values of the network parameters, and
    updating the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors;
  performing the following operations at least once:
    generating predicted validation outputs for the validation inputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set, and
    updating an accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set; and
  outputting the updated accuracy score as a final accuracy score that represents performance of the neural network.
16. A system comprising one or more computers and one or more non-transitory computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for predicting performance of a neural network, wherein the neural network has a set of network parameters, the operations comprising:
  receiving a training data set having a set of training samples, each of the training samples having a training input and a corresponding training output;
  receiving a validation data set having a set of validation pairs, each of the validation pairs having a validation input and a corresponding validation output;
  initializing (i) a validation-training kernel matrix representing similarities of the validation inputs in the validation data set and the training inputs in the training data set and (ii) a training-training kernel matrix representing similarities across the training inputs within the training data set;
  generating a final updated validation-training kernel matrix and a final updated training-training kernel matrix, the generating comprising, at each of N steps:
    sampling a set of parameter values for the set of network parameters from an initialization distribution,
    generating, from the training inputs, training feature vectors using the neural network in accordance with the sampled parameter values of the network parameters,
    generating, from the validation inputs, validation feature vectors using the neural network in accordance with the sampled parameter values of the network parameters, and
    updating the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors;
  performing the following operations at least once:
    generating predicted validation outputs for the validation inputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set, and
    updating an accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set; and
  outputting the updated accuracy score as a final accuracy score that represents performance of the neural network.
17. The system of claim 16, wherein the operations for generating, from the training inputs and the validation inputs, the training feature vectors and the validation feature vectors comprise:
  generating, for each training input, a respective training feature vector using the feature extraction neural network in accordance with the sampled values of the network parameters, and
  generating, for each validation input, a respective validation feature vector using the feature extraction neural network in accordance with the sampled values of the network parameters.
18. The system of claim 16, wherein the operations for updating the validation-training kernel matrix and the training-training kernel matrix using the validation feature vectors and the training feature vectors comprise:
  updating the validation-training kernel matrix using the validation feature vectors and the training feature vectors, and
  updating the training-training kernel matrix using the training feature vectors.
19. The system of claim 16, wherein the operations for generating the predicted validation outputs using the final updated training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set comprise:
  generating a regularization constant representing random noise applied to the training-training kernel matrix,
  generating a regularized training-training kernel matrix using the regularization constant and the final updated training-training kernel matrix, and
  generating the predicted validation outputs using the regularized training-training kernel matrix, the final updated validation-training kernel matrix, and the training outputs in the training data set.
20. The system of claim 16, wherein the operations for updating the accuracy score of the neural network based on the predicted validation outputs and the validation outputs in the validation data set comprise:
  generating a candidate updated accuracy score based on a number of correctly predicted validation outputs, and
  generating the updated accuracy score by selecting the maximum of the current accuracy score and the candidate updated accuracy score divided by the number of validation pairs in the validation data set.
