Theme Article: Graph Learning, Prompt Learning, Survey

A Survey of Graph Prompting Methods: Techniques, Applications, and Challenges
Xuansheng Wu∗ , University of Georgia, Athens, GA, 30605, USA
Kaixiong Zhou∗ , Rice University, Houston, TX, 77005, USA
Mingchen Sun, Jilin University, Changchun, Jilin, 130015, China
Xin Wang, Jilin University, Changchun, Jilin, 130015, China
Ninghao Liu, University of Georgia, Athens, GA, 30605, USA
arXiv:2303.07275v2 [cs.LG] 31 May 2023

Abstract—The recent “pre-train, prompt, predict” training paradigm has gained popularity as a way to learn generalizable models with limited labeled data. The approach involves using a pre-trained model and a prompting function that applies a template to input samples, adding indicative context and reformulating target tasks as the pre-training task. However, the design of prompts could be a challenging and time-consuming process in complex tasks. The limitation can be addressed by using graph data, as graphs serve as structured knowledge repositories by explicitly modeling the interaction between entities. In this survey, we review prompting methods from the graph perspective, where prompting functions are augmented with graph knowledge. In particular, we introduce the basic concepts of graph prompt learning, organize the existing work of designing graph prompting functions, and describe their applications and future challenges. This survey will bridge the gap between graphs and prompt design to facilitate future methodology development.

Following the supervised training paradigm, the success of machine learning relies on a massive labeled dataset to learn the patterns between data samples and manual annotations for a specific task. However, annotating such a large dataset is expensive. The lack of labeled datasets has prompted researchers to investigate the “pre-train, fine-tune” framework.2 The core idea is to leverage task-agnostic information to create pretext tasks for initializing the model, and then fine-tune it on the target task with fewer labeled samples. The knowledge learned from extensive pretext datasets improves the model’s generalization on the target problem. Under the “pre-train, fine-tune” framework, researchers focus on designing pretext task objectives and tailoring them to correlate with the target problem. Such engineering requires expert experience and time-consuming trials to find informative pretext objectives.

Recently, prompt-based learning has enabled researchers to move away from expensive objective engineering towards simpler data engineering. Under the new training paradigm of “pre-train, prompt, predict”, the prompt reformulates the target task to look similar to a pretext one so as to effectively reuse the pre-trained model.3,4 Considering a triplet classification task with an example <Rose, Color, Red>, a prompting function could format it into the sentence “The Color of Rose is [MASK].” and ask a language model to fill the masked blank with a color word. In this case, the language model pre-trained with the task of predicting masked words, known as Masked Language Modeling (MLM),5 could be applied directly to the reformulated problem. Such simplicity and efficiency have motivated the development of suitable prompting methods, including both manually and automatically designed prompt templates that concatenate input samples with language words or tokens.

Early studies on prompt design often relied on expertise and trial-and-error to create intuitive templates for a broad range of target problems, which may inject noise and weaken generalization performance. Recently, some research has suggested leveraging graph data (e.g., external knowledge graphs or task-

XXXX-XXX © 2021 IEEE
Digital Object Identifier 10.1109/XXX.0000.0000000
Published by the IEEE Computer Society


dependent graphs) to induce precise contextual knowledge and differentiate between input samples. Graphs are ubiquitous in science and industry domains, such as social media networks, molecular graphs in biochemical informatics, knowledge graphs in natural language processing, and user-item interaction graphs in recommender systems.1 The relational information serves as a knowledge base by storing features of nodes and edges and modeling their interactions, upon which one can easily infer the related knowledge to fill in the prompt template. For example, based on Color’s feature definition and Rose’s neighborhood structure, one could reformulate a more indicative prompt as: “What is the Color (with elementary colors of Red, Yellow, Blue) of Rose, including species of Darcey and Chrysler Imperial Roses? [MASK]” This external knowledge from graph systems contains human experience or prior statistics to tailor the prompts for different input samples.

In this survey, we review the rapidly growing area of prompt-based learning from a new perspective, where the prompts are generated upon graphs or designed for graph-related tasks. We conduct a timely overview of state-of-the-art algorithms and applications in the field of graph-based prompt learning. The intended audiences include machine learning researchers investigating how to leverage a graph knowledge base to improve the design of prompt templates or applying the prompting methods to graph analysis. The contributions of this survey are summarized as follows:

› Compared with existing surveys that introduce general semantic-based prompts, we provide a unique overview of leveraging graph structures to inject adaptive knowledge into prompt design. To the best of our knowledge, this is the first review of graph prompting methods.
› We organize the graph prompting methods into two categories, including discrete prompt design and continuous prompt design, which differ in how to design prompt templates and reformulate the input samples.
› We summarize the state-of-the-art graph prompting methods for various applications, including graph machine learning, recommender systems, and natural language processing. We further discuss the limitations of existing pre-trained models and graph prompting methods, which sheds light on future research.

Comparison to Related Surveys. There are limited surveys covering this newborn and rapidly growing topic. Liu et al. conducted the first review on prompt learning for natural language processing.3 They also provided an overview of prompt learning in recommendation systems.4 In contrast, this paper focuses on reviewing studies that adopt the prompt learning method on graph data, whether leveraging a graph knowledge base to improve the design of prompt templates or applying the prompting methods to graph analysis.

Graph Learning: From Traditional Paradigms to Prompting Methods

Notations
We denote sets with calligraphic capital letters (e.g., D), matrices with boldface capital letters (e.g., Z), and vectors with boldface lowercase letters (e.g., v). We also denote a dataset as D = {(x, y)}, where x and y are the input sample and ground-truth information, respectively. In this work, we focus on summarizing the prompting function designs based on graphs or the prompting methods used in graph machine learning. Hence the input sample x can be instantiated as a sentence in NLP, a relational triplet in a knowledge graph, or a node of a complex network or biochemical graph. Let G = (V, E) denote the graph, where V and E are the sets of nodes and edges, respectively. Each edge e ∈ E is described by a triplet (v_h, r, v_t), where v_h, v_t ∈ V are the head and tail nodes, and r is the relation between the nodes. In the vanilla case, the edge could be simplified as e = (v_h, v_t), and the relation is represented by a scalar edge weight a_ht ∈ ℜ.

Traditional Learning Paradigms
Supervised learning. Typically, supervised learning is formalized as:

    min_θ Σ_{(x,y)∈D} L(f(x; θ); y),    (1)

where f(x; θ) is the model prediction based on input x and learnable parameters θ, and L is a loss function measuring the difference between the model prediction and ground truth y, such as the cross-entropy or absolute difference loss.

In NLP tasks, input x is usually instantiated by text, and ground truth y can be a discrete label or a textual tag. For example, in text classification, we take input x = “It is absolutely a great product.” and generate label y from {Positive, Negative}. In text generation tasks, we are interested in a question x = “How do you evaluate this item?” and predict the answer y as “Absolutely great”.

In graph learning tasks, input x can be instantiated by a node, edge, or graph. For example, knowledge graph completion considers an edge as input x = <Rose, Color, Red> and predicts whether this relational triplet


TABLE 1. Notations in graph prompting methods.

Name               Notation                 Example
Input              x                        Entity pair <Rose, Red>.
Output             y                        Relation label <Color>.
Pre-trained Model  f(·)                     A pre-trained language model or graph model.
Template           T                        “The relation between [x1] and [x2] is [MASK].”
Prompt Addition    x′ = f_prompt(x, T)      “The relation between <Rose> and <Red> is [MASK].”
Answer Set         A                        Possible answers {<Color>, <Shape>, <Height>} to fill [MASK].
Predicted Answer   ẑ                        The predicted answer <Color> to appear within the context.

exists; the node/graph classification task takes a node/graph as x and tries to predict the target label.

Pre-train and Fine-tune
Training a supervised model requires massive high-quality labeled data. However, manually labeling samples is expensive in many real-world scenarios. To tackle this challenge, the “pre-train, fine-tune” paradigm proposes to introduce a pre-training stage to obtain transferable parameters of the model and then fine-tune it on the target dataset.

Pre-training. The pre-training stage aims to learn generalizable knowledge from a massive task-agnostic dataset D̃ and avoids training from scratch given target tasks. The objective of pre-training is defined as:

    min_θ Σ_{(x̃,ỹ)∈D̃} L_pre(f(x̃; θ); ỹ),    (2)

where L_pre denotes the loss function of the pretext task, and x̃ and ỹ are the constructed input sample and label, respectively. The pre-training task is usually designed in a self-supervised manner to utilize the large amount of unlabeled data. For example, in pre-training language models, x̃ can be a text with some tokens being masked, and ỹ denotes those masked tokens to be recovered. In graph learning tasks, x̃ can be a subgraph of G where some graph components (e.g., nodes or edges) are masked, and ỹ denotes the masked components for reconstruction.

Fine-tuning. The fine-tuning stage adapts the pre-trained model f on the target task according to Eq. (1). The difference is that the model parameters θ are initialized based on the pre-training results. Practically, researchers could develop an additional architecture component (e.g., a prediction head) on top of the pre-trained model to utilize the pre-training knowledge for the target problem.

Graph Prompt Learning
One of the major issues in “pre-train, fine-tune” is the costly engineering of pretext task designs. If the pretext task is not necessarily related to the target problem (e.g., domain shifting), one may need more fine-tuning epochs to adapt the model and may even obtain poor generalization performance. To address the challenge, a new paradigm of “pre-train, prompt, predict” has been proposed to free the requirements of both massive supervised data and pretext task engineering. In particular, the prompt concept is introduced to augment the target input with more informative descriptions and reformulate it to look similar to the pre-training data. In this way, the pre-trained model can be directly reused to generate the desired output by behaving the same way as in the pretext tasks. We mathematically describe graph prompt learning in terms of how to design the prompt template and leverage the pre-trained model to conduct downstream tasks.

Prompt Addition. The prompting function f_prompt incorporates task-related knowledge θ_prompt into input sample x to generate a prompted input:

    x′ = f_prompt(x; θ_prompt),    (3)

where x′ lies in the same space as the pre-training samples x̃. Thus, the pre-trained model can take the prompted input x′ to conduct pre-training tasks. Depending on the task-related knowledge parameters θ_prompt, graph prompts can be categorized into two types: discrete and continuous ones.

• The discrete prompt represents knowledge parameters θ_prompt as natural language words or individual nodes/edges, and puts them together with the input data. For example, considering the triplet completion task in knowledge graphs, we could define θ_prompt as a manually designed template T = “The [r] of [v_h] is [MASK].”, where [MASK] is the masked token to be filled by the pre-trained language model. Given an incomplete triplet input x = “<Rose, Color, [?]>” without the tail node [v_t], we could generate the prompted input as x′ = “The Color of Rose is [MASK].” Template T could be expanded by concatenating the neighbors of input triplets based


on the underlying graph database, e.g., T = “The [r] of [v_h] is [MASK], where [v_h] is related to [NEIGHBORS].” For example, “The Color of Rose is [MASK], where Rose is related to species of Darcey and Chrysler Imperial.”

• The continuous prompt uses differentiable vectors to represent knowledge parameters θ_prompt, which are prepended to the input data in the continuous embedding space. Compared with the discrete version, the continuous prompt relaxes the constraint of human-interpretable semantic words or physical neighbors, and allows the prompt-related parameters θ_prompt to be optimized in an end-to-end manner. Based on the graph database, the continuous prompt could leverage graph representation learning approaches, like graph neural networks, to obtain structure-aware knowledge parameters.

Answer Search and Mapping. Based on the input prompt, the pre-trained model is leveraged to generate the desired output. Typically, we define an answer set Z containing all the possible target labels. We then search over Z to generate the desired output maximizing the possibility or similarity function P:

    z = argmax_{z′∈Z} P(f(x′), z′).    (4)

In NLP or knowledge graph completion tasks, where a language model is adopted to predict the sample classes, the answer set Z could be a small subset of language words. Each candidate answer z′ fills the masked prompt, and the corresponding possibility is estimated. For example, in the case of triplet completion for color prediction, we can define Z = {“Red”, “Blue”, “Yellow”} as the possible colors that any instance from the knowledge graph can have. In the product review classification problem, Z = {Positive, Negative} are mapped to the binary ratings. In general graph learning tasks, where the data samples are not associated with language descriptions, the answer set Z can be defined as a set of trainable class vectors. We measure the similarity between the output f(x′) of the pre-trained model (e.g., node representation vectors of GNNs) and the class vectors to obtain the target labels.

Graph Prompt Design
Graph prompting methods rely heavily on the prompting function as defined in Eq. (3). In the rest of this paper, we systematically review the existing approaches in two categories: discrete and continuous graph prompts. Within each category, we further introduce the graph prompt from three aspects, namely manual prompt, node-level prompt, and topology-level prompt. More specifically, the manual prompt constructs the template according to intuitive human knowledge and applies it to graph applications. While the node-level prompt utilizes the knowledge of a single node (e.g., node attributes and types), the topology-level prompt uses the relationships among multiple nodes (e.g., motifs) to formulate the graph prompt.

FIGURE 1. A taxonomy of graph prompt learning techniques. [Graph Prompt Learning branches into Discrete Prompt Design (Manual Design, Node Level, Topology Level) and Continuous Prompt Design (Graph Irrelevant, Node Level, Topology Level).]

Discrete Graph Prompt
The design of the prompting function could be motivated by graphs. The graph-induced discrete prompting function f_disc constructs a description of the input graph to introduce graph knowledge into prompts. Specifically, it generates graph-related textual knowledge and aligns it with manual textual templates to enrich the input. Formally, the graph-induced discrete prompting function is given by:

    x′ = f_prompt(x; G, T),    (5)

where x has a corresponding node v from the graph G, and θ_prompt = {G, T} are the parameters of the prompting function. The prompting function could either be designed manually or induced by the graph G. The latter can be further divided into two categories, according to the type of graph knowledge they explore, namely node information and graph topology.

Manual Prompt
A natural way to obtain a discrete template T is manually designing the template regardless of the graph. The manual prompting function creates the prompted sample according to human knowledge of the downstream task. Formally, the prompting function is defined as:

    x′ = f_prompt(x; T),    (6)

where T is the pre-defined prompt template and f_prompt inserts the input x into each slot of the template T.
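The slot-filling behavior of the manual prompting function in Eq. (6) can be sketched in a few lines. The function name and the example entity below are illustrative assumptions, not an API from any surveyed system:

```python
# Minimal sketch of a manual discrete prompting function (Eq. 6):
# the template T is fixed by hand, and f_prompt fills its slot(s)
# with the raw input. Names here are illustrative only.
def f_prompt_manual(x, template):
    """Insert input x into the [x] slot of a manual template T."""
    return template.replace("[x]", x)

T = "[MASK] is identical with [x]"       # GraphPrompt-style template
x_prime = f_prompt_manual("aspirin", T)  # "aspirin" is a made-up input
# x_prime == "[MASK] is identical with aspirin"
```

A pre-trained masked language model would then be asked to fill [MASK] in the prompted input.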

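After prompt addition, the answer search and mapping step of Eq. (4) scores every candidate in the answer set and keeps the argmax. A minimal sketch, with a mock probability table standing in for P(f(x′), z′) from a real pre-trained model:

```python
# Sketch of answer search and mapping (Eq. 4). The score function is
# a stand-in dictionary; a real system would query a pre-trained
# language or graph model for each filled-in candidate.
def answer_search(x_prime, answer_set, score):
    return max(answer_set, key=lambda z: score(x_prime, z))

def toy_score(x_prime, z):
    probs = {"Red": 0.7, "Blue": 0.1, "Yellow": 0.2}  # mock MLM scores
    return probs[z]

Z = ["Red", "Blue", "Yellow"]
z_hat = answer_search("The Color of Rose is [MASK].", Z, toy_score)
# z_hat == "Red"
```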

The concrete design of the manual template T could vary across different tasks or applications, where the key is to reformat the downstream task as one of the pre-training tasks. For example, in recommender systems, where the model is designed for predicting the user’s preferences towards items, the P5 model manually designs the prompt template as T = “What star rating do you think [USER] will give to [ITEM]? [MASK]”, where [USER] and [ITEM] are slots for filling the input user-item pair.7 Since this manual template shows that the task is a rating prediction, we can expect a pre-trained language model to conduct this task by filling [MASK] with a number. In knowledge extraction tasks, RelationPrompt applies prompt learning by manually designing a template as T = “Context: [x]. Head Entity: [MASK]. Tail Entity: [MASK]. Relation: [MASK].”, where [x] is the slot for the input text.8 Given an input text x = “Their grandson was Captain Nicolas”, RelationPrompt expects a pre-trained language model to fill the three [MASK] slots with “Nicolas”, “Captain”, and “Military”, respectively. For the biomedical entity normalization task, GraphPrompt fills an input entity x into a human-designed template T = “[MASK] is identical with [x]” to predict the synonym entity ŷ of the input x.9

Node-level Prompt
The adopted node information could vary for different tasks, but it typically describes the properties of vertices.10 A key step in the existing studies is verbalizing the node information. Thus, the discrete prompting function is written as:

    x′ = f_prompt(x; g_verb(x; G), T),    (7)

where g_verb is a non-parametric function describing the node attributes correlated to the input x with natural language, and the prompting function f_prompt fills the template T with the input x and the verbalized attributes. Some concrete examples are provided below.

In knowledge graph completion tasks, the node set V is a named entity set, the edge set E is the entity relation set, and a piece of knowledge is represented as a triplet (u, e, v), u, v ∈ V, and e ∈ E. The goal is to judge whether a triplet (u, e, v) is valid by converting it and its supporting information into textual sentences,12 which are later fed into pre-trained language models for classification. Specifically, the attributes of the nodes u and v are the supporting information of the triplet, and the verbalizing function g_verb formats this information into sentences. For example, given a node u = Lebron James, the verbalizing function g_verb returns the supporting information as: “Lebron James: American basketball player.”

In generative tasks, prompts could be formulated to ask for information from models. For example, in explainable recommender systems, the graph nodes include users U and items I, and the edge set E shows user-item interactions. PEPLER suggests incorporating user and item attributes as prompts to generate textual explanations,11 where the verbalizing function g_verb is implemented as a retrieval function returning key features related to both the user u ∈ U and item i ∈ I. To be specific, if the words commonly mentioned by a user are {“bed”, “restaurant”, “service”} and the words commonly mentioned about a hotel are {“location”, “restaurant”, “service”}, the prompted input of this user-item pair would be “restaurant service [MASK]”. Then a pre-trained language model is asked to generate a piece of text at [MASK] to explain the user’s preference for the item in terms of “restaurant” and “service”, such as “The restaurant of this hotel provided me a wonderful dinner service.”

Topology-level Prompt
Compared with node attributes that describe the local information of graph components, topological structures of graphs carry broader and more diverse knowledge. Graph topology structures elicit specific prompts that not only focus on describing the edges and neighbors within a sub-graph, but also enable wider perspectives on sampling a sub-graph when provided with a sample. The corresponding prompting function can be defined as:

    x′ = f_prompt(x; g_verb(G̃), T),  G̃ = g_sample(x; G),    (8)

where g_sample selects a sub-graph G̃ ⊂ G around the center node correlated to the input x, and g_verb is a non-parametric function describing the sub-graph G̃ in natural language words. We introduce different graph sampling methods for designing the prompting function as follows, including edge sampling, neighborhood sampling, and path sampling.

Edges are fundamental graph components to augment prompting functions. For example, in event causality detection, after mapping input tokens into node entities in a KG, a set of triples can be extracted from the KG to provide common-sense knowledge describing events.17 In relation extraction, KGs provide rich ontology information of entities to help infer their relations.15 Taking the input “Bill Gates, co-founder of Microsoft.” as an example, OntoPrompt first matches the two named entities Bill Gates and Microsoft with “person” and “organization” on the KG, respectively, then identifies them as “entrepreneur”


and “company” according to KG ontology information, and finally retrieves “leader of” as the relation to fill the prompt via the verbalizing function g_verb.

Neighborhood sampling facilitates prompt formulation by providing context information of the target node, for example, in sequential recommender systems, by inducing the user’s shopping history as graph topological information to obtain prompts. LMRecSys makes the first attempt to exploit pre-trained language models for movie recommendation,13 where the sampling function g_sample selects a few of the latest movies watched by the user, and the verbalizing function g_verb uses the movie titles as the descriptions of these selected movies. Given a user, the prompting function of LMRecSys returns “A user watched Raiders of the Lost Ark, Star Wars: Ep VI-Return of the Jedi, Ran. The user will also watch [MASK].”, where the template T = “A user watched [x1], [x2], [x3]. The user will also watch [MASK].”, and the movie names are generated by the verbalizing and sampling functions. Besides this study, M6-Rec designs a fine-grained verbalizing function g_verb to describe detailed features of the user-item interactions,14 such as the category of the clicked movie and when the user clicked each movie. One example of the prompted input from M6-Rec could be “A user watched Raiders of the Lost Ark 14 days ago, Star Wars: Ep VI-Return of the Jedi 4 days ago, Ran 15 minutes ago. The user will also watch [MASK].”

Path sampling provides another way to gather graph information in a broader range for prompt design. GraphPrompt induces the topology structures of a protein graph G for the biomedical entity normalization task,9 where each node in the protein-protein interaction graph represents a protein, and each edge represents the affiliation between two proteins. To predict the synonym of a given protein name, GraphPrompt constructs zero-order to second-order prompts of each candidate protein on G. Here, the sampling function g_sample forms the sub-graph by collecting the K-hop neighbors of the center node, and the verbalizing function g_verb walks through paths from the center node to each K-hop neighbor and concatenates the textual definitions of nodes and edges on the path as output.

Continuous Graph Prompt
Discrete templates describe the graph knowledge with words from a finite vocabulary set, which may not obtain the optimal representations of graph connectivity information. In contrast, continuous representations can capture the knowledge more accurately and comprehensively. The continuous prompting function represents the graph knowledge as embeddings within continuous templates, which are then concatenated with the embeddings of the input sample. Specifically, the continuous prompting function could be written as:

    x′ = f_prompt(x; G, T),    (9)

where the boldface x denotes the continuous representation of the input sample x, x′ denotes the continuous representation of the prompted input x′, and T ∈ ℜ^{n×d} denotes a trainable matrix. Here, n and d serve as hyper-parameters, and θ_prompt = {G, T} are the parameters of the prompting function. Different from the discrete template T, the continuous template T is usually obtained by optimizing it with some training samples. In this subsection, we summarize research according to the different levels of graph components adopted in continuous prompting functions, including graph-irrelevant components, node attributes, and graph topology.

Manual Prompt
To help the pre-trained model use extra parameters to understand the downstream task, a simple approach is to concatenate the continuous template with the input:

    x′ = [x; T],    (10)

where [·; ·] is the concatenation operation. Since the number of trainable parameters is small, we could sufficiently learn the continuous template T with a small set of training data D.

This idea has been applied to pre-trained graph neural networks, where task-specific prompt templates are learned to provide a graph-level transformation on the downstream graphs during inference without tuning the parameters of the pre-trained GNN model.18 It points out that the learned continuous template T, also known as the Graph Prompt Feature (GPF), can implicitly modify the node features and graph structure. The experiments demonstrate that this approach can achieve comparable performance to fine-tuning, with a minimal amount (≈ 0.1%) of tunable parameters.

Node-level Prompt
Learning the prompt embeddings defined in Eq. (10) does not take into account the input x or its corresponding node information in graph G. The most straightforward way to develop a graph-related prompt is to gather node x’s embeddings:

    x′ = [x; T⊤ g_oh(x)],    (11)

where the number of rows of the continuous matrix T is determined by the size of the node set V, and g_oh(x): ℜ → {0, 1}^|V| returns a one-hot vector indicating the node index of input x.

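The node-level lookup of Eq. (11) can be sketched the same way: T holds one trainable prompt row per node, and the one-hot vector g_oh(x) selects the row for the input's node index (toy sizes, illustrative names):

```python
import numpy as np

# Sketch of Eq. (11): x' = [x; T^T g_oh(x)]. T has |V| rows, one per
# node; the one-hot selector picks out the input node's prompt vector.
num_nodes, d = 3, 4
T = np.arange(num_nodes * d, dtype=float).reshape(num_nodes, d)

def g_oh(node_index, size):
    v = np.zeros(size)
    v[node_index] = 1.0
    return v

x = np.ones(d)                              # input embedding
node_prompt = T.T @ g_oh(1, num_nodes)      # equals row T[1]
x_prime = np.concatenate([x, node_prompt])  # prompted input (Eq. 11)
# node_prompt == [4., 5., 6., 7.]
```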

L. Lei et al. apply this strategy for explainable recommendation,11 where they define the graph G as a bipartite graph of users and items. Given a user-item pair as input x = (u, i), the corresponding node embeddings (e_u, e_i) are collected. They incorporate these node embeddings into the continuous prompts and input them into a pre-trained language model for generating explainable recommendations.

Topology-level Prompt
Encoding topology information with continuous representations motivates diverse prompt designs. We consider several types of topological information in graph prompt learning, including ontology, motifs, and sub-graph structures.

Ontology Embedding. Recall that a knowledge graph contains instances V and ontology V′ as the node sets; a step-forward strategy to align the knowledge graph to the input x ∈ V is querying its ontology node x̃ ∈ V′. Here, the ontology of a node represents the type of the node, introducing more human understanding to an entity. For example, if “Bill Gates” is the input node x, one of its ontology nodes x̃ could be “Person”. We formalize the continuous prompt indicated by the ontology embedding as:

    x′ = [x; T⊤ g_ont(x, G)],    (12)

where g_ont is a function finding the ontology node x̃ of the input node x from the graph G, and the number of rows of the continuous template T is given by the number of ontology nodes.

Prompting input x with its corresponding ontology embedding has been discussed in knowledge extraction under the zero/few-shot(s) settings. The reason is that the rich prior ontology is critical to infer the semantic information of the input sample. For example, if we know that the subject “Hamilton” is a person and the object “British” is a country, the prediction probabilities of the candidate relations irrelevant to person/location will be significantly weakened (e.g., “birth_of_organization” is an impossible relation). KnowPrompt first discusses combining the ontology knowledge of the subject-object pair for relation extraction tasks.19 OntoPrompt extends this idea to a wider range of application scenarios,15 including event extraction and knowledge graph completion. Moreover, OAG-BERT brings this idea into the pre-training stage, so that the continuous template T can be obtained without a downstream dataset.20

Motif Embedding. Furthermore, the frequently appearing sub-structures of a graph dataset typically have special physical meanings, and they are named motifs M = (Ṽ, Ẽ), where Ṽ ⊂ V and Ẽ ⊂ E. For example, network motif mining yields insights in analyzing cellular signalling systems, and biochemical motifs are the functional elements of molecules. Thus, it is valuable to inject the informative motifs into input sample x as a prompt. Assuming we collect m unique motifs in total from the target application, the continuous prompt is formatted as:

    x′ = [x; T⊤ g_motif(x, G)],    (13)

where g_motif: G → {0, 1}^m is a motif detector returning an m-length binary vector, and T ∈ ℜ^{m×d} denotes the trainable embeddings for each motif structure.

Sub-structure analysis widely occurs in molecular representation learning, where the input sample x is a molecular graph and the motifs are functional groups, such as a benzene ring. MolCPT is one of the classic methods in this topic.21 Specifically, it first introduces human experts to define a set of key motif structures, and then designs a motif detector g_motif to search for them in the input graph. To this end, the prompted input x′ contains extra knowledge about motif structures to determine the underlying functions of physiological and biophysical molecules.

Message Passing. To fully explore the topology and neighbor features centered at input node v, graph neural networks can be applied to gather information from its k-hop neighbors:

    x′ = [x; g_gnn(G̃; θ_gnn)],    (14)

where G̃ refers to the k-hop sub-graph of input node v extracted from the whole graph, and g_gnn is a graph neural network with trainable parameters θ_gnn.

The conversational recommendation task is one of the scenarios that needs the powerful representation ability of graph neural networks because of its complex user-item interactions. The node set V of graph G consists of the items and their attributes, and the edge between an item node and an attribute node indicates the belonging relation. UniCRS applies message-passing-based prompt learning in this scenario.22 Particularly, during a conversation, when a user mentions an item i ∈ V, UniCRS collects a k-hop sub-graph G̃_i ⊂ G and passes it to the GNN encoder g_gnn to obtain a d-dimensional node embedding for each node in the sub-graph G̃_i. Finally, these dynamically generated node embeddings are used as the prompts to generate the final response for the user.

Sub-Graph Pre-trained Embedding. While the message-passing-based approach (Eq. 14) exhibits a

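To make the ontology-embedding design of Eq. (12) concrete, the sketch below implements it in plain NumPy. The ontology vocabulary, toy graph, dimensions, and function names (e.g., `g_ont`, `ontology_prompt`) are illustrative assumptions rather than any cited implementation; a real system would learn the template T jointly with the pre-trained model.

```python
import numpy as np

# Toy ontology vocabulary; in practice this comes from the knowledge graph.
ONTOLOGY = {"person": 0, "country": 1, "organization": 2}
d = 8                                       # embedding dimension
rng = np.random.default_rng(0)
T = rng.normal(size=(len(ONTOLOGY), d))     # continuous template: one row per ontology

def g_ont(entity, graph):
    """Find the ontology class of an entity in a toy knowledge graph G."""
    return ONTOLOGY[graph[entity]]

def ontology_prompt(x, entity, graph):
    """Eq. (12): x' = [x; T^T g_ont(x, G)].
    Multiplying T^T by a one-hot ontology indicator selects a row of T,
    which is concatenated to the input embedding."""
    return np.concatenate([x, T[g_ont(entity, graph)]])

# Subject "Hamilton" is a person, as in the relation-extraction example.
graph = {"Hamilton": "person", "British": "country"}
x = rng.normal(size=d)                      # embedding of the input mention
x_prime = ontology_prompt(x, "Hamilton", graph)
print(x_prime.shape)                        # (16,): input plus ontology prompt
```

Since T has one row per ontology class, prompting adds only |ontology| × d trainable parameters, which is part of what makes the design attractive in few-shot settings.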
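The message-passing prompt of Eq. (14) can be sketched similarly. Here simple mean aggregation over a toy k-hop neighborhood stands in for the trained GNN g_gnn; the adjacency lists and feature matrix are made-up placeholders, and a UniCRS-style system would use a learned graph encoder with parameters θ_gnn instead.

```python
import numpy as np

# Toy item-attribute graph as adjacency lists (node id -> neighbor ids).
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))            # initial node features, d = 8

def k_hop_nodes(v, k):
    """Collect the node set of the k-hop sub-graph around node v."""
    seen, frontier = {v}, {v}
    for _ in range(k):
        frontier = {u for n in frontier for u in adj[n]} - seen
        seen |= frontier
    return seen

def g_gnn(v, k=2):
    """Stand-in for Eq. (14)'s GNN: mean-pool features over the sub-graph.
    A trained message-passing network replaces this in practice."""
    nodes = sorted(k_hop_nodes(v, k))
    return H[nodes].mean(axis=0)

x = rng.normal(size=8)                 # embedding of the mentioned item v
x_prime = np.concatenate([x, g_gnn(0)])   # Eq. (14): concatenate the sub-graph summary
print(x_prime.shape)                   # (16,)
```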
Sub-Graph Pre-trained Embedding. While the message-passing-based approach (Eq. 14) exhibits a strong ability to learn representations for the sub-graph G̃, training its parameters θ_gnn from scratch necessitates a substantial amount of training data. To relax this requirement on training samples, one parameter-efficient way is to first align a pre-trained model to initialize the representations of nodes:

x′ = [x; g_ptm(G̃)],   (15)

where g_ptm is a pre-trained model with frozen weights, which generates embeddings for the nodes in G̃.

The community question answering task aims to answer a user's question by using the resources created by other users from the same community. This task perfectly fits the alignment of pre-trained language models, since the concepts in the community graph G = (V, R) are usually texts. Here, the node set includes articles, comments, and questions, and the edges between nodes are defined in either natural (e.g., a comment attached to an article) or weakly supervised (e.g., BM25 identifying a pair of similar article and question) ways. Prefix-HeteroQA was the first attempt along this path.23 Given a question x and its most similar question q from the community, Prefix-HeteroQA searches the k-hop neighbors of the question q to form a sub-graph G̃. Eq. (15) is finally applied to obtain the embeddings of each document in the sub-graph, which serve as prompts to enhance the input question x.

Applications

Graph Learning
Homogeneous Graphs. Homogeneous graphs are typically encountered in pure graph learning tasks, such as node classification, link prediction, or graph classification. In contrast with the traditional supervised training process, these methods align pre-trained graph models with prompt learning. Specifically, they design prompts to reformulate the downstream task (e.g., node/graph classification) as the pretext task (e.g., link prediction) used to train the pre-trained models, which reduces the training objective gap between the constructed pretext and the dedicated downstream tasks.24,27,28

Heterogeneous Graphs. Heterogeneous graphs are prevalent in many real-world applications, including knowledge graphs, user-item interaction graphs, and social networks. Knowledge extraction, which extracts entities from a piece of input text and predicts the relations among these entities, is one of the most common applications for knowledge graphs. Graph prompting methods align additional knowledge, typically stored in an external knowledge graph, with words and phrases in the input raw text. For example, the phrase “Bill Gates” belongs to the “Person” ontology and “Microsoft” is an entity of the “Company” ontology.15 This additional information helps the models make more precise predictions.12,15,19 In addition, recommender systems also attract studies in graph prompt learning for several reasons: (1) Users, items, and their interactions naturally form a graph, where the node set consists of the users and items, and the edge set can be various types of user-item interactions (e.g., shopping, clicking, commenting, visiting, favoring, and so on). (2) Many components of recommendation systems (e.g., names of users and items, authors of items, categories of items) can be represented in natural language, simplifying the design of discrete prompts. More specifically, graph prompt learning has been applied to enhance recommendation systems for various purposes, such as interpretability,7,11 multi-task learning,7 sequential recommendation,13,14,26 and conversational recommendation.22

Natural Language Processing
Text Understanding. Text understanding covers various natural language tasks, such as sentiment analysis, topic classification, named entity recognition, and retrieval-based question answering. Traditionally, these tasks are addressed using only the input text, as it can be challenging for language models to represent the context and background knowledge contained in other formats (such as knowledge graphs). However, recent studies found that prompt learning can overcome this challenge without redesigning the language model. Many of them reformat the extra knowledge as part of the textual context and append it as additional information to the original input text during the inference stage.23,25 Other research incorporates graph prompting during the pre-training of the language model, enabling the model to inherently integrate graph information. One example is OAG-BERT,20 which is designed for topic classification over academic articles. During its pre-training, the input content is assigned both word embeddings and category embeddings (such as title, author names, and abstract), which help OAG-BERT better understand the context's ontology.
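Many of these text-understanding methods reduce to serializing the relevant slice of a knowledge graph into extra textual context. The snippet below is a hypothetical illustration of that discrete pattern; the triple store, template, and function names are our own inventions, not the prompting functions of any cited system.

```python
# Hypothetical knowledge-graph triples about entities in the input text.
TRIPLES = [
    ("Bill Gates", "instance_of", "Person"),
    ("Microsoft", "instance_of", "Company"),
    ("Bill Gates", "founder_of", "Microsoft"),
]

def graph_context(entities):
    """Serialize every triple that touches a mentioned entity."""
    facts = [f"{h} {r.replace('_', ' ')} {t}"
             for h, r, t in TRIPLES if h in entities or t in entities]
    return " ; ".join(facts)

def prompt(text, entities):
    """Discrete prompting function: raw input plus graph-derived context."""
    return f"{text} [Context: {graph_context(entities)}]"

print(prompt("Bill Gates announced a new initiative.", ["Bill Gates"]))
```

Because the result is an ordinary string, it can be fed to any pre-trained language model without touching its parameters, which is the appeal of discrete designs.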
Text Generation. Text generation is more challenging than text understanding, as it requires the model to choose words from the whole vocabulary set to represent its understanding of, and responses to, the input text. Similar to the text understanding tasks, graph prompting methods can provide more context and additional knowledge to the generator as better guidance for the generation process. For example, H. Liu and Y. Qin propose to enhance the context of an input question for better answer generation by aligning a community content graph,23 where the content graph consists of articles, comments, questions, and answers from other users in the community. Although most of these graph prompting methods are introduced during inference,11,14,22,23,26 some researchers have proposed prompting graph knowledge starting from the pre-training stage.7

Challenges
Pre-training GNN Models. In light of the huge success of pre-trained models in text mining, many tasks in graph-related domains (e.g., knowledge graph extraction and recommendation) are transformed into NLP problems. However, a GNN-based model, rather than a language model, could better encode the structural knowledge of a graph database. Recently, researchers have begun to design more general graph pre-training objectives to improve the generalization of pre-trained GNNs for better prompt learning.27,28,29

Knowledge Injection for Answer Mapping. Current methods rely on manually designed rules to incorporate structured information into answer mapping functions. However, this approach introduces a strong bias from the designers and results in low efficiency. Some attempts have been made in the field of knowledge extraction, where extra knowledge graphs are injected to automatically optimize the answer mapping function.19,25 We encourage future work to broaden this idea to wider applications.

Non-Generative Answer Mapping. Different from the pre-training objectives in NLP and CV, the pre-training objectives of graph neural networks usually correspond to discriminative tasks. The form of downstream tasks that can be supported is constrained by the output of these discriminative pre-training tasks. For example, predicting whether an edge exists is a typical pre-training objective for graph neural networks, where the output is a probability between 0 and 1. In comparison, the output space of pre-trained NLP models is usually the vocabulary set, while that of pre-trained CV models could be an image. Therefore, it is challenging to design sufficient answer mapping functions that can release the potential of pre-trained graph neural networks.

Explanation and Fairness. Given the inherent nature of graph-structured data in capturing explicit relationships between nodes, there is growing interest in leveraging this structured side information to develop explainable and fair systems. For instance, in the context of biomedical entity normalization, the use of a biomedical entity graph provides a clear pathway through the entity normalization process, facilitating human experts in validating the model's outputs.9 Furthermore, other applications driven by the need for explanation can also benefit from incorporating additional graph-structured data to enhance their performance through prompt learning, such as developing explainable recommender systems.7,11

Conclusion
In this survey, we have explored the emerging learning paradigm of “pre-train, prompt, predict” in graph-related learning scenarios. Our analysis highlights the essential role of graphs as structured knowledge repositories that enable adaptive prompt templates and the generalization of pre-trained models to downstream scenarios with limited labeled data. We have identified two types of graph prompting designs, discrete and continuous, each with its own strengths and limitations. Our survey has demonstrated the applications of these techniques in graph representation learning and natural language processing. We conclude that graph prompting functions have significant potential to enhance the generalization of graph applications and to personalize prompt templates in NLP. However, we have also identified several challenges that remain open. Overall, our study provides insights and directions for future research in this exciting area.
REFERENCES
1. X. Feng, et al., "Graph learning: A survey," IEEE Transactions on Artificial Intelligence, vol. 2, no. 2, pp. 109-127, 2021.
2. D. Yu, L. Deng, and G. Dahl, "Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition," in Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
3. P. Liu, et al., "Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing," ACM Computing Surveys, vol. 55, no. 9, pp. 1-35, 2023.
4. P. Liu, L. Zhang, and J. A. Gulla, "Pre-train, prompt and recommendation: A comprehensive survey of language modelling paradigm adaptations in recommender systems," arXiv preprint, arXiv:2302.03735, 2023.
5. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of NAACL-HLT, pp. 4171-4186, 2019.
6. W. Hu, et al., "Strategies for pre-training graph neural networks," in International Conference on Learning Representations, 2020.
7. S. Geng, et al., "Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5)," in Proceedings of the 16th ACM Conference on Recommender Systems, pp. 299-315, 2022.
8. Y. K. Chia, L. Bing, S. Poria, and L. Si, "RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction," in Findings of the Association for Computational Linguistics: ACL, pp. 45-57, 2022.
9. J. Zhang, et al., "GraphPrompt: Biomedical Entity Normalization Using Graph-based Prompt Templates," arXiv preprint, arXiv:2112.03002, 2021.
10. R. Brate, et al., "Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs," in Workshop on Deep Learning for Knowledge Graphs (DL4KG@ISWC2022), 2022.
11. L. Li, Y. Zhang, and L. Chen, "Personalized prompt learning for explainable recommendation," ACM Transactions on Information Systems, vol. 41, no. 4, pp. 1-26, 2023.
12. X. Lv, et al., "Do pre-trained models benefit knowledge graph completion? A reliable evaluation and a reasonable approach," in Findings of the Association for Computational Linguistics: ACL, pp. 3570-3581, 2022.
13. Y. Zhang, et al., "Language Models as Recommender Systems: Evaluations and Limitations," in I (Still) Can't Believe It's Not Better! NeurIPS 2021 Workshop, 2021.
14. Z. Cui, et al., "M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems," arXiv preprint, arXiv:2205.08084, 2022.
15. H. Ye, et al., "Ontology-enhanced Prompt-tuning for Few-shot Learning," in Proceedings of the ACM Web Conference (WWW), pp. 778-787, 2022.
16. J. Zhou, Q. Zhang, Q. Chen, L. He, and X. Huang, "A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck," in Proceedings of the International Conference on Computational Linguistics, pp. 1990-2000, 2022.
17. J. Liu, et al., "KEPT: Knowledge Enhanced Prompt Tuning for event causality identification," Knowledge-Based Systems, vol. 259, p. 110064, 2023.
18. T. Fang, Y. Zhang, Y. Yang, and C. Wang, "Prompt Tuning for Graph Neural Networks," arXiv preprint, arXiv:2209.15240, 2022.
19. X. Chen, et al., "KnowPrompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction," in Proceedings of the ACM Web Conference (WWW), pp. 2778-2788, 2022.
20. X. Liu, et al., "OAG-BERT: Pre-train heterogeneous entity-augmented academic language models," in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3418-3428, 2022.
21. C. Diao, K. Zhou, X. Huang, and X. Hu, "MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning," arXiv preprint, arXiv:2212.10614, 2022.
22. X. Wang, K. Zhou, J. Wen, and W. X. Zhao, "Towards Unified Conversational Recommender Systems via Knowledge-Enhanced Prompt Learning," in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1929-1937, 2022.
23. H. Liu and Y. Qin, "Heterogeneous graph prompt for Community Question Answering," Concurrency and Computation: Practice and Experience, p. e7156, 2022.
24. M. Sun, K. Zhou, X. He, Y. Wang, and X. Wang, "GPPT: Graph pre-training and prompt tuning to generalize graph neural networks," in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1717-1727, 2022.
25. S. Hu, et al., "Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification," arXiv preprint, arXiv:2108.02035, 2021.
26. D. Sileo, W. Vossen, and R. Raymaekers, "Zero-Shot Recommendation as Language Modeling," in Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, pp. 223-230, Cham: Springer International Publishing, 2022.
27. Y. Zhu, J. Guo, and S. Tang, "SGL-PT: A Strong Graph Learner with Graph Prompt Tuning," arXiv preprint, arXiv:2302.12449, 2023.
28. W. Zhang, et al., "Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer," arXiv preprint, arXiv:2303.03922, 2023.
29. C. Zhang, R. Chen, X. Zhao, Q. Han, and L. Li, "Denoising and Prompt-Tuning for Multi-Behavior Recommendation," arXiv preprint, arXiv:2302.05862, 2023.
30. S. Bubeck, et al., "Sparks of artificial general intelligence: Early experiments with GPT-4," arXiv preprint, arXiv:2303.12712, 2023.

Xuansheng Wu is currently a second-year Ph.D. student in the School of Computing at the University of Georgia, Athens, GA, USA. His research interests broadly cover Natural Language Processing, Recommendation Systems, and Representation Learning. Contact him at xuansheng.wu@uga.edu.

Kaixiong Zhou received the B.S. degree in electrical engineering and information science from Sun Yat-Sen University, and the M.S. degree in electrical engineering and information science from the University of Science and Technology of China. He is currently working toward the Ph.D. degree with the Department of Computer Science, Rice University. His research interests include large-scale graph machine learning and its applications in science. Contact him at Kaixiong.Zhou@rice.edu.

Mingchen Sun is a Master's student at Jilin University, China. His interests include Meta Learning, Graph Data Mining, and Domain Generalization.

Xin Wang, Ph.D., is an associate professor at Jilin University, China. He is also a senior member of CCF. His main research interests include machine learning, information retrieval, and social computing.

Ninghao Liu is an assistant professor in the School of Computing at the University of Georgia, Athens, GA, USA. His research interests are Explainable AI (XAI), Natural Language Processing, and Graph Mining. Contact him at ninghao.liu@uga.edu.