
NetGPT: An AI-Native Network Architecture for Provisioning Beyond Personalized Generative Services

Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, and Honggang Zhang

Abstract—Large language models (LLMs) have achieved tremendous success in empowering our daily life with generative information. The personalization of LLMs could further enhance their applications due to better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology is promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to capably synergize appropriate LLMs at the edge and the cloud based on their computing capacity. In addition, edge LLMs could efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with the cloud LLM. After deploying representative open-source LLMs (e.g., the GPT-2-base model and the LLaMA model) at the edge and the cloud, we present the feasibility of NetGPT on the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently, we highlight the essential changes required for an artificial intelligence (AI)-native network architecture towards NetGPT, with an emphasis on deeper integration of communication and computing resources and careful calibration of the logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, as the edge LLMs' capability to predict trends and infer intents promises a unified solution for intelligent network management & orchestration. We argue that NetGPT is a promising AI-native network architecture for provisioning beyond personalized generative services.

Y. Chen and R. Li are with Zhejiang University, Hangzhou 310027, China (email: {cyx00, lirongpeng}@zju.edu.cn). C. Peng and J. Wu are with Huawei Technologies Co., Ltd., Shanghai 210026, China (email: {pengchenghui, wujianjun}@huawei.com). Z. Zhao and H. Zhang are with Zhejiang Lab, Hangzhou 310012, China, as well as Zhejiang University, Hangzhou 310027, China (email: {zhaozf, honggangzhang}@zhejianglab.com). E. Hossain is with the University of Manitoba, Winnipeg, Manitoba, Canada (email: ekram.hossain@umanitoba.ca).

I. INTRODUCTION

With the remarkable success of deep learning, spanning from decision-making in AlphaGo to human-level interaction like ChatGPT, it is anticipated that artificial intelligence (AI) will be embodied in 6G networks. Along with the enhanced edge computing capabilities, AI could benefit the effective orchestration of network resources and improve the quality of service (QoS). Correspondingly, the investigation of efficient AI-based service provisioning has attracted intense research interest. On the other hand, the application of one AI model is often limited to certain scenarios or tasks. In this context, large language models (LLMs) (e.g., the generative pre-trained transformer, GPT [1]) could perform well in various natural language processing (NLP) and computer vision tasks. However, fine-tuning is still a prerequisite to align pre-trained LLMs to follow human intents [2] and yield personalized outputs. Therefore, it might be cost-ineffective to deploy LLMs on centralized clouds only, as this inevitably requires multiple copies of complete model parameters to support different purposes.

In order to boost the personalization of LLMs, a collaborative cloud-edge methodology is essential [3]. Compared to cloud-only LLM deployment, such a cloud-edge collaboration enjoys multi-fold merits. Firstly, it provides more freedom for edge servers to deploy various fine-tuned LLMs and adapt to environmental differences, thus making service personalization and customization possible. Meanwhile, it contributes to bridging data-abundant generative devices with more adjacent servers. Therefore, it could reduce the latency and save the communication overhead of uploading data to more remote cloud servers. Incorporating generative LLMs into edge networks promises to facilitate the effective utilization of communication and computing (C&C) resources.

As illustrated in Fig. 1, there are several distinctive ways to implement the cloud-edge collaboration for the deployment of LLMs (e.g., local fine-tuning, model splitting). Specifically, by offloading cloud-trained LLMs, local edge servers tailor them to accommodate personalized and customized services based on user preferences and specified scenarios. In this regard, federated learning or parallel training can be leveraged as a complement to facilitate the tuning [2], [4]. However, such an approach might face severe implementation issues in practice, as repetitive fine-tuning of complete LLMs implies a significant computational burden, and the distributed deployment of proprietary LLMs might raise intellectual property concerns from model developers. Meanwhile, force-fitting an entire LLM on the edge possibly strains the limited computing resources of edge servers and makes the cost of edge computing unacceptable, while offloading the LLMs incurs significant communication overhead.
[Figure] Fig. 1. An illustration of candidate means to realize the cloud-edge collaboration for NetGPT: (a) LLM fine-tuning (offload & fine-tune), (b) LLM splitting, and (c) LLM synergy, drawn across the cloud, edge, and terminal with the control, user, and computing planes.
Alternatively, splitting LLMs between cloud and edge servers [4], by deploying some layers of a large-scale deep neural network (DNN) at the edge while leaving the remaining layers to the cloud, can effectively balance the disproportionate computing resources of edge and cloud servers. Within model splitting, how to effectively partition the DNN between the edge and the cloud is one of the most challenging issues, as the partition should minimize the end-to-end latency while maintaining a sufficiently small model size for the edge servers [4]. Such a model partitioning can be even more intricate, given the billions of parameters in a typical LLM. Moreover, the wide adoption of residual links in LLMs possibly limits the choice of suitable partitioning points. Besides, the LLMs might leak private details from the training data [5]. In other words, it might be challenging to directly adopt either local fine-tuning or model splitting as an implementation means of the collaborative cloud-edge methodology.

In this article, we put forward NetGPT, which aims to respect the cloud-edge resource imbalance and synergize different sizes of functional LLMs at the edge and cloud. Specifically, in apparent contrast to an AI-exogenous network with decoupled C&C resources, NetGPT could leverage converged C&C to deploy smaller LLMs at the edge and larger ones at the cloud, and meaningfully realize collaborative cloud-edge computing to provision personalized content generation services. Besides, NetGPT incorporates a logical AI workflow that could be developed to determine performance-consistent communication links. For example, in NetGPT, the performance-driven communication link could terminate at the edge to accelerate the response, assuming the availability of satisfactory edge LLM-induced content. Otherwise, inspired by the idea of prompt learning [6], the LLMs at the edge can infer the context and actively append (or fill in) some local or personalized information, so as to acquire a more comprehensive result at the cloud. Furthermore, as a by-product, the edge LLMs contribute to a unified solution for intelligent network management & orchestration (e.g., user intent inference and popularity prediction). Therefore, NetGPT coincides with the trend to deeply integrate C&C, and represents an LLM-empowered AI-native architecture.

II. IMPLEMENTATION SHOWCASE OF NETGPT

As illustrated in Fig. 2, we present a synergistic cloud-edge framework to accomplish personalized generative services, by leveraging distinctive pre-trained LLMs for cloud and edge (e.g., base stations [BSs]) deployment. In particular, limited by the availability of open-source LLMs, we select and deploy the LLaMA-7B model [7] and the GPT-2-base model, which consist of approximately 6.7 and 0.1 billion parameters, at the cloud and the edge, respectively. However, it should be noted that NetGPT allows the utilization of other LLMs as well. On this basis, we delve into the implementation details of cloud-edge LLM synergy towards NetGPT in an incremental manner. In particular, we start with a quick and general overview of the transformer, the basis of LLMs, and enumerate the detailed DNN structures of two LLMs (i.e., the LLaMA-7B model and the GPT-2-base model). Then, we discuss effective means to fine-tune these LLMs on computation-limited devices, and manifest the effectiveness for location-based personalized generative services.

A. An Overview of Transformer

The transformer has been extensively adopted as the foundation model to constitute a multi-layer decoder in LLMs. The fundamental concept behind the transformer involves the construction of the DNN structure through the use of multi-layer self-attention and feed-forward neural networks (FNNs). In particular, self-attention relies on an attention head parameterized by query, key and value matrices (i.e., Q, K and V) to compute the internal correlation within an input sequence, by deriving and assigning different weights to different positions within the sequence. Meanwhile, the FNN applies a non-linear transformation to the representation of each position. Additionally, techniques such as layer normalization are employed to mitigate the issue of vanishing gradients.
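To make this mechanism concrete, the following is a minimal sketch of masked (causal) scaled dot-product self-attention, matching the Q, K, V description above; it is our illustration rather than any implementation from the paper, and all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head masked self-attention.

    X: (seq_len, d_model) input representations.
    Wq, Wk, Wv: (d_model, d_head) projections for Q, K and V.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project the inputs
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # scaled dot-product
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)      # causal mask: no look-ahead
    return softmax(scores) @ V                   # position-wise weighted sum

# Toy usage: 5 tokens, model width 16, head width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(masked_self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```

In a complete transformer block, several such heads run in parallel and their concatenated outputs feed the FNN sublayer.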
B. DNN Structure of LLMs at the Edge and Cloud

1) DNN structure of GPT-2-base model: The GPT-2-base model, which is the smallest version of the GPT-2 series, encompasses 12 stacked layers of the original transformer structure (i.e., an 8-head self-attention sublayer and an FNN sublayer). A fixed absolute positional encoding based on sine and cosine functions is employed to pre-transform the input sequence. In addition, GPT-2 leverages a rectified linear unit (ReLU) activation function (i.e., f_ReLU(x) = max(0, x)). Due to its relatively exceptional performance and minimal computational requirement, it is appropriate for deployment at the network edge.
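As a reference for the sine-cosine encoding mentioned above, here is a sketch following the original transformer formulation; the sequence length and width are illustrative assumptions.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even feature indices
    pe[:, 1::2] = np.cos(angles)   # odd feature indices
    return pe

print(sinusoidal_positions(5, 8)[0])  # position 0: alternating 0s and 1s
```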
[Figure] Fig. 2. A framework of collaborative cloud-edge computing towards NetGPT. A terminal submits a concise prompt; the fine-tuned GPT-2-base model (12 transformer blocks with layer normalization, masked self-attention and FNN sublayers) at the edge proposes the destination, adds local information, and extends the input into a comprehensive prompt; the responsive LLaMA-7B model at the cloud then provides answers based on the prompt completed at the network edge. The modified LLaMA transformer block combines RMSNorm, RoPE, a causal mask, SwiGLU, dropout, and LoRA bypass matrices of rank r between dimensions d_in and d_out.

2) DNN structure of LLaMA model: LLaMA, which is trained on a large set of unlabeled data and is ideal for fine-tuning for downstream tasks, features various parameter versions as well [7]. Compared to GPT-3, LLaMA incorporates several specific enhancements to maintain similar performance while significantly reducing the number of parameters [7]. Firstly, in order to enhance training stability, LLaMA normalizes the input of each sub-layer instead of normalizing the output. Moreover, it adopts the root mean square layer normalization (RMSNorm) function [8] as a simplified replacement for layer normalization, employing the root mean square (RMS) rather than the standard deviation. Additionally, RMSNorm introduces a learnable scaling factor that enables adaptive feature scaling; thus, it contributes to improving normalization effects for diverse features with distinctive value ranges. Secondly, LLaMA replaces the ReLU activation function with the Swish-gated linear unit (SwiGLU) [9], which combines the Swish function (i.e., f_Swish(x) = x · σ(βx) with σ(x) = 1/(1 + e^(−x)) and a trainable parameter β) with the GLU (i.e., f_GLU(x) = x · σ(Wx + b) with trainable parameters W and b), thereby possibly activating neurons according to the input in a more selective manner and imparting smoothness to effectively capture intricate non-linear relationships. Lastly, LLaMA introduces rotary position embedding (RoPE) [10], which encodes positional information with a pre-defined rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation. Compared to absolute position encoding, which assigns a distinct encoded representation to each position in the sequence, the relative position encoding taken by RoPE enables a more effective modeling of long-range dependencies within the contextual information. Thereby, RoPE aligns with intuitive understanding and exhibits superior performance in practice.
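The three LLaMA components described above can be summarized in a compact sketch; this is our illustrative reconstruction (the Swish parameter β is exposed explicitly, and RoPE is written in the variant that rotates the two halves of each feature vector), not LLaMA's actual code.

```python
import numpy as np

def rmsnorm(x, g, eps=1e-6):
    # Normalize by the root mean square (no mean subtraction),
    # then apply the learnable per-feature scaling factor g.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return g * x / rms

def swiglu(x, W_gate, W_up, beta=1.0):
    # Swish(z) = z * sigmoid(beta * z), gating a parallel projection.
    z = x @ W_gate
    return (z / (1.0 + np.exp(-beta * z))) * (x @ W_up)

def rope(x, base=10000.0):
    # Rotate feature pairs by position-dependent angles so that attention
    # scores depend on relative, not absolute, positions.
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair frequencies
    ang = np.arange(seq)[:, None] * freqs[None, :]   # (seq, half)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate(
        [x1 * np.cos(ang) - x2 * np.sin(ang),
         x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

# Toy usage on 5 positions with 16 features.
rng = np.random.default_rng(1)
x = rope(rmsnorm(rng.normal(size=(5, 16)), g=np.ones(16)))
print(x.shape, swiglu(x, rng.normal(size=(16, 32)), rng.normal(size=(16, 32))).shape)
```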
C. Fine-Tuning Techniques

1) Low-rank adaptation-based light-weight fine-tuning: As LLaMA lacks the capability to generate responsive text [7], extra fine-tuning is still required. However, a direct fine-tuning of LLMs such as LLaMA requires significant computational resources. For example, it demands 112 GB of video random access memory (VRAM) to fine-tune the LLaMA-7B model, far more than the capacity of an NVIDIA A100 Tensor Core GPU. Therefore, we leverage the low-rank adaptation (LoRA) technique [11] to achieve parameter-efficient fine-tuning on consumer-level hardware. In particular, in order to fine-tune a complete parameter matrix W ∈ R^(d_in × d_out), LoRA adds a bypass pathway that simulates the matrix update ΔW by using two low-rank parameter matrices A ∈ R^(d_in × r) and B ∈ R^(r × d_out) with intrinsic rank r. In other words, under the condition that r ≪ min(d_in, d_out), LoRA transforms the large parameter matrix ΔW into lower-rank ones with ΔW ≈ AB. Our experiment shows that it costs only 28 GB of VRAM to fine-tune the LLaMA-7B model, without significantly elongating the training duration. Additionally, the required storage space for fine-tuning could be greatly reduced from 12.55 GB to 50 MB¹. On the basis of LoRA, we can utilize the Stanford Alpaca dataset [12] to fine-tune the LLaMA-7B model and obtain a qualified responsive LLaMA-7B model.

¹Such statistics are obtained under the configuration that r = 8 and a learning rate-related scaling factor equals 16.
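The LoRA bypass can be sketched as follows; this is our illustrative reconstruction rather than the exact training code, with r = 8 and a scaling factor alpha = 16 mirroring the configuration reported in the footnote.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank bypass A @ B.

    Only A and B (d_in*r + r*d_out values) are updated during
    fine-tuning, instead of the full d_in*d_out matrix W.
    """

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = W.shape
        self.W = W                                    # frozen pre-trained weight
        self.A = rng.normal(0, 0.02, size=(d_in, r))  # trainable, small init
        self.B = np.zeros((r, d_out))                 # trainable, zero init: delta_W starts at 0
        self.scale = alpha / r                        # learning rate-related scaling factor

    def forward(self, x):
        # y = x W + scale * x A B, i.e., W is adapted by delta_W ~ A @ B.
        return x @ self.W + self.scale * (x @ self.A @ self.B)

# Toy usage: adapting a 4096x4096 projection trains 2*4096*8 values
# instead of 4096*4096, roughly a 256x reduction.
layer = LoRALinear(np.zeros((4096, 4096)), r=8, alpha=16)
print(layer.forward(np.ones((1, 4096))).shape)  # (1, 4096)
```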
2) LLM-instructed data collection: In order to implement a personalized edge LLM, it is crucial to grant the GPT-2-base model the capability to extend a "concise" prompt by appending location-based information. Basically, the positioning information can be conveniently obtained according to the locations of affiliated BSs stored in the 5G access and mobility management function (AMF). Meanwhile, in order to complement more comprehensive information, we take the self-instruct approach [13] and leverage OpenAI's Text-Davinci-003 model to generate useful text samples. In particular, for each location, we use a set of manually-written location-related prompts to interact with OpenAI's Text-Davinci-003 model, and leverage the generated response texts as the "comprehensive prompt". Correspondingly, a series of mappings between a "concise prompt" and a "comprehensive prompt" can be collected. Considering the size and task complexity of the edge LLM, we collect a dataset comprising approximately 4,000 samples for directly fine-tuning the GPT-2-base model towards a prompt-completion model. Notably, for scenarios where stronger generality is required, edge LLMs can be enhanced with a larger-scale LLM, and fine-tuning techniques such as LoRA can be employed as well.
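The collection loop can be sketched as below, assuming the legacy openai Python SDK (pre-1.0) that served the Text-Davinci-003 model; the seed template, region examples, and output file layout are our assumptions rather than the paper's recipe.

```python
import json
import openai  # legacy SDK (<1.0) interface, matching the Text-Davinci-003 era

openai.api_key = "sk-..."  # placeholder credential

# Hypothetical manually-written seed prompt for one location.
SEED_TEMPLATE = (
    "A user near {venue} in {region} asks: '{concise}'. "
    "Rewrite this as a complete, location-aware request."
)

def collect_pairs(region, venue, concise_prompts):
    """Build (concise prompt -> comprehensive prompt) training pairs."""
    pairs = []
    for concise in concise_prompts:
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=SEED_TEMPLATE.format(venue=venue, region=region, concise=concise),
            max_tokens=64,
            temperature=0.7,
        )
        comprehensive = resp["choices"][0]["text"].strip()
        pairs.append({"prompt": concise, "completion": comprehensive})
    return pairs

# Toy usage: gather a handful of pairs for the Brooklyn edge.
pairs = collect_pairs("Brooklyn", "Grand Army Plaza", ["Libraries", "Coffee shops"])
with open("edge_prompt_pairs.jsonl", "w") as f:
    f.writelines(json.dumps(p) + "\n" for p in pairs)
```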
D. Performance Showcase

Fig. 3 demonstrates the performance of NetGPT. In particular, as illustrated in Fig. 3, the edge LLM is capable of complementing the "concise prompt" according to the chart at the left part of Fig. 3, which highlights the most frequently used words for generating each corresponding "comprehensive prompt". Furthermore, different BSs add location-based personalized information so as to satisfy distinctive requirements. Subsequently, the edge LLM processes the user-submitted "concise prompt" and feeds the complemented prompt to the cloud. Next, a more complete generative response can be anticipated. It can be observed from the right part of Fig. 3 that NetGPT yields different location-based responses, which manifests its capability to handle personalized generative services through effective cloud-edge synergy.
III. AI-NATIVE NETWORK ARCHITECTURE TOWARDS NETGPT

We argue that NetGPT provides the opportunity to transform cellular networks into an AI-native networking architecture, which provisions personalized, networked and inclusive intelligence for end users and grants users more privilege to access generative services anytime and anywhere. Nevertheless, such a transformation does come at a cost. It requires substantial changes, far more than installing racks of servers at the edge and locally breaking out traffic for edge processing. In particular, compared with conventional connectivity-oriented communications systems, wherein a typical service establishes connections between two specific terminals and the communication source and destination are clearly defined by end users, NetGPT requires establishing generative performance-driven connections in a more implicit manner. Moreover, as NetGPT involves more frequent data collection and processing modules for training personalized LLM models, computing resources shall be consistently scheduled to accomplish NetGPT. In other words, as shown in Fig. 4, NetGPT necessitates the design of deeply converged C&C in radio access networks (RANs). On top of these novel features, a logical AI workflow shall be devised to establish (beyond) generative service orchestration.

A. Converged C&C in RAN

In order to effectively organize heterogeneous resources, which may simultaneously cover terrestrial and non-terrestrial communications, the RAN for NetGPT has to first provide a control plane (CP) for seamless connectivity control, so as to ubiquitously support prompting and generative information transmission in the user plane (UP). Such elements can be developed in accordance with 5G and 5G-Advanced techniques. In addition, it is worthwhile to introduce an independent computing plane (CmP) to coordinate computing resources and perform AI-related functionalities, so as to facilitate the deployment and update of generative services.

B. Data Processing and Privacy Protection

As discussed in Section II, data processing (e.g., data collection and fine-tuning) is heavily leveraged to lay the very foundation for producing generative LLM models. Besides collecting and storing data, it is essential to introduce data desensitization modules as key data processing services, so as to avoid privacy risks and protect the privacy embedded in the data. Meanwhile, data policy enforcement modules, which handle data according to regulatory as well as non-regulatory rules (e.g., geographic restrictions), will be executed by default to ensure the integrity and legitimacy of data processing. Moreover, contingent on the regulation and data usage policy, it is also feasible to devise some data processing model libraries and expose their capabilities with appropriate access control for entities to utilize the data services.

C. Personalized Profiling

In order to create a highly customized NetGPT, location-oriented profiling shall be significantly enhanced to support the definition and operation of personalized generative AI services. For example, local venue and facility information can be specially gathered to train edge LLMs. On the other hand, user service nodes (USNs) can contain customized services at the end-user level as well, so as to meet diversified customer requirements. Meanwhile, profiling could further support establishing user profiles and characterizing connected terminals.

D. C&C Resource Management

As part of the provisioned services in future cellular networks, the orchestration of resources for NetGPT shares some similarities with that for other network services, including establishing connectivity and allocating computing resources. However, it also poses additional challenges, since the scope of resources spans distributed nodes from the cloud to the terminal. Therefore, a novel protocol stack needs to be carried over radio control signaling (e.g., RRC) or radio data protocols (e.g., PDCP or SDAP) to transmit AI-generative messages and implement model update & distribution.
[Figure] Fig. 3. A performance showcase of NetGPT for personalized generative services with edge LLM-assisted prompt completion. For the user prompt "Libraries", the Brooklyn edge extends it to "Find libraries in Brooklyn and provide information about their collections and services.", yielding a cloud response about the Brooklyn Public Library (BPL) at 10 Grand Army Plaza, Brooklyn, NY 11238 and its collections; the Manhattan edge extends it to "Recommend libraries in Manhattan and provide information about their lending policies and reading collections.", yielding a response about the New York Public Library at 42nd Street and Fifth Avenue, its lending policy, and its collection.

[Figure] Fig. 4. The illustration of an AI-native network architecture and logical AI workflow for NetGPT. The access network comprises a control plane (access authentication, security measures, audit logging, mobility management, and C&C resource management with resource allocation and scheduling), a user plane (data collection & transmission, prompting, cache validation, prompting enhancement, and generative text delivery towards the core network), and a computing plane (data processing with cleansing, transformation, storage, desensitization, governance and security; profiling of consumer preferences, venues and facilities, and network-wide configuration; and AI models such as GPT-2, SpeechT5, LayoutLM, BLIP and Stable Diffusion). The C&C coherent workflow for edge-cloud collaboration proceeds as: 1. terminal authentication; 2. resource orchestration; 3. signaling configuration; 4. concise prompt or training data delivery; 5. concise prompt request & processing; 6. calling the edge LLM; 7. comprehensive prompt generation; 8. calling the cloud LLM; 9. complete response generation; 10. response transmission.

E. Logical AI Workflow

In order to effectively provision AI services, it is critical to develop logical AI workflows to parse and orchestrate NetGPT services. Notably, a logical AI workflow, which facilitates a set of network functions physically distributed at both the edge and the cloud to coherently deliver the "concise prompt", the "comprehensive prompt" and the "generative responses", regulates data processing and profiling to train personalized LLMs at the CmP. Furthermore, logical AI workflows are mapped to physical resources during service deployment, so as to take into account the QoS requirements of related services. Notably, as the workflow covers a wide scope of network functions, the processing may be serial or directed acyclic graph-based, and thus involves comprehensive optimization techniques beyond the scope of this article. On the other hand, the logical AI workflow is not limited to generative services. As discussed in Section IV later, the logical AI workflow significantly contributes to the improvement of QoS in a more customizable manner.

IV. LLM-BASED UNIFIED SOLUTION FOR NETWORK MANAGEMENT & ORCHESTRATION

Apart from providing personalized generative services, NetGPT and the AI-native architecture could provide a unified solution for intelligent network management & orchestration, on top of deployed edge LLMs.
[Figure] Fig. 5. Edge LLM for popularity prediction: from data-sample template and fine-tuning to prediction accuracy. Fine-tuning examples follow the template "<START>In interval [n] movie [title] <NEXT> appear:0/1<END>" derived from the Netflix audience behavior dataset (.csv file), e.g., "<START>In interval 0 movie Iron Man 2 <NEXT> appear:1<END>" (positive label) and "<START>In interval 358 movie Baywatch <NEXT> appear:0<END>" (negative label); in the evaluation phase, the GPT-2-base model (12 transformer blocks of masked self-attention and FNN sublayers) completes the prompt "<START>In interval [n] movie [title] <NEXT>" with a list of "appear:0/1 <END>" labels.

A. Popularity Prediction

Popularity prediction could significantly contribute to improving networking efficiency by adapting the C&C resources to predicted demands [14]. Considering the underlying principles of its DNN structure, GPT-2 promises the ability to interpret users' preferences from the historical visiting records of affiliated terminals at the RAN. Furthermore, by incorporating location-specific data, the edge LLM can better capture the personalized characteristics unique to each area.

In order to verify the prediction capability of the edge LLM (i.e., the GPT-2-base model), we take the Netflix audience behavior dataset as a showcase. In order to mitigate the data sparsity, the time range is first divided into intervals based on a 6-hour cycle, each tagged with a number. Subsequently, the 20 movies with the highest frequency are selected and labeled according to the presence or absence of each movie in a particular interval. Later, benefiting from the data formatting capability in the CmP, the related historical information is converted into natural language conforming to a specific template. For example, "In interval 1, movie 'Iron Man 2' appear:1" indicates that the movie "Iron Man 2" appears in Interval 1, which corresponds to the specific date-time given in the left-bottom part of Fig. 5. Meanwhile, special tokens are added to create a prompt template that aids the language model in information comprehension and response generation. After direct fine-tuning, the edge LLM could generate labels following the prompt template format, i.e., whether the movie appears in the interval. Furthermore, to enhance model universality, we specifically utilize data from the last half year of the dataset for experimentation, and divide it into a training set and a test set according to the proportion of 95% to 5%. Fig. 5 finally presents the prediction accuracy of the edge LLM. It can be observed that GPT-2 exhibits an acceptable level of accuracy on this task, and significantly outperforms other classical algorithms (e.g., LSTM, GRU).
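The templating step can be sketched as follows; the CSV column names and the helper function are assumptions, since the paper does not publish its preprocessing code.

```python
import pandas as pd

SIX_HOURS = pd.Timedelta(hours=6)

def build_samples(csv_path, top_k=20):
    """Convert viewing logs into Fig. 5's prompt-template samples."""
    df = pd.read_csv(csv_path, parse_dates=["datetime"])  # assumed column names
    start = df["datetime"].min()
    # Tag each record with a 6-hour interval index counted from the start.
    df["interval"] = ((df["datetime"] - start) // SIX_HOURS).astype(int)
    top = df["title"].value_counts().head(top_k).index    # 20 most frequent movies

    samples = []
    for interval in range(df["interval"].max() + 1):
        present = set(df.loc[df["interval"] == interval, "title"])
        for title in top:
            label = 1 if title in present else 0          # presence/absence label
            samples.append(
                f"<START>In interval {interval} movie {title} "
                f"<NEXT> appear:{label}<END>"
            )
    return samples

samples = build_samples("netflix_audience_behaviour.csv")
print(samples[0])  # e.g. "<START>In interval 0 movie Iron Man 2 <NEXT> appear:1<END>"
```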
B. Intent Inference

Intent-based networking aims to tackle the increased difficulty of applying template-based services to vertical business, and it needs to perceive the real-time requirements of customers before replacing the manual processes of configuring networks and reacting to network issues [15]. In that regard, how to precisely understand the intent of customers and translate it into a feasible network configuration is one of the most fundamental issues. Coincidentally, edge LLMs exactly suit such an intent-recognition process [15] and benefit the accurate understanding of more verbal statements. In particular, by adopting a similar fine-tuning approach as before, the edge LLMs acquire the capability to understand and extract keywords from arbitrary natural language input. Notably, such a fine-tuning can be easily accomplished, as only 4,000 input-keyword pairs are used in our experiment. Fig. 6 presents the corresponding capability of the GPT-2-base model to identify customer intent. It can be observed from Fig. 6 that if a user wants to establish a 10 Gbps connection from Access 1 to Cloud 2 with traffic protection, accurate keywords can be conveniently extracted by the GPT-2-base model regardless of statement distinctions. Therefore, it avoids cumbersome template design and a customer learning process. In this regard, compared to conventional NLP tools, LLMs manifest stronger capability towards fulfilling intent-driven networking.

[Figure] Fig. 6. Edge LLM for intent inference. Three differently phrased requests, e.g., "Please set a connection between Cloud point two and Access point one, with 10Gbps bandwidth and Traffic protection.", "My connection point is Access point three and I need the connection to Cloud one, bandwidth is 5Gbps, traffic protection isn't needed." and "Access two to cloud two, 2Gbps, Traffic protection is mandatory.", are mapped by the fine-tuned GPT-2-base model to the fields (bandwidth, cloud point, access point, traffic protection): (10Gbps, Cloud two, Access one, Yes), (5Gbps, Cloud one, Access three, No) and (2Gbps, Cloud two, Access two, Yes).
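For illustration, a minimal inference sketch with the Hugging Face transformers library is given below; the checkpoint path and the keyword output format (mirroring the fields of Fig. 6) are our assumptions, not the paper's released artifacts.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical checkpoint fine-tuned on ~4,000 input-keyword pairs.
tok = GPT2Tokenizer.from_pretrained("./gpt2-intent-ft")
model = GPT2LMHeadModel.from_pretrained("./gpt2-intent-ft")

def infer_intent(utterance):
    """Map a free-form request to the keyword fields shown in Fig. 6."""
    prompt = f"<START>{utterance} <NEXT> "
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32, pad_token_id=tok.eos_token_id)
    completion = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    # Assumed completion format learned during fine-tuning, e.g.:
    # "bandwidth:10Gbps; cloud:two; access:one; protection:yes"
    fields = {}
    for kv in completion.split(";"):
        if ":" in kv:
            key, value = kv.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields

print(infer_intent("Please set a connection between Cloud point two and "
                   "Access point one, with 10Gbps bandwidth and Traffic protection."))
```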
V. CONCLUSION

In this article, based on LLMs, we have advocated an AI-native network architecture, namely NetGPT, for provisioning network services beyond personalized generative content. In particular, through the effective cloud-edge LLM synergy, we have demonstrated the feasibility of NetGPT for location-based personalized services by deploying representative open-source LLMs (e.g., the GPT-2-base model and the LLaMA model) at the edge and the cloud, and evaluating their coherent performance with the adoption of low-rank adaptation-based parameter-efficient fine-tuning techniques. On top of that, we have highlighted some substantial architectural changes (e.g., deep C&C integration and a logical AI workflow) that NetGPT will require. As a by-product, we have presented a possible unified AI solution for network management & orchestration empowered by edge LLMs, exemplified by the performance for popularity prediction and intent inference.

While NetGPT is a promising AI-native network architecture for provisioning beyond personalized generative services, we have not discussed all of the major research challenges in this article. For successful deployment of NetGPT, the following questions will need to be answered.

• Given the success of LLaMA in shrinking model sizes through effective algorithmic and structural updates, how to implement inference and fine-tuning at the terminals, so as to satisfy the limited computing capability of cost-limited devices?
• Considering the continual evolution of knowledge, how to emulate the new Bing² and implement online learning-based LLMs to adapt to the dynamicity of the wireless environment at the edge?
• Due to the limited sensitivity for numerical inference and possible deception effects, how to further improve the rigorousness of LLMs, and what lessons can be learned from the latest LLMs? Meanwhile, how to incorporate the evaluation metrics of LLMs to derive a suitable logical AI workflow?
• Besides network optimization at higher layers, how to develop LLM-based physical and radio link layers to unleash the power of LLMs? How to leverage the LLMs to meet the requirements of low-latency and ultra-reliable communications? Also, how to optimize wireless communications systems for the efficient deployment and operation of LLMs in future networks?

²New Bing refers to the GPT-empowered search engine available at https://www.bing.com/new.

REFERENCES

[1] OpenAI, "GPT-4 technical report," Mar. 2023. [Online]. Available: http://arxiv.org/abs/2303.08774
[2] J. Zhang, S. Vahidian, et al., "Towards building the federated GPT: Federated instruction tuning," May 2023, arXiv:2305.05644 [cs, eess]. [Online]. Available: http://arxiv.org/abs/2305.05644
[3] M. Xu, H. Du, et al., "Unleashing the power of edge-cloud generative AI in mobile networks: A survey of AIGC services," Mar. 2023, arXiv:2303.16129 [cs]. [Online]. Available: http://arxiv.org/abs/2303.16129
[4] W. Wu, M. Li, et al., "Split learning over wireless networks: Parallel design and resource management," IEEE J. Sel. Areas Commun., vol. 41, no. 4, pp. 1051–1066, Apr. 2023.
[5] N. Kandpal, E. Wallace, et al., "Deduplicating training data mitigates privacy risks in language models," in Proc. ICML 2022, Baltimore, USA, Jun. 2022.
[6] P. Liu, W. Yuan, et al., "Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing," ACM Comput. Surv., vol. 55, no. 9, pp. 195:1–195:35, Jan. 2023.
[7] H. Touvron, T. Lavril, et al., "LLaMA: Open and efficient foundation language models," Feb. 2023, arXiv:2302.13971 [cs]. [Online]. Available: http://arxiv.org/abs/2302.13971
[8] B. Zhang and R. Sennrich, "Root mean square layer normalization," in Proc. NeurIPS 2019, Vancouver, Canada, Dec. 2019.
[9] N. Shazeer, "GLU variants improve transformer," Feb. 2020, arXiv:2002.05202 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2002.05202
[10] J. Su, Y. Lu, et al., "RoFormer: Enhanced transformer with rotary position embedding," Aug. 2022, arXiv:2104.09864 [cs]. [Online]. Available: http://arxiv.org/abs/2104.09864
[11] E. J. Hu, Y. Shen, et al., "LoRA: Low-rank adaptation of large language models," in Proc. ICLR 2022, Virtual Edition, Jan. 2022.
[12] R. Taori, I. Gulrajani, et al., "Stanford Alpaca: An instruction-following LLaMA model," https://github.com/tatsu-lab/stanford_alpaca, 2023.
[13] Y. Wang, Y. Kordi, et al., "Self-instruct: Aligning language models with self-generated instructions," in Proc. ACL 2023, Toronto, Canada, Jul. 2023.
[14] J. Zhu, R. Li, et al., "AoI-based temporal attention graph neural network for popularity prediction and content caching," IEEE Trans. Cogn. Commun. Netw., vol. 9, no. 2, pp. 345–358, 2023.
[15] L. Pang, C. Yang, et al., "A survey on intent-driven networks," IEEE Access, vol. 8, pp. 22862–22873, 2020.

AUTHOR BIOGRAPHIES

Yuxuan Chen is a PhD candidate at Zhejiang University, Hangzhou, China. His research interests currently focus on large language models in communication.

Rongpeng Li is an associate professor at Zhejiang University, Hangzhou, China. His research interests currently focus on networked intelligence for communications evolving.

Zhifeng Zhao is the Chief Engineer with Zhejiang Lab, Hangzhou, China. His research area includes collective intelligence and software-defined networks.

Chenghui Peng is a Principal Researcher at Huawei Technologies. His current research interests focus on 6G native AI architecture design.

Jianjun Wu is the Chief Researcher and Director of the Future Network Lab, Huawei Technologies. He is leading the future network architecture design in Huawei.

Ekram Hossain is a Professor and the Associate Head (Graduate Studies) in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada.

Honggang Zhang is a principal researcher with Zhejiang Lab, Hangzhou, China. He is currently involved in research on cognitive green communications.
