
Causal Reasoning: Charting a Revolutionary Course for Next-Generation AI-Native Wireless Networks
Christo Kurisummoottil Thomas, Member, IEEE, Christina Chaccour, Member, IEEE,
Walid Saad, Fellow, IEEE, Mérouane Debbah, Fellow, IEEE, and Choong Seon Hong, Senior Member, IEEE

arXiv:2309.13223v1 [cs.IT] 23 Sep 2023

C. Thomas and W. Saad are with Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, USA, Emails: christokt@vt.edu, walids@vt.edu.
C. Chaccour is with Ericsson, Inc., Plano, Texas, USA, Email: christina.chaccour@ericsson.com.
M. Debbah is with Khalifa University of Science and Technology in Abu Dhabi and also with CentraleSupelec, University Paris-Saclay, 91192 Gif-sur-Yvette, France, Email: merouane.debbah@ku.ac.ae.
C. S. Hong is with the Department of Computer Science and Engineering, School of Computing, Kyung Hee University, Yongin-si 17104, Republic of Korea, Email: cshong@khu.ac.kr

Abstract—Despite the basic premise that next-generation wireless networks (e.g., 6G) will be artificial intelligence (AI)-native, to date, most existing efforts remain either qualitative or incremental extensions to existing “AI for wireless” paradigms. Indeed, creating AI-native wireless networks faces significant technical challenges due to the limitations of data-driven, training-intensive AI. These limitations include the black-box nature of the AI models, their curve-fitting nature, which can limit their ability to reason and adapt, their reliance on large amounts of training data, and the energy inefficiency of large neural networks. In response to these limitations, this article presents a comprehensive, forward-looking vision that addresses these shortcomings by introducing a novel framework for building AI-native wireless networks, grounded in the emerging field of causal reasoning. Causal reasoning, founded on causal discovery, causal representation learning, and causal inference, can help build explainable, reasoning-aware, and sustainable wireless networks. Embracing such a human-like AI foundation can revolutionize the design of AI-native wireless networks, laying the foundations for creating self-sustaining networks that ensure uninterrupted connectivity. Towards fulfilling this vision, we first highlight several wireless networking challenges that can be addressed by causal discovery and representation, including ultra-reliable beamforming for terahertz (THz) systems, near-accurate physical twin modeling for digital twins, training data augmentation, and semantic communication. We showcase how incorporating causal discovery can assist in achieving dynamic adaptability, resilience, and cognition in addressing these challenges. Furthermore, we outline potential frameworks that leverage causal inference to achieve the overarching objectives of future-generation networks, including intent management, dynamic adaptability, human-level cognition, reasoning, and the critical element of time sensitivity. We conclude by offering recommendations shaping the roadmap toward causality-driven AI-native next-generation wireless networking. Our ultimate goal is to create scientific foundations for AI-native networks that inspire further unconventional and forward-thinking research in causal reasoning within 6G and beyond, thereby charting a course towards next-generation AI-native wireless systems.

I. INTRODUCTION
As we peer into the future of next-generation wireless networks (e.g., 6G and beyond), the incorporation of artificial intelligence (AI) across the protocol stack of wireless systems is no longer just a theoretical concept; it is becoming a tangible reality [1]. AI-native wireless systems must leverage machine learning (ML) and AI algorithms to design, optimize, and operate various aspects of the wireless system, including transceiver design, resource allocation, interference management, spectrum sharing, and more. However, developing AI-native 6G wireless systems necessitates the design of novel AI frameworks that are tailored to several unique challenges of wireless systems: 1) dynamic adaptability to ensure rapid adjustments to changing network conditions, user demands, and other environmental factors; 2) time criticality, whereby 6G systems must deliver ultra-low latency and unwavering reliability, particularly for applications demanding split-second responsiveness; 3) intent management, enabling networks to autonomously translate high-level business intents into network configurations in a closed-loop fashion, ensuring intent assurance across the network while maintaining overall network reliability; 4) resilience, enabling 6G networks to withstand disruptions and maintain connectivity even in challenging scenarios; 5) non-linear signal dynamics that must be properly modeled to accurately capture the time-varying nature of multi-modal wireless signals, e.g., audio, video, haptics, and olfactory signals; and 6) human-level cognition and reasoning that must be integrated into the wireless system for intelligent decision-making, self-optimization, and efficient resource management, ultimately enabling both sample efficiency and communication efficiency, thereby achieving more with less data and resources.

Despite significant progress in AI-native wireless networks, existing solutions (e.g., [2]) remain predominantly data-driven and based on statistical learning. These traits make it difficult to meet the aforementioned challenges. Firstly, statistical AI models require extensive data volumes to ensure reliable performance. However, obtaining extensive datasets is not always feasible in wireless networks. Secondly, current AI model parameters are closely linked to their training data, requiring frequent and recurrent retraining to maintain effectiveness as the wireless and radio access network (RAN) environments undergo rapid changes. This consumes substantial communication and computational resources. The resulting training overhead could disrupt the communication phase, consequently impacting the stringent requirements of emerging wireless services in terms of uninterrupted availability and near-zero end-to-end latency. Furthermore, existing AI tools frequently encounter challenges when it comes to reasoning,

[Figure: “The Path Towards Causal-AI Driven Wireless Networks”. Panel titles: “Why is traditional AI not enough?” and “How can Causality address it?”. Labels for vanilla AI: training and data dependent, huge carbon emissions, opaque black-box models. Labels for causal AI: causal discovery, causal representation, causal inference. 6G goals: intent management, resilience, cognitive networks, reasoning and generalization, trustworthiness, transparency, and explainability, sustainability and green networking, dynamic adaptability, time criticality.]
Fig. 1: Causal AI vision for next-generation wireless networks.

particularly when tasked with producing novel insights or combining already-seen experiences to generate fresh combinations or make logical deductions. Finally, the reliance of statistical AI on complex neural networks (NNs) and often opaque architectures makes it challenging to identify the root causes of any performance deviations or network operation issues at the granular level across the network stack, interfaces, and the profusion of devices and sensors.

To overcome these limitations, as envisioned in Fig. 1, it is imperative to design new AI frameworks that: 1) enable reasoning and generalization by exhibiting consistent performance across diverse wireless data distributions (inductive reasoning), as well as an ability to combine already-seen experiences to generate novel logical conclusions (deductive reasoning), 2) possess the properties of explainability and transparency, which means they can operate reliably while providing insights into how and why they make specific decisions, thereby developing a deeper understanding of their actions across the wireless network protocol stack, and 3) promote the advancement of sustainable and green networks by improving sample efficiency and creating more lightweight ML models. Several methodologies can facilitate the development of generalizable AI, such as transfer learning, meta-learning, domain adaptation, and continual learning [3]. However, these approaches still require extensive and diverse datasets for efficacy and typically rely on NNs that lack transparency and interpretability. An alternative approach could involve collecting a substantial amount of wireless data through extensive sensing, similar to the data acquisition methods used for large language models (LLMs) like ChatGPT. This approach, called data-driven networking with foundation models [4], can bolster AI models’ adaptability in wireless contexts. While such generative AI models are useful as writing assistants, they are known to generate spurious data points that impair the wireless network resilience and could yield faulty predictions. In fact, as shown in [5], ChatGPT now produces more errors than when it was initially introduced, due to challenges like model collapse and catastrophic forgetting. Hence, deploying ChatGPT-like AI models for wireless networks, especially in mission-critical and time-critical settings (e.g., autonomous vehicles or digital twins of industrial plants), may be risky and brittle. Additionally, gigantic NN architectures are compute-hungry and require significant processing, thus leading to energy-inefficient networking, which contradicts the ongoing pursuit of energy-conscious sustainability objectives.

Essentially, if we are to efficiently design AI-native wireless networks, we must create AI frameworks that excel in performance and in emulating human-like reasoning capabilities. “Human-like reasoning capabilities” refer to the ability of AI systems to think, make decisions, and understand information analogously to how humans do, encompassing common sense, reasoning, contextual understanding, adaptability, as well as ethical and explanatory capabilities. Such models could alleviate the challenges associated with conventional ML approaches by providing a deeper understanding of their decision-making processes and achieving superior generalization across an array of tasks and scenarios as required by next-generation AI-native wireless networks. Here, a promising avenue lies in the realm of causal reasoning [6], [7]. Causal reasoning involves recognizing cause-and-effect relations within the wireless data, helping the AI models understand how changes in one component of the system can impact or predict outcomes in other components, and facilitating informed decision-making and effective problem-solving. Causality is realized through the processes of causal discovery, representation, and inference. Causal discovery enables the creation of explainable AI-native wireless models, allowing us to describe the network decisions that result in specific outcomes, unlike the opacity of transfer, meta, or continual learning schemes. Meanwhile, causal inference and representation facilitate human-like reasoning by leveraging concepts like interventions and counterfactuals, which can be performed either online or offline on the learned causal model. In causal ML, “interventions” involve actively manipulating one or more network variables to observe the effects on other variables. “Counterfactuals” refer to hypothetical scenarios where we assess what would have happened if certain interventions or changes had or had not occurred, providing insights into the causal impact of those interventions. Causal inference empowers wireless systems to extend the applicability of their predictions and decision-making capabilities across various data distributions (generalizability), assuming a consistent data generation process (causal understanding). This results in more resilient and efficient knowledge transfer across domains compared to transfer learning, enhanced sample efficiency (leading to sustainability) compared to meta-learning, and

dynamic adaptability to varying task or data distributions without the risk of catastrophic forgetting associated with continual learning. As shown in Fig. 1, advances in designing causal reasoning frameworks will allow the development of AI-native wireless networks characterized by the previously discussed attributes that include reasoning awareness, intent management, resilience, dynamic adaptability, and real-time synchronization, which align with the objectives of next-generation wireless networks, namely 6G and beyond.

A. Contributions
The main contribution of this article is a novel and holistic vision that articulates fundamental principles necessary to build next-generation causality-driven, AI-native wireless networks. Our key contributions include:
• We first present the fundamental building blocks of causal reasoning, and we motivate the need for incorporating them in future wireless networks through a simple example.
• We identify the inherent challenges in current AI-native wireless systems, like lack of explainability, limited reasoning capability, and energy inefficiency. We then discuss the potential of causal reasoning for addressing these challenges.
• We demonstrate how causal discovery and representation can expedite the advancement of interpretable and cognition-enhanced wireless systems by deciphering the causal relationships within wireless data as a graph and subsequently dissecting cause-and-effect dynamics. The cause-and-effect dynamics are analyzed by causal graphical models (CGMs) to facilitate dynamic channel tracking in THz systems, precise digital twin (DT) modeling, training data augmentation, and the development of resilient wireless networks. We also showcase how causal discovery and causal representation provide a more rigorous interpretation of the semantic information intrinsic to wireless data when compared to state-of-the-art methods.
• We introduce a robust analytical framework based on causal inference for addressing wireless control problems. This framework leverages the learned CGMs and incorporates causal reinforcement learning (RL), causal multi-armed bandit (MAB), and causal multi-objective optimization. We further outline how causality can address challenges related to partial observability, scalability, and real-time communication protocol design in integrated sensing and communication (ISAC) systems. Leveraging interventional and counterfactual reasoning within causal ML, these frameworks aim to achieve the wireless goals of dynamic adaptability, intent management, and time criticality.
In a nutshell, this paper will serve as an authoritative guide and a meticulously crafted roadmap for understanding the indispensable AI principles required to construct and manage next-generation AI-native networks, equipping them with human-like reasoning capabilities.

II. CAUSALITY PRIMER
Our first key step is to fundamentally explain why and how wireless models can benefit from causal structure learning. Towards this aim, this section provides a primer on causality, exploring its fundamental concepts and its relevance to wireless systems.

A. Why do Statistical ML Models Fall Short in Reasoning over Wireless Networks?
Causality provides insights into the actual cause-and-effect dynamics within a system, akin to the “physics” of the system, and hence goes beyond mere correlation or statistical dependency. Consider a simple wireless beam training example for a terahertz (THz) system where pure statistical deep learning methods may fail. In this example, the system is provided with a dataset containing channel or network parameters ($X$) and their corresponding labels ($Y$) that represent a beam index. The model’s objective is to learn a mapping function, $Y = f(X)$, that accurately predicts the labels based on the observed channel data. In this purely statistical approach, the model’s learning process relies on identifying and capturing statistical dependencies (in short, the joint distribution $P(X, Y)$) between input ($X$) and output ($Y$) variables. To illustrate why a pure statistical approach is insufficient, we expand the simple beam tracking example as illustrated in Fig. 2. The physics behind the generation of the channel geometry, encompassing location parameters and scatterer parameters ($S$), is represented by variables $u_e$, $u_s$, and $u_p$ that are exogenous to the channel model. The dependency of the channel geometry parameters on these exogenous variables, via $f_e$, $f_s$, and $f_p$, cannot be accurately determined. As shown in Fig. 2, the propagation geometry parameters determine the angle of arrival (AoA) $\theta_{i,l}$, delay $\tau_{i,l}$, and path amplitudes $\alpha_{i,l}$, and their relationships are governed by either deterministic or random functions [8]. The channel vector at any subcarrier $s$ and time $t$ can be analytically represented as:

$$h_{s,t} = \sum_{i=0}^{I} \sum_{l=0}^{L-1} f_{\alpha_{i,l}}(x_p, f_c)\, b_R(\theta_{i,l})\, e^{-j 2\pi s \Delta f \tau_t^{(i,l)}} = g(\theta, x_b, x_u, x_p, f_c, \alpha), \tag{1}$$

where subscript $(i, l)$ corresponds to the $i$th scatterer and $l$th path, with $\theta_{i,l} = f^{\theta}_{i,l}(x_b, x_u, S, x_p)$ and $\tau_{i,l} = f^{\tau}_{i,l}(x_b, x_u, S, x_p)$. $\theta$, $x_p$, and $\alpha$ represent the vectors of AoA, position, and amplitude corresponding to distinct scatterers. $x_b$ and $x_u$ are the locations of the base station (BS) and user equipment (UE), respectively. $b_R$ represents the array response vector. $i = l = 0$ captures the line-of-sight (LoS) path. Conventional ML solutions approximate the non-linear function $g(\theta, x_b, x_u, x_p, f_c, \alpha)$, given the extracted channel parameters from the training data. The BS employs received pilot observations and UE feedback during the uplink (UL) as its ML training data. In THz bands, increased penetration losses and reduced reflection coefficients (captured by $S$) heighten vulnerability to signal blockages, endangering transmission reliability [9]. A sudden obstruction, such as a moving object or a person, that disrupts the LoS path between the UE and BS leads to a channel $g_I(\theta, x_b, x_u, x_p, f_c, \alpha)$. This channel exhibits

[Figure: causal diagram of a propagation environment between a BS and a UE. Node and edge labels: exogenous variables; channel geometry-related parameters; smooth surface with specular paths (1 per surface); rough surface with diffuse paths (>1 per surface); LoS path; carrier frequency and bandwidth; RAN characteristics (here, the antenna geometry); channel vector; QoS.]
Fig. 2: Mapping a propagation environment to a causal diagram. For simplicity, the diagram shows the causal dependencies for a single path, while the discussion in the paper revolves around a more general case that encompasses multiple scattering elements and multiple channel paths.
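As a rough illustration of how a diagram like Fig. 2 can be turned into something executable, the sketch below encodes a single LoS path as a small structural causal model and applies a blockage as an intervention that swaps only the amplitude mechanism. It is a minimal sketch under stated assumptions, not the authors' model: the node names, the free-space amplitude law, the 40 dB blockage loss, the 8-element array, and all numeric constants are hypothetical choices made for illustration, and the scatterer branch of the diagram is omitted.

```python
import numpy as np

C = 3e8                      # speed of light (m/s)
F_C = 300e9                  # assumed THz carrier frequency (Hz)
DELTA_F = 1e6                # assumed subcarrier spacing (Hz)
X_B = np.array([0.0, 0.0])   # assumed BS location (m)

# Parents of each endogenous node, listed in topological order
# (hypothetical single-LoS-path reduction of the causal diagram).
PARENTS = {
    "x_u": [],
    "theta": ["x_u"],
    "tau": ["x_u"],
    "alpha": ["tau"],
    "h": ["theta", "tau", "alpha"],
}

def array_response(theta, n_ant=8):
    """Unit-norm response b_R(theta) of a half-wavelength uniform linear array."""
    k = np.arange(n_ant)
    return np.exp(1j * np.pi * k * np.sin(theta)) / np.sqrt(n_ant)

def f_xu(pa, u):
    return np.array([10.0, 2.0]) + u["u_e"]      # UE location driven by exogenous u_e

def f_theta(pa, u):
    d = pa["x_u"] - X_B
    return np.arctan2(d[1], d[0])                # angle of arrival

def f_tau(pa, u):
    return np.linalg.norm(pa["x_u"] - X_B) / C   # propagation delay

def f_alpha(pa, u):
    wavelength, distance = C / F_C, C * pa["tau"]
    return wavelength / (4 * np.pi * distance)   # assumed free-space amplitude

def f_alpha_blocked(pa, u):
    return 0.01 * f_alpha(pa, u)                 # assumed extra 40 dB blockage loss

def f_h(pa, u, s=4):
    phase = np.exp(-2j * np.pi * s * DELTA_F * pa["tau"])
    return pa["alpha"] * array_response(pa["theta"]) * phase

MECHANISMS = {"x_u": f_xu, "theta": f_theta, "tau": f_tau, "alpha": f_alpha, "h": f_h}

def sample(mechanisms, u):
    """Evaluate every node from its parents, following the topological order."""
    v = {}
    for node, parents in PARENTS.items():
        v[node] = mechanisms[node]({p: v[p] for p in parents}, u)
    return v

rng = np.random.default_rng(0)
u = {"u_e": rng.normal(scale=0.5, size=2)}       # one draw of the exogenous noise

observed = sample(MECHANISMS, u)
# Intervention: replace only the amplitude mechanism, keep the same exogenous draw.
blocked = sample({**MECHANISMS, "alpha": f_alpha_blocked}, u)

power_ratio = np.linalg.norm(blocked["h"]) ** 2 / np.linalg.norm(observed["h"]) ** 2
print("AoA unchanged:", bool(np.isclose(observed["theta"], blocked["theta"])))
print(f"blocked / LoS channel power: {power_ratio:.1e}")
```

Because the graph is explicit, the blockage changes only the `alpha` mechanism and its descendant `h`, while the upstream geometry, angle, and delay stay the same. A purely curve-fitted mapping has no such handle and would have to relearn the whole input-output relation from new data.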
substantial power fluctuations, far more pronounced than those encountered in mmWave or sub-6 GHz bands. Classical ML models can handle smaller variations in non-linear channel models; however, they cannot capture the immense dynamic variations typical of THz frequencies. This limitation arises from their inherent curve-fitting nature, where the NN parameters are closely linked to the specific dataset used for their training. Furthermore, such black-box models cannot explain the rationale behind the selection of a specific beam index for a given set of channel parameters. One potential remedy to this issue involves training the ML algorithm using an extensive dataset, with the assumption that, on average, system performance remains consistent. Nonetheless, this approach has its drawbacks. Firstly, it demands substantial training data, which may not always be readily available. Secondly, it leads to ML models with a large number of parameters, resulting in higher energy consumption as well as bulkier hardware. Hence, a fundamental question arises: how can we transition to an AI model that can adapt its performance to various channel distributions, be resilient to unforeseen scenarios, and provide interpretable results, unlike black-box models? One answer is through the use of causal reasoning [6]. In causal ML, the system can obtain an understanding of the causal relationships that lead to the generation of the channel as well as the relations among network variables that determine the user quality-of-experience (QoE). These causal relationships can be captured using directed edges in a graph (see Fig. 2). For instance, $h$ is an effect of parameters $\theta$, $\tau$, $\alpha$, and $W$ as well as RAN parameters (these cause variables are called parents of node $h$), and can be represented using a deterministic or random function. Understanding these dependencies enables AI inference engines to manipulate them in real-time, thereby 1) enhancing the network’s predictive capabilities; and 2) ensuring better network decisions aligned with long-term goals, promoting seamless network connectivity. Next, we look at how grasping these causal relationships confers advantages over traditional deep learning-based wireless systems.

B. How Can We Embed Causality Awareness into AI-Native Wireless Models?
In our example, the wireless environment is represented by $X = \{x_b, x_u, x_p, S\}$, which encompasses factors like the physical object layout, which influences channel conditions, and users’ locations. Components of $X$ are represented by the possibly non-linear functions $f_e$, $f_s$, and $f_p$. Here, to assess the influence of $X$ on $Y$, in addition to utilizing offline data such as UL channel measurements, a causality-based framework integrates online data generated from various experiments using the causal structure. These experiments, also called interventions, can be defined as picking the functions $f_i$, $\forall i \in \{e, s, p, \theta, \tau, \alpha\}$, from a set $F_i$ of different environment as well as RAN characteristics. Using interventions, system designers can understand how different factors within the wireless environment, represented by $X$, influence the QoE experienced by end users, represented by $Y$. Classical AI-native systems cannot achieve this, as they lack an understanding of inherent causal relationships. The best they can do is generate an extensive set of random data distributions to train the AI model, resulting in sample inefficiency. This example illustrates the potential implications of causality in wireless systems. Next, we investigate how we can achieve the goals of explainability, generalizability, and energy efficiency using the fundamental building blocks of causality: causal discovery, causal inference, and causal representation learning (CRL).

Definition 1. Causal discovery provides an algorithmic method to learn the inherent causal structure within observed wireless data, encompassing channel information, network topology, or any multi-modal source data. In contrast to statistical AI models, where the correlation between entities (features) within the data is inferred, a causal model accurately portrays the relationships among them that have an explanation (via directionality). Therefore, causal discovery offers an approach to design explainable AI-native wireless networks.

One popular approach to represent a causal model through a graph is to use the language of structural causal models (SCMs) [6] to describe the underlying mechanisms that drive a specific phenomenon of interest. SCMs, formally defined next, remain the most tractable way to perform causal logic; thus, hereinafter, we give our examples via SCMs.

Definition 2. An SCM is a collection of elements $\langle U, V, F, P(U) \rangle$, where $V$ represents endogenous variables (cause and effect variables), and $U$ represents exogenous variables (random, unknown noise). The structural functions $f_i \in F$ are designed so that $f_i(PA_i, u_i)$ determines $v_i$, where $PA_i \subset V$ and $u_i \subset U$ specify the sets of parents (in the graph) and exogenous inputs, respectively. The exogenous distribution $P(u_i)$ determines the values of $u_i$, and thus the distribution of endogenous variables $V$.

Causal discovery assumes the variables $v_i$ to be random variables connected by a causal graph. However, real-world wireless observations are usually not structured into those variables. Examples include high-dimensional channel information or raw source data such as video, images, or holograms. Herein, it is essential to exploit the concept of CRL.

Definition 3. CRL involves mapping high-dimensional observations $D = [d_1, \cdots, d_D]^T$ into a lower-dimensional representation involving causal variables $v_1, \cdots, v_n$, captured as:

$$v_i = f_i(PA_i, u_i), \quad \forall i. \tag{2}$$

We now turn our attention to leveraging the causal model for making informed decisions and drawing meaningful inferences through the concept of causal inference.

Definition 4. Causal inference seeks to leverage the knowledge of the causal structure obtained from the causal discovery phase and the learned representations from CRL to make predictions and answer questions about the effects of interventions or counterfactuals over variables of interest. The utilization of causal inference thus empowers wireless systems with both inductive and deductive reasoning capabilities.

Causal inference can be effectively implemented using two key concepts:
• do-calculus: The do-operator is a mathematical tool used to represent a physical intervention or manipulation of a variable in a causal model [6]. In the context of a causal model represented as $Z \rightarrow X \rightarrow Y$, applying the do-operator to $X$ involves removing all incoming edges to $X$ (which represent the generation of a variable from its parents) and setting $X$ to a specific value, $x_0$. This can be represented as $do(X = x_0)$. For example, in Fig. 2, using a do-operator, the system designer can analyze the impact of different environments by considering $do(X)$ as implementing different propagation environments, e.g., Rayleigh, Rician, or purely LoS. Alternatively, the do-operator in this scenario can mimic the dynamics of user/scatterer locations as:

$$x_t = f^{(i)}(x_{t-1}) + n_t, \quad i \in \mathcal{I}, \tag{3}$$

where $f^{(i)}(x_{t-1})$ can be a non-linear function and $n_t$ an additive noise component. $\mathcal{I}$ constitutes the set of interventions, encompassing a diverse range of functions that portray different forms of user or scatterer movements. These variations could span scenarios such as high-speed train travel or particle displacements within environments experiencing strong winds. Alternatively, $f^{(i)}$ could refer to random functions that produce novel combinations of scatterers. This simulation of intervention can be performed online, without any human intervention. Interventions allow us to study the causal effects of directly manipulating $X$ on the subsequent variables in the causal model, such as its effect on the outcome variable $Y$, while ignoring the causal relationships from $Z$ to $X$.
• Counterfactuals: Counterfactual problems entail the process of reasoning about the causes behind events, envisioning the outcomes of different actions in hindsight, and determining which actions could have led to the desired results. In the previous example of the channel’s impact on QoS, a potential counterfactual question could be: “In a THz system, would the network’s QoS have been more resilient to blockage if a reconfigurable intelligent surface (RIS) were placed near the UE/BS, to alter the propagation environment?” Analytically, this example counterfactual query can be formulated as optimizing the RIS phase shifters:

$$[\phi^*] = \arg\max_{\phi} \mathbb{E}[Y \mid x_o, y_o], \quad \text{s.t. } \left\| \mathbb{E}[Y \mid x_o, y_o] - Y_g \right\|^2 \le \epsilon, \tag{4}$$

where $x_o$ is the set of observed channel parameters, $y_o$ is the observed QoS (e.g., throughput, delay, or reliability), and $\phi$ is the vector of RIS phase shifts. $Y_g$ is the target quality-of-service (QoS) value and $\epsilon$ is an arbitrarily small quantity. Hence, using counterfactuals, AI models can go beyond just observing the impact of an intervened environment. They can generalize their model behavior to hypothetical scenarios where fundamental changes to the network setup, such as introducing new technologies or altering configurations, could improve performance under various conditions. In traditional wireless problems, the objective is to find the optimal solution for a predefined objective function under specific constraints. In contrast, counterfactuals empower the AI system to adjust these objectives and constraints based on higher-level requirements, offering a more flexible and adaptive approach to problem-solving. Hence, the counterfactual formulation in (4) outperforms conventional optimization approaches in wireless problems.

| Layer (Symbolic) | Typical Activity | Typical Question | Wireless Network Example | AI Model |
|---|---|---|---|---|
| L1: Associational, $p(y \mid x)$ | Seeing (e.g., UL channel measurements, multi-modal sensors for DT) | What is? How would seeing X change my belief in Y? | What does channel quality tell us about QoE at the user? | Supervised and unsupervised learning, variational auto-encoders, recurrent neural networks (RNNs), transformers, large language models. |
| L2: Interventional, $p(y \mid do(x))$ | Doing (e.g., dynamic network slicing decisions, dynamic mobile-edge computing offloading decisions) | What if? What if I do X? | What if I increase the number of antennas, would my BF quality be adjusted for better user QoE? | Causal RL, causal multi-armed bandit. |
| L3: Counterfactual, $p(y_x \mid x', y')$ | Imagining (e.g., altering network infrastructure, transmission schemes for recuperating from malicious firmware attacks that corrupt the AI algorithms) | Why? What if I had acted differently? | Was it the frequency band, rather than the number of antennas, that had a detrimental effect on user QoE? | Theory of mind based reasoning [10], counterfactual explanation policies using RL [11]. |

TABLE I: Different levels of reasoning using causal logic, with wireless examples.

In Table I, we provide several wireless examples that can capture interventional and counterfactual queries. In comparison to traditional deep learning or model-based optimization, a causal approach with counterfactuals and interventions has major advantages. First, during training, causality allows

training the AI model on a diverse set of interventional and counterfactual queries as captured by (3) and (4), in addition to statistical observations. This approach fosters distribution invariance, ensuring robustness of the causal ML algorithm to all interventional and counterfactual distributions. The second advantage involves the practical implications of understanding causal relationships within wireless networks for short-term and long-term planning. Certain network decisions, such as power allocation (for instance, transitioning from equal power allocation to an interference-aware water-filling algorithm), computational resource optimization (assigning downlink RAN computations to either renewable or non-renewable energy nodes for sustainable wireless), or scheduling algorithm changes, can potentially be executed in real-time with minimal impact on QoE performance. This is achievable because causal AI algorithms can compute minimal resource allocation or signal strategy changes by optimizing long-term average rewards (e.g., QoE, given the network’s current state). The resulting causal inference process also incorporates previous experiences and can be formulated as a counterfactual query, as demonstrated in (4). Conversely, long-term planning may involve addressing issues like blocked wireless links to specific UEs, which may require the introduction of RIS on buildings as a potential solution. These longer-term initiatives may necessitate human intervention.

Next, we delve into the limitations and challenges faced by current AI-native wireless networks.

III. CLASSICAL VS. CAUSAL REASONING-BASED AI-NATIVE WIRELESS NETWORKS

This section outlines the prominent hurdles prevalent in contemporary AI-powered wireless systems, and the potential of causality to overcome them.

A. Lack of Explainability and Trustworthiness

1) Why are existing AI models not interpretable?: In model-based Bayesian systems, the mapping from the data X to Y = f(X) is achieved by assuming a specific model, Y = A(θ)X + N, where θ represents the set of estimated parameters. This is a common model for several wireless communication problems including channel estimation, data detection, and resource allocation [8]. In Bayesian systems, explainability is evident since the model provides clear insights into which set of parameters contributes to a particular outcome. For instance, in a single-user downlink scenario, it is apparent from the receiver signal-to-noise ratio (SNR) expression (Y) why maximizing it is achieved through maximum ratio transmission (X). However, when dealing with non-linear underlying signal models that are difficult to accurately capture, deep learning systems often outperform Bayesian model-based systems. Despite their superior performance, deep learning models face key challenges such as sample bias and overfitting. By using black-box models, AI algorithms cannot comprehend the root causes behind deviations from particular performance objectives. As a result, these algorithms cannot discern the minimal configuration changes needed in terms of resource allocation or control signaling approaches to tackle performance issues effectively. This limitation impacts their capacity to autonomously restore the system to the designated target QoE. This demands periodic retraining to sustain uninterrupted connectivity and ensure the expected QoE. This can be exemplified using the THz blockage example discussed in Section II-A.

2) Causal modeling for explainable wireless networks: To address the aforementioned challenge, we offer insights into the interpretability of causal models. For example, as shown in Fig. 3, in dynamic network slicing problems, variable Y plays the role of a QoE metric, which provides insights into user satisfaction and network performance based on resource allocation. Variables c_i represent control parameters like power or resource block allocation, signaling waveforms or computational resources for individual network slices. Each slice may have its own varying bandwidth, latency, and reliability requirements that must be met by the resource management framework. The intermediate variables could encompass network metrics such as SNR, channel quality metrics, or average queue length at the transmitter. Some of these metrics may directly impact the final QoE metric, while others, like variables X_0, may represent miscellaneous control information irrelevant to the QoE. The objective is to compute the set of intervention variables C_s ∈ C (defining all causal variables in the graph) and their corresponding intervention levels c_s to optimize the expected target outcome Y. Unlike traditional ML-driven wireless solutions, leveraging a causal graph empowers the AI algorithm to calculate optimal interventions, such as resource allocation variables, within a broader set C. This set may contain alterations to the generation of network variables compared to just utilizing observational data. Consequently, the outcome will be resource allocation algorithms that maintain invariance to these interventions as long as the causal relationships remain intact. Formally, the goal is to find:

$$[C_s^*, c_s^*] = \arg\min_{C_s \in C,\; c_s \in A_s} \mathbb{E}_{P(Y \mid \mathrm{do}(C_s = c_s))}[Y]. \tag{5}$$

A pivotal question emerges at this juncture: What renders the framework above interpretable? This can be answered using the framework in [11]. Herein, the explanation of any occurrence, referred to as the explanandum Y (formally representing “why did Y occur?”), is accomplished through a sequence of causal relationships operating at various levels, along with the intervened values of the causal variables, called the explanans. The level of detail in the explanation depends on the intended audience, which can vary from an end-user subscribing to the network service to a modem system engineer with in-depth knowledge of BS infrastructure and signal processing. Explanations can then be tailored to either a micro-level, encompassing the entire sequence of detailed relations and processes, or a macro-level, which provides a high-level, natural language description of the process. Moreover, these explanations can be communicated offline or used autonomously online to provide various resource allocation and signaling scheme proposals to meet the expected business goals in intent-based wireless networks. An example of an online approach to leverage these explanations is to implement (causal) RL with human feedback, akin to [5], but with explanations acting as the feedback component.
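As a rough illustration of problem (5), the sketch below enumerates candidate intervention sets and levels on a toy SCM and keeps the pair with the smallest Monte Carlo estimate of E[Y | do(C_s = c_s)]. Everything in it is an assumption made for illustration only: the two controls (`power`, `bandwidth`), the linear mechanisms for `snr` and `queue`, the latency-like cost Y, and the helper names `sample_scm` and `best_intervention`. Exhaustive search is used because the toy problem is tiny; it is not the causal Bayesian optimization procedure of [12].

```python
import itertools
import numpy as np

def sample_scm(n, do=None, seed=0):
    """Draw n samples of Y from a toy SCM; `do` maps a control name to a fixed level."""
    do = do or {}
    rng = np.random.default_rng(seed)
    # Controls: fixed by the intervention if listed in `do`, otherwise generated
    # by their (assumed) observational mechanism.
    power = np.full(n, do["power"]) if "power" in do else rng.uniform(0.0, 1.0, n)
    bandwidth = (np.full(n, do["bandwidth"]) if "bandwidth" in do
                 else rng.uniform(0.0, 1.0, n))
    # Intermediate network metrics (assumed linear mechanisms plus noise).
    snr = 2.0 * power + 0.1 * rng.standard_normal(n)
    queue = 1.0 - 0.8 * bandwidth + 0.1 * rng.standard_normal(n)
    # Target Y: a latency-like cost, so that smaller is better as in (5).
    return queue - 0.5 * snr

def best_intervention(variables, levels, n=2000):
    """Exhaustively search intervention sets C_s and levels c_s for the minimum of E[Y | do]."""
    best = (None, None, np.inf)
    for k in range(1, len(variables) + 1):
        for subset in itertools.combinations(variables, k):
            for values in itertools.product(levels, repeat=k):
                cost = float(sample_scm(n, dict(zip(subset, values))).mean())
                if cost < best[2]:
                    best = (subset, tuple(float(v) for v in values), cost)
    return best

if __name__ == "__main__":
    subset, values, cost = best_intervention(["power", "bandwidth"], [0.0, 0.5, 1.0])
    print(subset, values, round(cost, 3))
```

In this toy model the search selects an intervention on both controls at their highest level, because every mechanism is monotone. With more variables or continuous levels the enumeration grows quickly, which is the regime where a surrogate-based search such as CBO becomes attractive.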
[Fig. 3: Structural causal model representing inherent causal relations among network variables. Labels recovered from the figure: resource allocation, computing resources, signaling waveforms, communication resources, QoE; explanandum, explanans at level 1, explanans at level 2; "explained by different levels of inherent causal relations".]

Problem (5), which involves the intervention set and its values, can be effectively solved using causal Bayesian optimization (CBO) [12]. CBO offers computational feasibility compared to traditional approaches like Markov chain Monte Carlo (MCMC). It operates akin to Bayesian inference but explicitly considers the optimization problem’s causal structure. While this example primarily focuses on the causal relationships between resource allocation variables and QoE, we stress that the CBO framework can be used for explainability in a broad range of wireless scenarios.

B. Inability to Reason and Generalize

Although explainability can be facilitated by the description of causal sequences underlying specific network behaviors, achieving a level of generalizability across multiple data distributions and domains (inductive reasoning) necessitates a distinct examination. We next discuss why traditional AI falls short here, and how causality can be leveraged for generalizability.

1) Limited reasoning capacity of artificial NNs (ANNs): Generalization involves extending knowledge and capabilities across various domains, such as wireless environments (channels), network intents (energy consumption and end-to-end reliability), and control tasks (resource allocation and link adaptation). Current developments in AI-native 5G air interfaces as per 3GPP release 17 [1] primarily involve converting model-based signal processing approaches for beam management, channel estimation, and positioning into individual AI components. Most of these existing ML solutions for wireless networking [2] rely on ANNs, like transformers, convolutional neural networks (CNNs), or RNNs. However, ANNs lack distribution invariance. Hence, an ANN approach consumes significant communication and computational resources due to the training overhead, which would impede real-time immersive experiences, a crucial aspect of 6G.

2) Generalizability via causal reasoning: Causal reasoning can allow overcoming the above challenges of learning in wireless networks. Incorporating reasoning into AI-based wireless systems necessitates a level of generalizability, as defined next: Consider a fixed model class M ∈ \mathcal{M}. Physically, M may represent various wireless propagation environments or wireless control tasks. The model class M encompasses various functions f_i ∈ F (see the SCM definition in Section II) that collectively form a specific observational distribution across various network variables P^M. Let I be a set of interventions on the SCM, which is representative of external manipulations that bring about changes in environment or RAN intent distributions. For the set of model classes \mathcal{M}, we define distribution generalization as follows.

Definition 5. Pair (P^M, \mathcal{M}) is considered to admit distribution generalization to I if, for any given ϵ > 0, there exists a function f_ϵ^* ∈ F such that, for all models M_f ∈ \mathcal{M} where P^{M_f} = P^M, the following condition holds:

$$\sup_{i \in I} \left[ (Y - f_\epsilon^*(X))^2 \right] - \inf_{f_\diamond \in F} \sup_{i \in I} \left[ (Y - f_\diamond(X))^2 \right] \le \epsilon. \tag{6}$$

Distribution generalization in (6) does not require the existence of a minimax solution in F. Rather, one must only find an approximate solution f_ϵ^*(X) that is close to the minimax solution for all interventions on the wireless environment. Apart from distribution generalization, domain generalization can also be achieved through training across various interventional and counterfactual distributions, depending on how domains are defined. In some wireless contexts, domains can be characterized by differing contextual variables (with causal relations invariant), which are treatable as exogenous variables and can be manipulated through interventions. In contrast, the fundamental data domain varies in other cases (medical application vs. live sports event), making it challenging to address solely through interventions or counterfactuals. In contrast to conventional ML approaches that require extensive training data from various random data distributions, causal ML leverages a single causal graph encompassing multiple data distributions generated using intervened SCMs, and accomplishes this efficiently with fewer samples. Hence, this approach helps meet the ultra-low latency, time criticality, and high accuracy synchronization needs of cyber-physical systems in 6G, while fostering seamless interactions between digital and physical realms. The training efficiency of causal ML leads to energy-efficient algorithms, in contrast to other approaches to generalization such as transfer learning, meta-learning, and continual learning, as detailed next.

C. Energy Efficiency

1) Energy inefficient AI-native wireless networks: Traditional AI approaches, as mentioned previously, can create generalizable solutions through extensive training. However, the resultant AI models (foundation models being an example) for wireless networks yield NNs with a large number of parameters. For example, LLMs are being hailed as a game-changer for wireless networks. Particularly, they are being touted as solutions to the development of self-governing networks with intelligent decision-making at the edge [13]. In intent-based networks, LLMs can be utilized as proposal and evaluator agents since they can perform self-criticism and self-reflection over past network actions, learn from mistakes, and better align the actions with the system’s intent. However, despite
their promising potential, LLMs include billions of parameters, which makes them energy-inefficient (during both training and inference) and unsustainable.

2) Causality as a tool for energy efficiency and sustainability: Conventional methods aimed at enhancing energy efficiency, such as NN pruning, can result in a reduction in performance. This is particularly undesirable for wireless networks given their stringent performance requirements highlighted in Section I. Here, a more promising avenue is to leverage the capabilities of causal ML to build lightweight foundation models. Next, we delve into the factors that could foster sustainability in wireless networking through causal ML.

1) Lighter models: Causal models excel at simplifying complex relationships by focusing on essential causal factors. This simplification results in models that require fewer parameters, computations, and memory. By being lighter, these models consume less energy during both training and inference.
2) Reduced training rounds and recurrent retraining: Causal models efficiently capture causal relationships among the wireless data that remain stable across different contexts. This results in fewer training rounds while also avoiding recurrent training. The reduced training significantly decreases the computational burden associated with large-scale models, contributing to overall computational efficiency.
3) Reduced transmission: By communicating just the causal states present in the data, one can reduce the amount of information transmitted, e.g., through causality-based semantic communication (SC) [14], thereby decreasing communication energy consumption.

Next, we turn our attention to practically realizing the above benefits using the distinct stages of causality outlined in Section II.

IV. CAUSAL DISCOVERY AND REPRESENTATION LEARNING FOR NEXT-GENERATION WIRELESS NETWORKS

As explained in Section II, causal discovery involves learning the underlying causal graph representing the relations between different features present in the data (defined as causal variables) that can directly or indirectly affect the behavior of a wireless system. However, it is crucial to identify the relevant causal variables before learning the causal graph. For example, consider the causal graph of Fig. 4. Here, the user’s QoE is causally influenced by multiple factors such as the frequency band, channel, and the capabilities of the UE, including ML algorithms and processors. To isolate and analyze the specific impact of a causal variable, such as the frequency band, it is common to control or fix other causal dependencies (by do-calculus), like the channel environment and UE capabilities, at predetermined or set values. By doing so, one can observe and evaluate the direct causal effect of the frequency band on the user’s QoE, without the confounding influence of other factors. This approach helps in understanding the implications of individual causal variables on the outcome of interest and allows for focused analysis and optimization of specific aspects in the system. Several pertinent problems in this domain warrant attention, and they are discussed next.

[Fig. 4: Simple example of a structural causal model. Node labels recovered from the figure: frequency band; network architecture (ORAN-BS, RISs); channel environment; UE capabilities; user QoE.]

A. Causal Graphical Models (CGMs) for Near-accurate Dynamic Wireless Environment Modeling

We now discuss some open problems in wireless networks that can benefit from causal graphical modeling to achieve generalizable AI-native systems.

1) Ultra-reliable beamforming for THz wireless: In THz wireless systems, ultra-massive multiple-input multiple-output (MIMO) enables LoS beamforming with multi-user interference cancellation under perfect channel state information (CSI). However, dynamic channel conditions due to user mobility or environmental changes require real-time adaptive algorithms. These algorithms fall into two categories [2, Fig. 5]: Bayesian statistics-based methods (e.g., Kalman filter (KF)) and data-intensive ML methods (e.g., MAB or RL). An accurate user mobility model is essential to effectively use model-based algorithms like the KF. Additionally, with an increasing number of antennas, the channel estimation process becomes more intricate, increasing the processing time. Consequently, a temporal gap emerges between the pilot transmission phase, dedicated to channel estimation, and the practical application of the estimated channel for downlink beamforming. During this interim period, the channel’s characteristics may undergo substantial changes, exerting a notable influence on the precision of downlink beamforming. ML models in this context encounter challenges, either from extensive retraining needs or due to longer inference times with larger, data-intensive models. To overcome these challenges, a promising solution is to model the beamforming problem as a function of the causal dynamics (how the underlying CGM evolves over time) of the channel components. This involves modeling the joint distribution of the beamforming vector v_t and the channel h_t, i.e.,

$$p(v_t, h_t \mid z_{t-1}) = \int_{z_t} \underbrace{p(h_t \mid z_t)}_{\text{latent representation}} \; \underbrace{p(z_t \mid z_{t-1})}_{\text{causal dynamics}} \; p(v_t \mid z_t) \, dz_t$$

using variational causal networks [15]. As demonstrated in [15], the proposed causality-aware beamforming solution allows dynamic adaptability and outperforms existing beam-tracking solutions that struggle to model time-varying channels accurately. Moreover, the resulting network spectral efficiency brings the system closer to a perfect CSI-based case. To model the causal dynamics, a promising approach is to use Granger causality [16]:

Definition 6. Non-linear Granger causality: Given N stationary time-series z = [z_1, ⋯, z_N] across time-steps t =
{1, ⋯, T} and a non-linear auto-regressive function g_j, the time evolution of z_j will be:

$$z_j^{t+1} = g_j(z^{\le t}, G) + w_j^{t+1}. \tag{7}$$

In this setup, time-series i Granger causes j if g_j depends on z_i^{≤t}. This dependency is encoded in a causal directed acyclic graph (DAG) G. For a wireless channel, each z_i here may represent the causal variables identified from the uplink channel measurements sufficient to describe the channel.

Granger causality provides two key advantages. First, it allows a generalization of the state space model to non-linear models, as represented by the NN model g_j. Second, training such causal-reasoning-enabled systems requires fewer samples for better generalization compared to conventional deep learning. This answers a key question: can causal ML tools be leveraged for sample-efficient channel estimation in time-varying systems? Sample efficiency can be achieved since enforcing causal constraints on the channel dynamics results in a reduced dimensional state space over which channel estimation happens.

2) Causal discovery for accurate physical twin modeling in digital twins: The existing literature on physical twin modeling [17] predominantly relies on Bayesian inference methods. However, a limitation of this approach is its dependence on the specific data distribution used to construct the DT, which can restrict its applicability to diverse wireless environments. In contrast to this, causal discovery can be employed to model DTs of the entire physical network environment (p(s_{t+1} | s_t, a_t)), including RAN architecture, edge servers, cloud services, and cell geography. The learned models of the environment dynamics are sometimes called “network state” models [18]. Here, vector s_t represents a latent representation generated from the observations received at the DT, which may include data from various sources, e.g., cameras, LiDAR, and satellite images. A network DT located at a cloud would replicate various latent network features (s_t), including signals, coverage, interference, traffic behavior, and user mobility, across different open systems interconnection (OSI) layers. Vector a_t is a versatile representation, encapsulating crucial network actions, including resource allocation decisions or autonomous tasks relevant to the physical system modeled by the DT. However, learning such models purely from observations will be biased due to the presence of confounding variables u_t, which may be unknown to the BS or the cloud hosting the DT.

Contrary to the literature, CGMs offer a better understanding of causal relationships among network variables, enhancing model precision (explainable models, as discussed in Section III-A2) and network traffic prediction. This improves network optimization, resource allocation, and decision-making, boosting wireless network performance and efficiency. Real-time reconstruction and prediction via DT models are crucial for connected intelligence (CI) applications, offering sample efficiency and reduced training efforts for real-time network control and tasks across diverse applications.

3) CGMs for wireless training data generation: A scarcity of training data can adversely impact the performance of AI-native network models that make real-time control decisions. These models must also be resilient against unexpected occurrences like hardware failures, cyber-attacks, or environmental shifts. Existing schemes include using score-based models and generative adversarial networks (GANs) to learn the posterior distribution of either the wireless channels or user locations [19], and using noisy uplink pilot signals as the observations. However, these generative modeling schemes cannot fully exploit the underlying structure present in the data, particularly for high-frequency wireless systems, where the channel data exhibits high levels of structure (see (1)). To overcome these challenges, CGMs can be used to understand the intrinsic causal structure present in wireless channel data. Moreover, utilizing the learned causal graph G makes it easier to create novel combinations of training data by interventions and counterfactuals as discussed in Section II. The resulting generative AI algorithms are called causal generative models. Exploiting training data augmentation through CGM extends beyond this example. The open problems in this context encompass 1) generating diverse network scenarios, including those arising from natural disasters, congested network traffic, software malfunctions, and hardware imperfections, 2) creating data that accurately reflects the complexity of large-scale wireless network deployments, considering interactions among numerous devices and users, particularly when employing DT models, and 3) developing methods for generating a diverse set of tasks suitable for applications of CI over wireless networks.

B. Resilient Wireless Networks

Resilience encapsulates a wireless system’s ability to sustain functionality in the presence of errors, adapt to erroneous influences, and promptly recuperate its normal functionality. Broadly, the comprehensive notion of resilience encompasses various system attributes, namely detection, remediation, and recovery [20]. Only a few studies, e.g., [20] and [21], have attempted to define resilience for wireless networks. These state-of-the-art methods rely on model-based non-convex optimization algorithms to build resilient communication systems. However, model-based algorithms are not effective when the dynamics of the wireless or RAN environment diverge from the assumptions inherent in the model, as explained in Section II. In contrast, we propose a framework founded on causality principles aimed at revolutionizing the design of resilient wireless systems. Consider that, as outlined in Fig. 3, a BS powered by a DT is aware of the causal graph G that represents the intricate causal relations among network variables. Using G, the BS can continuously monitor future QoE values while the wireless network is in a particular state s_t. While monitoring, the BS can detect prospective QoE violations in the near future. This can be formulated as computing the average QoE over a time duration T, given that the BS follows a given resource allocation policy π_t(a_t | s_t):

$$\bar{Y} = \mathbb{E}_{p(Y \mid s_t,\, \mathrm{do}(a_t \to a_{t+T}))}[Y], \quad \text{s.t. } a_t \in A. \tag{8}$$

If Ȳ is below a specific threshold, the network proceeds to the remediation phase, i.e., it solves the following counterfactual
optimization:

$$\arg\max_{\pi(a_t \mid s_t)} \mathbb{E}_{Y \sim D}[Y], \quad \text{s.t. } a_t \in A, \tag{9}$$

where A represents the space of feasible resource allocation variables capturing the system constraints. Here, D = D_o ∪ D_I, where D_o represents the observational dataset corresponding to the state transitions from time t to t + T, given the current policy π_t(a_t | s_t), and D_I represents the interventional dataset. Here, various interventional distributions can be generated using causal generative modeling. Hence, the above detailed causal inference steps help to achieve the resilience goal in next-generation wireless networks.

Thus far, we examined how various signal processing functions (joint computing and control aspects) throughout the OSI layers can benefit from causal discovery. Next, we discuss how causal discovery can contribute to transmission reduction (communication aspect).

C. Causal Discovery and Representation Learning for Semantic Communications

SC systems could significantly enhance network performance across multiple dimensions, including minimizing bits and improving latency, reliability, and resilience. However, existing SC approaches (see references in [14]) often lack a rigorous formulation of data semantics and rely on data-driven AI models for their semantic system design. Here, the question to be addressed is: How can AI extract representations from a multi-modal source datastream and control plane data, enabling contextual communication for minimal data transfer, generalizability across tasks and data, and high fidelity at the receiver? A promising candidate herein is to define the semantics at the transmit side as the inherent causal structure within the data extracted using causal discovery [14], [22]. To learn the causal graph among source data features, one can use techniques like generative flow networks [23]. The causal graph can be learned as the posterior distribution of the entities under different interventions in the task distribution (in short, a multi-modal distribution compared to variational techniques [24]). After learning the causal graph, it is critical to understand how to embed the entities and relations in a multi-dimensional vector space that respects the causal structure and is minimalist. The answer to this lies in the CRL concept discussed next. The resulting semantic representation can be processed and communicated across the wireless channel.

1) CRL for SC systems: SCM facilitates the representation of causal relationships in a disentangled form, expressed as P(s_1, ⋯, s_n) = ∏_i P(s_i | PA(s_i)). We next define the concept of CRL analytically, which describes how to extract the representations from high-dimensional observations.

Definition 7. CRL is the D-dimensional embedding of the states, ϕ_i : s → R^D. ϕ_i is causal-invariant if ∀ s_c, s_e, s_1, s_2 ∈ S, a ∈ A, ϕ_i(s_1) = ϕ_i(s_2) if and only if the cause and effect sets are the same for both s_1 and s_2, while satisfying P(s_1 | s_c, a) = P(s_2 | s_c, a) and P(s_e | s_1, a) = P(s_e | s_2, a).

From Definition 7, we conclude the following: if ϕ_i(s_1) = ϕ_i(s_2), then the semantic information conveyed by s_1 and s_2 is the same. In other words, from an SC perspective, CRL enables a unique representation for states that share comparable cause-and-effect repertoires. This leads to a reduction in the amount of information conveyed compared to systems that do not employ CRL. Next, we briefly discuss how to define the semantic information present in the derived causal representation, a critical metric for formulating the SC problems.

2) Characterizing semantic information using integrated information theory (IIT): In classical information theory, mutual information is symmetric, meaning that, for any two random variables X and Y, I(X; Y) = I(Y; X), and it measures the information conveyed by the association between two random variables. However, in the context of causality-based SC systems, the association between two variables is directed, emphasizing the causal relationships rather than mere information exchange. Hence, classical information theory cannot completely express the semantic information conveyed by a causal system. A promising alternative here is the IIT [18]. In IIT, intrinsic information present in s_t is nonzero only if it has selective causes and selective effects within the system. Inspired by this, integrated information for an SCM is intrinsic information that is generated by the whole mechanism (state + its cause and effect mechanisms) beyond the information generated by its parts. This means that the mechanism is irreducible with respect to information, thus clearly defining what does or does not represent semantics (causal mechanisms with non-zero integrated information).

Building on the aforementioned concepts of semantic representation and semantic information rooted in causality, there are at least two key open problems:
• Formulating robust transmission schemes for multi-user systems, where each user may utilize distinct semantic languages constructed from CRL and exhibit varying QoE requirements.
• Discovering causal relationships among the control information distributed across multiple network layers and designing scheduling algorithms that use the semantic representations obtained from source data and the control information.
These challenges can be effectively addressed through a fusion of causality and hypergame theory [25] or by incorporating theory of mind reasoning [10].

D. Causal Discovery for Integrated Sensing and Communication (ISAC)

The ISAC paradigm synergistically combines sensing and communication functions, potentially sharing common spectral, signaling, and hardware resources. ISAC can have two possible advantages: integration gain, achieved by enhancing resource efficiency, and coordination gain, which involves co-design efforts to balance network performance (user QoE) or attain sensing benefits (higher sensing resolution). Some applications of ISAC include 1) multipath parameter estimation from nonlinear channel models (see Section II) for simultaneous localization and mapping (SLAM) of the environment, 2) integration of sensing (using, e.g., LiDAR,
radar, cameras) with communication to enable vehicular com- In beam tracking, causal generative modeling can be employed
munication for safer and more efficient transportation, and to make precise predictions regarding the future positions of
3) advanced immersive experiences within a metaverse by users or scatterers. Beam tracking can be further improved by
sensing the user’s physical location, movements, and gestures using nonlinear minimum mean squared error (MMSE) meth-
and communicating this information to enhance interactions ods that consider the predictions as prior information. We
with the metaverse environment. ISAC applications require propose to consider nonlinear MMSE because of the nonlinear
real-time monitoring, synchronization, and the processing of nature of the signal models (beamforming vector as a function
large amounts of distributed sensing data across multiple of user and scatterer positions, which are the observations from
devices to leverage the benefits of integration and coordination sensors). The nonlinear MMSE method involves computing
gains. This leads to challenges that existing AI tools cannot posterior distributions of beams by incorporating the causal
handle, as explained next. dynamics described in Section IV-A1.
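To make the nonlinear MMSE idea concrete, the following sketch computes a posterior-mean (MMSE) estimate of a user's beam angle when a predicted angle is used as a Gaussian prior and the measurement depends nonlinearly on the angle. It is a minimal illustration under stated assumptions, not the method of any cited work: the scalar measurement model y = sin(θ) + noise, the Gaussian prior, the grid-based integration, and the function name `nonlinear_mmse_angle` are all hypothetical choices.

```python
import numpy as np

def nonlinear_mmse_angle(y, theta_pred, prior_std, noise_std, grid_size=2001):
    """Posterior-mean (MMSE) estimate of an angle theta in [-pi/2, pi/2].

    Assumed model: prior theta ~ N(theta_pred, prior_std^2), where theta_pred is
    the predicted position, and measurement y = sin(theta) + Gaussian noise.
    """
    theta = np.linspace(-np.pi / 2, np.pi / 2, grid_size)
    # Unnormalized log-posterior = log-prior + log-likelihood on the grid.
    log_prior = -0.5 * ((theta - theta_pred) / prior_std) ** 2
    log_lik = -0.5 * ((y - np.sin(theta)) / noise_std) ** 2
    log_post = log_prior + log_lik
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    # The MMSE estimate is the mean of the (discretized) posterior.
    return float(np.sum(weights * theta))

if __name__ == "__main__":
    true_theta = 0.5
    estimate = nonlinear_mmse_angle(np.sin(true_theta), theta_pred=0.4,
                                    prior_std=0.2, noise_std=0.05)
    print(estimate)
```

With an informative prior the estimate falls between the predicted angle and the angle implied by the measurement; with a very wide prior it is driven almost entirely by the measurement. A grid is practical only for a scalar angle, so higher-dimensional beam states would need sampling or variational approximations.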
1) Why is data-driven AI insufficient for achieving the Efficient communication-aided distributed sensing can be
integration and coordination gains of ISAC? achieved through the utilization of causal reasoning games
[26]. Here, each sensing node can learn the causal graph, rep-
• Dynamic target tracking: Accurately tracking objects in resented as an SCM, that captures the causal relations among
the wireless environment, especially for SLAM, becomes different network decision variables (power, spectrum, beam-
a challenging task due to the mobility of these ob- forming vectors, ISAC waveforms). By employing counter-
jects. Additionally, dynamic beam alignment and tracking factual reasoning, each sensing node can estimate other users’
present further challenges, particularly when dealing with communication strategies and data distributions, Additionally,
high mobility of scatterers or users. Due to the reasons each sensing node can optimize communication and sensing
discussed in Section IV-A1, data-driven AI cannot accu- strategies by leveraging game-theoretic techniques and the
rately capture the time-varying dynamics of the underly- estimated beliefs about other nodes. Hence, causal discovery
ing nonlinear signal models. Consequently, this limitation using SCMs and game-theoretic tools allows sensing nodes
contributes to beam misalignment, thereby jeopardizing to make informed decisions about what to communicate. This
the QoE of the users. dynamic process also enables the optimization of spectrum and
• Resource efficiency in distributed sensing: Leveraging time resources (how to communicate). Consequently, causal
joint transmissions from multiple sensing nodes (multiple discovery plays a pivotal role in improving resource efficiency
viewpoints) can enhance sensing capabilities. Likewise, within distributed sensing scenarios, enabling the fulfillment of
jointly processing communication signals received by the requirements for dynamic adaptability and real-time syn-
distributed receivers can improve sensing performance. chronization essential for next-generation wireless networks.
However, designing joint sensing solutions for distributed Next, we answer the following crucial question: how can
systems is challenging, requiring consideration of fac- we effectively leverage the learned causal graph to make
tors such as synchronization, control overhead, as well informed control decisions and draw meaningful inferences for
as complex design issues related to joint sensing pro- multiple tasks in real-time monitoring of autonomous wireless
cessing functions. Moreover, resource efficiency, through networks? The key to addressing this lies in causal inference.
the integration gain becomes challenging due to the
substantial volume of sensing information that must be V. C AUSAL I NFERENCE FOR N EXT-G ENERATION
communicated, particularly for scenarios like the meta- W IRELESS N ETWORKS
verse. Advanced distributed learning models and tech-
Causal inference facilitates the execution of interventional
niques, including transfer learning, federated learning,
and counterfactual reasoning across various wireless tasks. We
and graph neural networks, could potentially support
identify specific wireless challenges where causal inference
the use of distributed sensing devices and facilitate the
can contribute to achieving future wireless networking goals
efficient integration of information collected from various
such as time criticality, intent management, and dynamic
locations. These models and techniques can also aid
adaptability, as illustrated in Fig. 1.
in compressing distributed sensing data and reducing
coordination overhead. However, the recurrent retraining A. Causal Inference for Wireless Control
required by such data-driven AI models cannot help meet The radio resource management problem is typically non-
the real-time synchronization demands. Furthermore, the convex and becomes computationally complex, particularly
associated AI models lack scalability as the network size for large networks. Moreover, in dynamic wireless networks,
fluctuates dynamically. This is attributed to changes in the model-based signal processing algorithms may be unable to
communication context as the number of sensing nodes maintain consistent performance across various scenarios. In
varies, which can be viewed as shifts in the data domain. this context, RL has been a popular approach for wireless
2) How can causal discovery enhance transmission effi- control problems. However, it is a passive approach that lacks
ciency, and optimize real-time ISAC systems? The challenges reasoning capabilities and relies solely on observational data,
associated with dynamic target tracking and beam alignment typically presented as a set of state-action-reward tuples. We
can be efficiently addressed by applying the Granger causality next describe the key limitations of RL algorithms such as Q-
concept, as explained in Section IV-A1, to construct a nearly learning, SARSA (state-action-reward-state-action), and deep
accurate model of the time-varying target object or UE device. Q-Networks (DQNs).
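
To make the notion of learning from state-action-reward tuples concrete, the following minimal Python sketch runs tabular Q-learning over a fixed log of (state, action, reward, next state) tuples. The two-state traffic model, the reward values, and the hyperparameters `alpha`, `gamma`, and `epochs` are illustrative assumptions and are not taken from this article.

```python
def q_learning_from_tuples(tuples, n_states, n_actions,
                           alpha=0.1, gamma=0.9, epochs=200):
    """Tabular Q-learning that only replays a fixed log of observed tuples."""
    # Q-table initialised to zero: one row per state, one column per action.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next in tuples:
            # Standard temporal-difference target for Q-learning.
            td_target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])
    return q


# Hypothetical log (assumed for illustration only):
#   state 0 = low traffic, state 1 = high traffic
#   action 0 = allocate few PRBs, action 1 = allocate many PRBs
log = [
    (0, 0, 1.0, 0),   # few PRBs under low traffic: efficient
    (0, 1, 0.2, 0),   # many PRBs under low traffic: over-provisioned
    (1, 0, -1.0, 1),  # few PRBs under high traffic: under-provisioned
    (1, 1, 1.0, 0),   # many PRBs under high traffic: demand served
]

q_table = q_learning_from_tuples(log, n_states=2, n_actions=2)

# Greedy policy: pick the action with the largest Q-value in each state.
greedy_policy = [row.index(max(row)) for row in q_table]
```

Because the update only replays logged tuples, the learned table reflects the distribution that produced the log. Nothing in the procedure models what would happen under actions or conditions absent from the data, which is the limitation examined next.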

1) Why is RL using observational data alone insufficient for addressing wireless control problems? Wireless resource allocation problems are often posed as partially-observable Markov decision processes (POMDPs) $M = (S, O, A, p_i, p_o, p_t, r)$ with hidden states $s \in S$, observations $o \in O$, actions $a \in A$, initial state distribution $p_i(s_0)$, state transition distribution $p_t(s_{t+1} \mid s_t, a_t)$, observation distribution $p_o(o_t \mid s_t)$, and reward function $r : O \to \mathbb{R}$. The history of state-action pairs at time $t$ is $h_t = (s_0, a_0, \cdots, s_t)$. Model-based RL relies on the estimation of the POMDP transition model $p_s(s_{t+1} \mid h_t, a_t)$, decomposed into two sub-problems: 1) learning: given a dataset $D$, estimate a transition model $\widehat{q}(s_{t+1} \mid h_t, a_t) \approx p_s(s_{t+1} \mid h_t, a_t)$, and 2) planning: given a history $h_t$ and a transition model $\widehat{q}$, choose an optimal action $a_t$. For a network, states may represent the local monitoring information available to slice $i$ associated with BS $b$. This may include the SNR measure, traffic arrival rate for slice $i$, and the available bandwidth. We can define the actions for slice $i$ as the physical resource block (PRB) allocation size (e.g., bandwidth). To ensure optimal PRB allocation, it is crucial to achieve a balance between meeting transmission latency requirements and fulfilling traffic demands. This involves avoiding both under-provisioning, where insufficient resources are allocated, and over-provisioning, where excessive resources are allocated. By concurrently satisfying transmission latency and traffic requirements, the PRB allocation can effectively utilize resources while maintaining the desired QoE and avoiding unnecessary resource wastage or performance degradation.

Traditional RL methods for addressing the above problem are limited by their reliance on specific observation and state transition distributions, defined as $p_o$ and $p_t$, respectively. Furthermore, deep Q-learning-based NNs lack interpretability, making it challenging to explain the rationale behind a specific action in response to a given state. Thus, in this realm of RL for wireless networks, a fundamental question emerges: can offline data obtained from UL or DL observations be combined with online data from experimentation (via DT model simulations or causal generative models) to enhance the performance of a learning agent? Previous studies [27] addressed this problem using GANs; however, GANs primarily learn the posterior distribution of the data based on the provided dataset and hence are not generalizable. We propose to overcome these major limitations by designing a causal RL framework, which is an instance of causal inference, as discussed next.

2) Causal RL for real-time resource management: RL using causal inference involves two key elements: experimentation and observation. In decision problems like POMDPs, causal queries play a natural role where actions directly correspond to interventions. One such example is determining the causal effect of an action (intervention) on future rewards based on past information about the POMDP process. We can evaluate the causal query as $p_s(s_{t+1} \mid s_{0 \to t}, do(a_{0 \to t}))$, where $do(a_{0 \to t})$ can be generated by different possible distributions of $a_t$. By formulating decision problems as causal queries, we can explore how actions influence outcomes and make informed decisions based on their expected causal effects. Here, a particular set of causal queries may represent a specific task or a wireless environment. To create an NN model that can generalize across various wireless environments or tasks, the accumulated learning data obtained through causal queries can be distilled into an NN using “behavioral cloning” [28]. The learning history, which encompasses the causal state transitions, can be defined for a specific task or wireless environment $n$ as:

$$h_t^{(n)} = (o_0, a_0, r_0, o_1, \cdots, o_T, a_T, r_T, o_{T+1})_n. \qquad (10)$$

Assume that the lifetimes of the agents, called learning histories, are generated by the source RL algorithm for a set of individual tasks or wireless environments $\{M_n\}_{n=1}^{N}$. This process creates a dataset $D = \{(h_t^{(n)}) \sim P_{M_n}\}$. Next, by algorithm distillation (AD) [29], we distill the behavior of the source algorithm into a sequence model that maps long histories to probabilities over actions. AD includes training NN models $P_\theta$ with parameters $\theta$ using a negative log likelihood (NLL) loss. The NN parameters can be obtained by minimizing the following loss function:

$$L(\theta) = -\sum_{n=1}^{N} \sum_{t=1}^{T} \log P_\theta\left(A = a_t^{(n)} \mid h_t^{(n-1)}, o_t^{(n)}\right). \qquad (11)$$

In scenarios involving multiple wireless tasks, this approach guarantees that the learned policy remains task-agnostic. When the source algorithm is trained across various wireless environments and subsequently distilled into an NN, it results in the development of an NN that is invariant to environment changes. In summary, training across a wide range of environments or tasks and then distilling this knowledge into a sequence model leads to the development of causal foundation models. Nevertheless, in hierarchical and distributed wireless networks, due to the intrinsic lack or uncertainty of available information (that may include channel state, computational, and communication resources available to interfering users) at each network node, we need to look beyond causal RL. Herein, a possible solution is the framework of MABs.

3) Causal bandit problem for real-time decision making in wireless networks: MAB represents a category of sequential optimization problems wherein, given a collection of arms (actions), a player selects an arm in each round to obtain a reward. Traditional MAB algorithms cannot satisfy the ultra-synchronization demands of 6G wireless systems because of their inability to maintain distribution invariance and their substantial need for extensive training data. In this context, possessing causal knowledge in the form of an SCM $M = (V, E, F, P(E))$ enables wireless agents to acquire knowledge about the distribution of non-intervened variables, $p(v^n \mid do(\zeta) = I)$, where $\zeta$ represents the action (arm) taken, $I$ captures the intervened values, and $v^n \in V$. One of the variables in the set $V$ represents the target variable $Y$. The objective is to minimize the cumulative regret $R = \sum_{n=1}^{T} \left(Y^n - \max_{\zeta} \mathbb{E}[Y \mid I = \zeta]\right)$. The expectation operator $\mathbb{E}$ is applied to the policy distribution (over arms) and the non-intervened variables, which distinguishes it from traditional MAB approaches where only the pairs $(\zeta, Y)$ are known. Consequently, additional knowledge in the form of an SCM
ensures distribution invariance without requiring additional training data as is the case with conventional MAB.

MABs pose more significant challenges in multi-user scenarios, such as opportunistic spectrum access within a metaverse, where users need to collaborate to assess the key performance indicator (KPI) needs of other users. Another example is the optimization of RIS placement and phase shifts. Herein, it is essential to consider not only the causal relationships within the environment but also the interdependencies among users’ strategies. Likewise, when strategically evaluating the consequences of an action or contemplating the actions of other users in varying circumstances, one must naturally consider both interventions and counterfactual scenarios. A promising approach to solve such multi-user MAB problems is to consider a merger of causality and game theory [26].

B. Causal Inference for Intent-based Wireless Networks

Intent-based wireless networks aim to translate human intentions (called business intent) into network configurations through automation and AI, improving network management, efficiency, and responsiveness. One example of business intent involves optimizing wireless network sustainability while ensuring a consistent QoE for end-users. Despite industry interest [30], the wireless literature lacks a comprehensive articulation of intent-based networks. Here, we explore how causal inference offers a promising approach to building intent-based networks. Autonomous intent management is characterized by the following set of operations that form a closed loop.

• Intent assurance: Once the service KPIs are defined, measurement agents consistently oversee the network variables, while assurance agents assess the network’s compliance with its intended objectives.
• Corrective actions: If the business goals have not been met, the node-level ML agents, known as proposal and policy evaluator agents, are consulted to offer predictions and recommendations. A conventional approach would involve using deep RL agents for both proposal (RL actions) and evaluation (reward computation) tasks. However, as discussed in Section IV, the proposals generated using this approach may lack interpretability, necessitating retraining when the network environment or infrastructure undergoes changes. This poses a challenge to the autonomous intent management functionality.

These closed-loop operations pave the way for constructing a fully autonomous AI-native wireless network, provided that the underlying AI agents do not require recurrent training and the network decisions are interpretable, a requirement that classical AI solutions often cannot fulfill. In contrast, performing causal inference using a causal graph that describes the relations among network decision variables, which may span various OSI layers, offers the following benefits:

• Performing self-reflection or self-critique: As discussed in [31], this approach is only feasible if the ML model is interpretable and explainable, which causality provides.
• Decomposing intents into subgoals and performing deductive reasoning: This can be accomplished by breaking down a larger SCM into smaller SCMs. Splitting SCMs is beneficial because a decision made at a higher network layer may have ramifications across lower layers. This also helps perform interventions or counterfactuals in a distributed manner. The causal effects of interventions or counterfactuals performed in individual SCMs can be combined to achieve deductive reasoning, which provides novel insights about network decisions made at different OSI layers.

One promising approach to realize the above benefits is to compute parallel policies for each subgoal using either multi-objective RL (MO-RL) [32] or causal reasoning games [26]. After obtaining subgoal policies, the evaluator agent can use supervised learning to distill these distributions into a new parameterized policy. These recommendations are then used to execute autonomous actions through a closed-loop system that incorporates advanced analytics, automation, and AI. This autonomous execution ensures that the network operates efficiently, adapts to changing conditions, and achieves the desired service KPIs without the need for manual intervention.

C. Causal Inference for Integrated Sensing and Communication (ISAC)

In addition to the challenges related to resource efficiency and dynamic target tracking, as discussed in Section IV-D, ISAC applications also require effective management of nonlinear signal models and joint waveform design for sensing and communication functionalities. Furthermore, they must enable dynamic resource sharing between sensing and communication functions to leverage the benefits of integration and coordination gains, as elaborated next.

1) Why is data-driven AI inadequate for achieving the integration and coordination gains of ISAC?

• Nonlinear signal processing: ISAC functions may require full-duplex transmitters and receivers to operate. For instance, the transmitter might be actively transmitting the ISAC waveform while, concurrently, the receiver captures the back-scattered or reflected signals for subsequent radar processing. Efficiently canceling self-interference (SI) in the antenna, analog, or digital domains requires a substantial understanding of circuit non-linear models, antenna coupling, and time-varying channel characteristics. However, approximating such time-varying SI using data-driven AI may not yield models that remain accurate over time, or it could require recurrent retraining, which introduces significant overhead.
• ISAC waveform design: One key objective of ISAC systems is to use a single waveform for both radar and communication purposes. In such cases, the joint waveform must be rigorously designed to fulfill communication goals like data rates, latency, and QoE, all while preserving the desired sensing capabilities. This becomes particularly challenging when: 1) dealing with different requirements for communication and sensing systems, like peak-to-average power ratio (PAPR) constraints for orthogonal frequency-division multiplexing (OFDM) systems and signal quality for sensing, 2) accounting for
hardware imperfections and non-linearities, and 3) managing highly mobile scenarios where radar and communication channels change rapidly. Classical AI struggles to adapt to dynamic (non-stationary) channels and nonlinear signal models that are specific to different devices or domains. Furthermore, addressing multi-objective optimizations presents substantial challenges in attaining Pareto-optimal solutions [32]. Pareto optimality, in the context of multi-objective optimization, refers to a state where no further improvement can be made in one objective without worsening at least one other objective. Because of the conflicting objectives and the large number of optimization variables related to communication and sensing, classical AI solutions may not achieve near-accurate Pareto-optimal solutions without extensive training.
• Resource sharing between sensing and communication in dynamic environments: Allocating wireless resources like time, frequency, and power efficiently between sensing and communication is challenging due to tradeoffs between sensing quality and communication goals. ML can enhance resource allocation by using prior observations and context to optimize decisions. For instance, RNNs can proactively predict user requests and RL can be used to explore resource optimization options. Nevertheless, RNNs cannot handle large input sequences, such as extended user request histories or contexts. Additionally, RL is often constrained by the dataset it was trained on, limiting its adaptability to evolving, non-stationary conditions or novel scenarios.

2) How can causal inference address nonlinearity and joint waveform design, and optimize real-time resource sharing in ISAC systems? To address the inherent nonlinearity of SI signals, a promising strategy involves leveraging causal discovery to gain insights into the underlying “physics” governing the data. The generation of these signals, influenced by hardware imperfections and the SI channel, is not random; it is guided by specific physical principles (electromagnetic as well as circuit theory-based). By constructing a causal graph that captures relationships among channel parameters, electronics hardware characteristics, and antenna coupling coefficients, the causal model can be extended to accommodate various distributions of channels, frequency bands, or antenna characteristics by applying interventional and counterfactual causal inference techniques mentioned in this section.

Causal MO-RL can efficiently handle multi-objective optimization in ISAC waveform design. For example, consider that sensing and communication signals occupy different subcarriers within an OFDM symbol under a multi-antenna transmitter. In this scenario, the optimization variables could include the number of subcarriers, the complex coefficients for the sensing subcarriers, and the beampatterns for radar and communication signals. Radar signals may have a wider beamwidth requirement (since scatterers may be omnipresent) in contrast to communication users’ narrower beam requirements. These variables are computed with the dual objective of minimizing the target detection error (e.g., delay and Doppler mean-squared error) at the sensing receiver and maximizing the user’s QoE for the communication receiver. The multi-objective optimization has to be performed under several constraints that may include transmit power and PAPR. By integrating causal inference into MO-RL problems, it becomes possible to identify the minimal interventions needed to achieve specific objectives like maximizing QoE or sensing quality. The interventions here involve the number of subcarriers or the sensing signal characteristics, which depend on exogenous variables such as environment characteristics (e.g., scatterer parameters as discussed in Section II) that are time-varying and the allocated spatial or spectrum resources. Causal inference simplifies the solution space by estimating the minimal required interventions to be performed. Hence, it allows computation of near Pareto-optimal solutions, surpassing the capabilities of data-driven MO-RL that may be stuck in a local optimum. Here, causal inference can significantly improve the overall performance of ISAC systems compared to generic waveforms that do not consider the causal dependency of user QoE or sensing quality on factors like environment geometry, user location, mobility patterns, and hardware imperfections. To create resource-sharing algorithms that are both generalizable and explainable, one can explore causal inference techniques such as causal RL and causal MAB, as discussed earlier.

VI. RECOMMENDATIONS

This article laid out a bold new vision for designing AI-native wireless systems (6G and beyond) founded on the principles of causal reasoning. We conclude with three key recommendations:

• Causal Reasoning as a Bedrock for AI-Native Wireless Networks: Designing AI-native wireless networks is not going to be successful if it relies simply on reusing classical AI tools like autoencoders and ANNs to mimic existing wireless functions. Instead, there is a need to develop novel AI frameworks that incorporate the desirable properties of explainability, reasoning, generalizability, and sustainability. In this regard, we recommend embracing causal reasoning as the cornerstone of AI-native wireless systems as it enables several of these key characteristics. Causal reasoning can be integrated across the networking stack, including 1) source data sampling (selecting causal variables for extraction), 2) data encoding (using causal representations), 3) scheduling (incorporating causality awareness into user data and KPIs for resource management), and 4) communication (users seeking relevant interventions or counterfactuals to gain insights about network attributes).
• Refined Models with Causality to Mitigate Bias: We advocate transitioning away from excessive reliance on LLMs and extensive data, characteristic of data-driven networking with foundation models in general. Instead, we emphasize the importance of refining models and features using causality to mitigate potential biases within AI frameworks. In other words, we recommend designing causality-based foundation models that can lead to interpretable and lighter ML models, which are crucial
for achieving next-generation wireless network goals as shown in Fig. 1.
• Scalable AI-Native Wireless Networks using Layer-wise SCMs: Causal ML comes with its own set of challenges. The profusion of network variables spanning multiple wireless networking layers can result in intricate and expanding causal relationships, particularly as the network grows in scale. In this context, we recommend devising a path to scale causality-aware AI-native wireless networks without disrupting existing 3GPP architectures. A practical strategy to address this scalability challenge involves constructing layer-specific SCMs and devising causality-aware algorithms tailored to specific OSI layers. This distributed architecture can also align with the open RAN architecture of 5G and beyond, enabling causal reasoning across individual layers distributed across different network components or implemented as virtualized network functions.

REFERENCES

[1] X. Lin, “Artificial Intelligence in 3GPP 5G-Advanced: A Survey,” arXiv preprint arXiv:2305.05092, 2023.
[2] C. Zhang, P. Patras, and H. Haddadi, “Deep Learning in Mobile and Wireless Networking: A Survey,” IEEE Communications Surveys and Tutorials, vol. 21, no. 3, pp. 2224–2287, 2019.
[3] Z. Chen and B. Liu, “Lifelong Machine Learning,” in San Rafael: Morgan and Claypool Publishers, 2018, vol. 1.
[4] F. Le, M. Srivatsa, R. Ganti, and V. Sekar, “Rethinking Data-driven Networking with Foundation Models: Challenges and Opportunities,” Proceedings of the 21st ACM Workshop on Hot Topics in Networks, pp. 188–197, 2022.
[5] R. Aiyappa, J. An, H. Kwak, and Y. Y. Ahn, “Can We Trust the Evaluation on ChatGPT?,” arXiv preprint arXiv:2303.12767, 2023.
[6] J. Pearl and D. Mackenzie, “The Book of Why,” in Basic Books, New York, 2018.
[7] B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Bengio, “Toward Causal Representation Learning,” Proceedings of the IEEE, vol. 109, no. 5, Feb. 2021.
[8] D. Tse and P. Viswanath, “Fundamentals of Wireless Communication,” in Cambridge University Press, 2005.
[9] C. Chaccour, M. N. Soorki, W. Saad, M. Bennis, and P. Popovski, “Can Terahertz Provide High-Rate Reliable Low-Latency Communications for Wireless VR,” IEEE Internet of Things Journal, vol. 9, no. 12, pp. 9712–9729, Jun. 2022.
[10] L. Yuan, Z. Fu, L. Zhou, K. Yang, and S-C Zhu, “Emergence of Theory of Mind Collaboration in Multiagent Systems,” arXiv preprint arXiv:2110.00121, pp. 424–438, 2021.
[11] P. Madumal, T. Miller, L. Sonenberg, and F. Vetere, “Explainable Reinforcement Learning Through a Causal Lens,” in Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 2020, vol. 34, pp. 2493–2500.
[12] V. Aglietti, X. Lu, A. Paleyes, and J. González, “Causal Bayesian Optimization,” in Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Jun. 2020, pp. 3155–3164.
[13] L. Bariah, H. Zou, Q. Zhao, B. Mouhouche, F. Bader, and M. Debbah, “Understanding Telecom Language Through Large Language Models,” arXiv preprint arXiv:2306.07933, 2023.
[14] C. Chaccour, W. Saad, M. Debbah, Z. Han, and H. V. Poor, “Less Data, More Knowledge: Building Next Generation Semantic Communication Networks,” arXiv preprint arXiv:2211.14343, Nov. 2022.
[15] C. K. Thomas and W. Saad, “Reliable Beamforming at Terahertz Bands: Are Causal Representations the Way Forward,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023.
[16] S. Löwe, D. Madras, R. Zemel, and M. Welling, “Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data,” Proceedings of Machine Learning Research, vol. 140, pp. 509–525, Jun. 2022.
[17] C. Ruah, O. Simeone, and B. Al-Hashimi, “A Bayesian Framework for Digital Twin-Based Control, Monitoring, and Data Collection in Wireless Systems,” arXiv preprint arXiv:2212.01351, 2023.
[18] C. K. Thomas, W. Saad, and Y. Xiao, “Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach,” arXiv preprint arXiv:2304.12502, 2023.
[19] C. Studer, S. Medjkouh, E. Gonultaş, T. Goldstein, and O. Tirkkonen, “Channel charting: Locating users within the radio environment using channel state information,” IEEE Access, vol. 6, pp. 47682–47698, 2018.
[20] R-J. Reifert, S. Roth, A. A. Ahmad, and A. Sezgin, “Comeback Kid: Resilience for Mixed-Critical Wireless Network Resource Management,” IEEE Transactions on Vehicular Technology, Jul. 2023.
[21] N. NaderiAlizadeh, M. Eisen, and A. Ribeiro, “Learning Resilient Radio Resource Management Policies with Graph Neural Networks,” IEEE Transactions on Signal Processing, vol. 71, pp. 995–1009, 2023.
[22] C. K. Thomas and W. Saad, “Neuro-Symbolic Causal Reasoning Meets Signaling Game for Emergent Semantic Communications,” arXiv preprint arXiv:2210.12040, 2022.
[23] T. Deleu, A. Göis, C. Emezue, M. Rankawat, S. Lacoste-Julien, S. Bauer, and Y. Bengio, “Bayesian Structure Learning with Generative Flow Networks,” arXiv preprint arXiv:2202.13903, Feb. 2022.
[24] M. D. Hoffman, D. M. Blei, C. Wong, and J. Paisley, “Stochastic Variational Inference,” Journal of Machine Learning Research, vol. 14, pp. 1303–1347, May 2013.
[25] N. S. Kovach, A. S. Gibson, and G. B. Lamont, “Hypergame Theory: a Model for Conflict, Misperception, and Deception,” Game Theory, vol. 2015, 2015.
[26] L. Hammond, J. Fox, T. Everitt, R. Carey, A. Abate, and M. Wooldridge, “Reasoning about Causality in Games,” Artificial Intelligence, vol. 320, July 2023.
[27] A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, “Experienced Deep Reinforcement Learning with Generative Adversarial Networks (GANs) for Model-free Ultra Reliable Low Latency Communication,” IEEE Transactions on Communications, vol. 69, pp. 884–899, 2020.
[28] M. Bain and C. Sammut, “A Framework for Behavioural Cloning,” Machine Intelligence, pp. 103–129, July 1995.
[29] M. Laskin, L. Wang, J. Oh, E. Parisotto, S. Spencer, R. Steigerwald, D. J. Strouse, S. Hansen, A. Filos, E. Brooks, and M. Gazeau, “In-context Reinforcement Learning with Algorithm Distillation,” arXiv preprint arXiv:2210.14215, 2022.
[30] P. H. Gomes, M. Buhrgard, J. Harmatos, S. K. Mohalik, D. Roeland, and J. Niemöller, “Intent-driven Closed Loops for Autonomous Networks,” Journal of ICT Standardization, vol. 9, pp. 257–290, 2021.
[31] J. S. Park, J. C. O’Brien, J. C. Carrie, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative Agents: Interactive Simulacra of Human Behavior,” arXiv preprint arXiv:2304.03442, 2023.
[32] A. Abdolmaleki, S. Huang, G. Vezzani, B. Shahriari, J. T. Springenberg, S. Mishra, and D. Tirumala, “On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning,” arXiv preprint arXiv:2106.08199, 2021.
