AI Trends of May 2023 You Need To Know by Gonzalo Recio Medium

AI Trends of May 2023 You Need to Know

Summary of latest AI trends for May 2023
Gonzalo Recio

8 min read · 4 days ago
Photo by Maxim Berg on Unsplash
As we enter May 2023, AI continues to evolve at a breakneck pace and to impact our lives in new ways. In this article, we’ll provide a summarized overview of the latest AI trends and breakthroughs in natural language processing, computer vision, and machine learning.
Whether you’re an AI enthusiast, a researcher, or just curious about technology’s future, read on for insights into the latest developments. Let’s go!
NLP & Large Language Models

AutoGPT, AgentGPT: LLMs as autonomous agents!
A new paradigm of using LLMs for goal-oriented planning. The idea is to make LLMs reach goals autonomously by having them decide which tasks they need to do, learn from their intermediate results, and perform self-improvement iterations. This is still at the research stage, but it is a very interesting line to explore.
In the following video, you can get an idea of the capabilities of this technology:
Two Minute Papers, “AutoGPT: This Is ChatGPT Supercharged!”
Sources: Auto-GPT, AgentGPT, Reflexion: an autonomous agent with dynamic memory and self-reflection
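The loop described above (plan a task, execute it, reflect on the result, then retry or move on) can be sketched in a few lines. This is a toy illustration, not the Auto-GPT, AgentGPT, or Reflexion code. `ToyAgent`, `plan`, `act`, and `reflect` are hypothetical names, and deterministic stubs stand in for the LLM calls and tools a real agent would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    task: str
    result: str
    ok: bool  # verdict of the self-reflection step


@dataclass
class ToyAgent:
    goal: str
    plan: Callable[[str, List[Step]], Optional[str]]  # next task, or None when done
    act: Callable[[str], str]                         # carries out one task
    reflect: Callable[[str, str], bool]               # judges an intermediate result
    memory: List[Step] = field(default_factory=list)

    def run(self, max_steps: int = 10) -> List[Step]:
        for _ in range(max_steps):  # hard cap so the agent cannot loop forever
            task = self.plan(self.goal, self.memory)
            if task is None:  # the planner considers the goal reached
                break
            result = self.act(task)
            self.memory.append(Step(task, result, self.reflect(task, result)))
        return self.memory


# Hypothetical deterministic stand-ins for the LLM calls and tools.
SUBTASKS = ["search sources", "summarize findings", "write report"]
attempts: Dict[str, int] = {}


def plan(goal: str, memory: List[Step]) -> Optional[str]:
    # Pick the first sub-task that has not succeeded yet.
    done = {step.task for step in memory if step.ok}
    return next((t for t in SUBTASKS if t not in done), None)


def act(task: str) -> str:
    attempts[task] = attempts.get(task, 0) + 1
    # Simulate a failed first attempt that the agent has to notice and retry.
    if task == "summarize findings" and attempts[task] == 1:
        return "error: summary too long"
    return f"done: {task}"


def reflect(task: str, result: str) -> bool:
    return not result.startswith("error")


agent = ToyAgent("Write a short report on AI agents", plan, act, reflect)
for step in agent.run():
    print(f"{step.task:<20} {step.result:<26} {'ok' if step.ok else 'retry'}")
```

The reflection step marks the first summary attempt as failed, so the planner issues the same task again. In a real agent, that feedback would be written into the next prompt rather than checked with a string prefix.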
Google I/O Keynote: More models!

While Geoffrey Hinton left Google warning about the risks of AI, the Google I/O keynote took place on 10 May, presenting many LLM updates and new Generative AI features. The most remarkable announcements were:
Announcement of PaLM 2, the new foundation model behind many AI tools, which will be available via API. Google presented 4 model sizes (the smallest fits on mobile devices).
Google’s Bard is now available in some countries and has advanced features such as math, reasoning, and coding; it will also support image prompts. It will soon integrate Adobe Firefly for generating high-quality images. For now, it is only available in English, Japanese, and Korean.
Release of 3 new models in Vertex AI: Codey (a new code completion model, the “GitHub Copilot competitor”), Imagen (text-to-image), and Chirp (speech-to-text).
Gemini: a multimodal foundation model with memory and planning. Still in the training phase.
Duet AI: Google’s new AI tools for Docs, Sheets, Slides, Meet, and Gmail, analogous to Microsoft 365 Copilot. Currently available via a waitlist.
Google I/O Keynote 2023
StarCoder: A new State-of-the-Art LLM for Code
A new 15.5B-parameter foundation model for code by BigCode, trained on permissively licensed code. The paper’s authors claim that StarCoder and StarCoderBase outperform the largest models, including PaLM, LaMDA, and LLaMA, despite being significantly smaller. They also outperform CodeGen-16B-Mono and OpenAI’s code-cushman-001 (12B) model. BigCode has also released a playground and a Visual Studio Code plugin for using StarCoder as a programming copilot.
Sources: Blog, Paper
Comparison with other models (https://arxiv.org/pdf/2305.06161.pdf)
HuggingFace: Open-source strikes back

HuggingFace (the democratizers of Transformer models) presented a couple of interesting new features: HuggingChat and Transformers Agents.
HuggingChat: an open-source alternative to ChatGPT. It is still in early development, with only one LLM option available, but it could become a great alternative in the future:
HuggingChat: The first open source alternative to ChatGPT (huggingface.co)
HuggingChat screenshot.
HuggingFace Transformers Agents: we are getting closer to having something like Iron Man’s JARVIS AI assistant.
HuggingFace presented an experimental API for building multimodal agents based on an LLM. These agents can orchestrate different Generative AI models, so they are capable of generating and interacting with images, audio, and text in a multimodal way. Check it out here; it’s worth a look:
Transformers Agent (huggingface.co): Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents can…
[Figure: an Instruction (“Read out loud the content of the image”) and a Prompt listing the Tools (image_generator, image_captioner, text_to_speech) go to the Agent, whose Generation (caption = image_captioner(image), then audio = text_to_speech(caption)) runs in a restricted Python interpreter]
HuggingFace Transformers Agents execution flow example.
Transformers Agents work by deciding which tool to use depending on the user prompt. Here is an updated list of the tools integrated into the Transformers Agents framework:
Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)
Text question answering: given a long text and a question, answer the question in the text (Flan-T5)
Unconditional image captioning: caption the image! (BLIP)
Image question answering: given an image, answer a question on this image (ViLT)
Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
Text to speech: convert text to speech (SpeechT5)
Zero-shot text classification: given a text and a list of labels, identify which label the text corresponds to most (BART)
Text summarization: summarize a long text in one or a few sentences (BART)
Translation: translate the text into a given language (NLLB)
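The execution flow above can be made concrete with a minimal sketch of the pattern. Tool descriptions go into the prompt, the LLM answers with Python code that calls those tools, and the code runs in a namespace that exposes only the tools and the user's inputs. The two tools and `fake_llm` are hypothetical stubs (a real agent would call the actual models and an LLM). Running `exec` with builtins removed is only a rough stand-in for HuggingFace's restricted interpreter and is not a secure sandbox.

```python
from typing import Any, Callable, Dict


# Hypothetical toy tools standing in for the real captioning and TTS models.
def image_captioner(image: str) -> str:
    return f"a caption for {image}"


def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder for synthesized audio


TOOLS: Dict[str, Callable[..., Any]] = {
    "image_captioner": image_captioner,
    "text_to_speech": text_to_speech,
}
DESCRIPTIONS = {
    "image_captioner": "This is a tool that captions an image.",
    "text_to_speech": "Converts the text to audio.",
}


def build_prompt(task: str) -> str:
    # The prompt lists every available tool so the LLM can choose among them.
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in DESCRIPTIONS.items())
    return (
        "I will ask you to perform a task, your job is to come up with a series "
        "of simple commands in Python that will perform the task.\n"
        f"Tools:\n{tool_list}\nTask: {task}\n"
    )


def fake_llm(prompt: str) -> str:
    # Hypothetical stub: returns the code shown in the execution-flow example.
    return "caption = image_captioner(image)\naudio = text_to_speech(caption)"


def run_agent(task: str, **inputs: Any) -> Dict[str, Any]:
    code = fake_llm(build_prompt(task))
    # Only the tools and the user's inputs are visible to the generated code.
    namespace: Dict[str, Any] = {"__builtins__": {}, **TOOLS, **inputs}
    exec(code, namespace)
    hidden = set(TOOLS) | set(inputs) | {"__builtins__"}
    return {name: value for name, value in namespace.items() if name not in hidden}


outputs = run_agent("Read out loud the content of the image", image="photo.png")
print(outputs["caption"])
```

Tool choice happens entirely inside the generated code. Adding a capability means adding an entry to the tool registry and its description, not changing the agent loop.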
Claude LLMs released

Claude is a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. Here is a comparison between Claude, GPT-4, GPT-3.5, and PaLM 2 (chat-bison): impressive inference speed and accuracy!
Source: https://github.com/kagisearch/pyllms
Prompt Engineering free course by DeepLearning.AI & OpenAI
A free ~2-hour course with interactive Python notebooks on how to write good prompts, taught by Andrew Ng and Isa Fulford (OpenAI engineer). Highly recommended! Course link below:
ChatGPT Prompt Engineering for Developers (www.deeplearning.ai): In ChatGPT Prompt Engineering for Developers, you will learn how to use a large language model (LLM) to quickly build…
Bing Chat Update: Images, videos & plugins!
Bing Chat now includes images and videos in its replies, as well as chat history. In the near future, it will add multimodal support and plugins, among many other things. Take a look at the following link to see what is coming:
Announcing the next wave of AI innovation with Microsoft Bing and Edge (blogs.microsoft.com): Just three months ago, we unveiled new AI-powered Microsoft Bing and Edge to reinvent the future of search with…
Clone voices with the new ElevenLabs models
ElevenLabs has released a text-to-speech model (which converts text into synthesized speech) capable of generating professional-quality audio. It even allows you to clone your own voice from a short recording of you speaking and use it in different languages.
https://beta.elevenlabs.io/
Computer Vision
Midjourney 5.1
A new version of the model has been released, with significant improvements in prompt understanding and image sharpness, as well as fewer borders and unwanted text artifacts.
Midjourney (@midjourney), May 3, 2023:
“V5.1 is now available! Images are more coherent, sharp, and beautiful. It's easier to use and should respond more precisely to instructions. We've also added a "RAW" mode for expert users to reduce the 'opinionatedness' of our model and give you more creative control.”
Some examples of the results obtained with versions 5 and 5.1:
“close up of a child wearing swimming goggles”
Midjourney 5 (left) vs 5.1 (right). Credit to Barry Collins: https://www.forbes.com/sites/barrycollins/2023/05/03/midjourney-51-arrivesand-its-another-leap-forward-for-ai-art/
DeepFloyd IF
Stability AI releases DeepFloyd IF, a
powerful text-to-image model that can
smartly integrate text into images.
Incorporating the large language model
T5-XXL-1.1 as a text encoder, DeepFloyd IF
generates coherent and clear text
alongside objects of different properties.
DeepFloyd IF image generated samples with text.
ImageBind by Meta: One embedding space to bind them all
Another open-source model from Meta AI! This time, it is a multimodal foundation model capable of processing multimodal data (text, audio, images, videos, depth maps, thermal images, etc.) and generating embeddings for it. So what is it capable of, and what can it be used for? It can generate images from audio, search for audio from text, search for audio from images, and more, all with the same multimodal model. The demo they have prepared is the best way to understand it:
https://imagebind.metademolab.com/demo
Yann LeCun (@ylecun), May 10, 2023:
“IMAGEBIND: One Embedding Space To Bind Them All.
Learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data.
An open source project by Meta-FAIR.
Paper: dl.fbaipublicfiles.com/imagebind/imag…
Demo: imagebind.metademolab.com
Code:…”
Here’s an overview of the different data sources that ImageBind can process:
ImageBind: https://arxiv.org/abs/2305.05665
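Every modality lands in the same embedding space, so cross-modal search (text to audio, image to audio) reduces to a nearest-neighbour lookup by cosine similarity. The sketch below assumes the embeddings were already produced by the modality encoders. The three-dimensional vectors and file names are made-up placeholders, not ImageBind outputs, and `retrieve` is a hypothetical helper rather than part of the ImageBind code.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query: Vector, candidates: Dict[str, Vector], k: int = 1) -> List[Tuple[str, float]]:
    # Rank candidates of any modality by cosine similarity to the query.
    scored = [(name, cosine(query, emb)) for name, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-d toy vectors; real ImageBind embeddings come from its encoders.
audio_clips = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "engine.wav": [0.1, 0.9, 0.2],
    "rain.wav": [0.0, 0.2, 0.95],
}
text_query = [0.8, 0.2, 0.1]    # pretend embedding of the text "a dog barking"
image_query = [0.05, 0.3, 0.9]  # pretend embedding of a photo of a rainstorm

print(retrieve(text_query, audio_clips))   # text -> audio search
print(retrieve(image_query, audio_clips))  # image -> audio search
```

The same `retrieve` call serves both queries. Only the encoder that produced the query vector differs, which is the point of a single joint embedding space.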
YOLO-NAS
After the recent release of YOLOv8 by Ultralytics, Deci.ai presents YOLO-NAS, a new foundation object detection model designed for production-ready performance. It significantly reduces inference latency while preserving detection accuracy.
Comparison of different YOLO models’ accuracy and inference latency.
Trending tools and repositories

Pandas AI: Making dataframes conversational
Pandas AI is a Python library that adds generative AI and LLM capabilities to Pandas, enabling conversational data analysis and manipulation of dataframes. For example, you can ask PandasAI to perform more complex queries or even to draw plots:
pandas_ai.run(
    df,
    "Plot the histogram of countries showi…"
)
[Figure: bar chart “GDP by Country” (y-axis scale ×1e13; x-axis: Country, with United States, United Kingdom, Japan, Australia, Spain, Canada, Italy, Germany, China, France)]
https://github.com/gventuri/pandas-ai
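This section does not describe PandasAI’s internals. The following is a minimal sketch of the general pattern behind conversational dataframes: the LLM sees the dataframe’s schema and a few sample rows, writes pandas code, and that code is executed. `build_prompt`, `fake_llm`, and `ask` are hypothetical names, `fake_llm` is a stub returning fixed code, and the sales table is made-up toy data.

```python
import pandas as pd


def build_prompt(df: pd.DataFrame, question: str) -> str:
    # Send the schema and a few sample rows rather than the whole dataset.
    return (
        f"You have a pandas DataFrame `df` with columns {list(df.columns)}.\n"
        f"First rows:\n{df.head(3).to_string(index=False)}\n"
        f"Write Python code that stores the answer in `result`. Question: {question}"
    )


def fake_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a real LLM completion.
    return "result = df.nlargest(2, 'units')['product'].tolist()"


def ask(df: pd.DataFrame, question: str):
    code = fake_llm(build_prompt(df, question))
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)  # running model-written code is unsafe outside a sandbox
    return namespace["result"]


# Toy data, made up for illustration.
sales = pd.DataFrame({"product": ["A", "B", "C", "D"], "units": [120, 340, 90, 210]})
print(ask(sales, "Which two products sold the most units?"))
```

Sending only the head of the dataframe keeps prompts small and avoids shipping the full dataset to the model. The trade-off is that the model must infer the rest of the data from the column names and sample rows.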
Shap-E: Generating 3D objects conditioned on text or images
Official code repository and model release for Shap-E: Generating Conditional 3D Implicit Functions, by OpenAI: a generative model for 3D assets conditioned on text or images.
[Figure: 3D samples for the prompts “A chair that looks like an avocado”, “An airplane that looks like a banana”, “A spaceship”, “A chair that looks like a tree”, “A birthday cupcake”, “A green boot”, “A penguin”, “Ube ice cream cone”, “A bowl of vegetables”]
3D examples generated by the text-conditional model.
Mojo
Mojo is a new programming language for AI developers that combines the usability of Python with the performance of C, promising unparalleled programmability of AI hardware and extensibility of AI models.
Performance benchmark on Mandelbrot algorithm.
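Mandelbrot benchmarks time a tight numeric loop. For reference, here is a minimal pure-Python escape-time kernel of the kind such comparisons use as a slow baseline. It is our own illustrative sketch, not Modular’s benchmark code, and the grid size, plane bounds, and `max_iter` are arbitrary assumptions.

```python
import time
from typing import List


def escape_time(c: complex, max_iter: int = 100) -> int:
    # Iterate z -> z^2 + c; return the step at which |z| exceeds 2, or max_iter.
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def mandelbrot_grid(width: int, height: int, max_iter: int = 100,
                    x_min: float = -2.0, x_max: float = 1.0,
                    y_min: float = -1.5, y_max: float = 1.5) -> List[List[int]]:
    # Sample the complex plane on a regular grid, one escape time per pixel.
    grid = []
    for row in range(height):
        y = y_min + (y_max - y_min) * row / max(height - 1, 1)
        grid.append([
            escape_time(complex(x_min + (x_max - x_min) * col / max(width - 1, 1), y), max_iter)
            for col in range(width)
        ])
    return grid


start = time.perf_counter()
image = mandelbrot_grid(200, 100)
print(f"200x100 grid in {time.perf_counter() - start:.3f} s (pure Python)")
```

Every pixel runs an independent loop of complex multiplications. This is the kind of workload where interpreter overhead dominates, and it is where a compiled language like Mojo is pitched to help.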
Mindblowing research
AI can read minds?
Over the last few months, a new method based on a diffusion model (DM) was proposed to reconstruct images from human brain activity recorded via functional magnetic resonance imaging (fMRI). More recently, researchers from the University of Texas at Austin introduced a non-invasive decoder that reconstructs continuous language from fMRI recordings as well. These findings suggest that brain–computer interfaces could become a viable way to enhance human interaction with machines.
Image reconstruction from fMRI (left) and text reconstruction from fMRI (right)
Sources: High-resolution image reconstruction with latent diffusion models from human brain activity; Semantic reconstruction of continuous language from non-invasive brain recordings.
That’s all for the May 2023 AI trends summary. Stay tuned to see what further breakthroughs and trends come our way in the months and years ahead!