AI Trends of May 2023 You Need to Know


Summary of latest AI trends for May 2023

Gonzalo Recio


8 min read · 4 days ago


Photo by Maxim Berg on Unsplash

As we enter May 2023, AI continues to evolve at a breakneck pace and impact our lives in new ways. In this article, we’ll explore and provide a summarized overview of the latest AI trends and breakthroughs in natural language processing, computer vision, and machine learning.

Whether you’re an AI enthusiast, researcher, or just curious about technology’s future, read on for insights into the latest developments. Let’s go!

NLP & Large Language Models


AutoGPT, AgentGPT: LLMs as autonomous agents!
A new paradigm of using LLMs for goal-oriented planning. The idea is to make LLMs reach goals autonomously by having them decide which tasks they need to do, learn from their intermediate results, and perform self-improvement iterations. It is still at the research stage, but a very interesting line to explore.
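
To make the plan, act, and reflect cycle described above more concrete, here is a minimal sketch in Python. It is not how Auto-GPT or AgentGPT are implemented: `AgentState`, `run_agent`, and the three stub functions are hypothetical names invented for this illustration, and the stubs stand in for calls to a real LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    goal: str
    pending: List[str] = field(default_factory=list)  # tasks still to do
    memory: List[str] = field(default_factory=list)   # results of finished tasks


def run_agent(goal: str,
              plan: Callable[[str], List[str]],
              execute: Callable[[str, List[str]], str],
              reflect: Callable[[str, str, str], List[str]],
              max_steps: int = 10) -> AgentState:
    """Plan tasks for a goal, run them one by one, and let a reflection
    step add follow-up tasks based on each intermediate result."""
    state = AgentState(goal=goal, pending=list(plan(goal)))
    for _ in range(max_steps):  # hard cap so the loop always terminates
        if not state.pending:
            break
        task = state.pending.pop(0)
        result = execute(task, state.memory)
        state.memory.append(f"{task} -> {result}")
        state.pending.extend(reflect(goal, task, result))
    return state


# Hypothetical stubs: a real agent would call an LLM in each of these.
def stub_plan(goal: str) -> List[str]:
    return ["research topic", "write summary"]


def stub_execute(task: str, memory: List[str]) -> str:
    return f"done ({len(memory)} earlier results seen)"


def stub_reflect(goal: str, task: str, result: str) -> List[str]:
    # Pretend the model decides the summary needs one more pass.
    return ["fix typos"] if task == "write summary" else []


state = run_agent("Summarize AI trends", stub_plan, stub_execute, stub_reflect)
for entry in state.memory:
    print(entry)
```

The `max_steps` cap is a design choice of this sketch: since the reflection step can keep adding tasks, some budget is needed to stop a run that never converges.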

In the following video you can get an idea of the capabilities of this technology:

Video: “AutoGPT: This Is ChatGPT Supercharged!” (Two Minute Papers)

Sources: Auto-GPT, AgentGPT, Reflexion: an autonomous agent with dynamic memory and self-reflection

Google I/O Keynote: More models!
While Geoffrey Hinton left Google warning about the risks of AI, the Google I/O keynote took place on May 10, presenting many LLM updates and new generative AI features. Notable announcements were:

Announcement of PaLM 2, the new foundation model behind many AI tools, which will be available via API. They presented 4 model sizes (the smallest fits on a mobile device).

Google’s Bard is now available in some countries and offers advanced capabilities such as math, reasoning, and coding, and it will support image prompts. It will soon integrate Adobe Firefly for generating high-quality images. For now, it is only available in English, Japanese, and Korean.

Release of 3 new models in Vertex AI: Codey (a new code completion model, the “GitHub Copilot competitor”), Imagen (text-to-image model), and Chirp (speech-to-text).

Gemini: Multimodal foundation model with memory and planning. Still in the training phase.

Duet AI: Google’s new AI tools for Docs, Sheets, Slides, Meet, and Gmail. Analogous to Microsoft 365 Copilot. Currently available via waitlist.

Google I/O Keynote 2023

StarCoder: A new State-of-the-Art LLM for Code
New 15.5B-parameter foundation model for code by BigCode, trained on permissively licensed code. The paper's authors claim that StarCoder and StarCoderBase outperform the largest models, including PaLM, LaMDA, and LLaMA, despite being significantly smaller. They also outperform CodeGen-16B-Mono and OpenAI’s code-cushman-001 (12B) model. The team also released a playground and a Visual Studio Code plugin to use it as a programming copilot.
Sources: Blog, Paper

Comparison with other models (https://arxiv.org/pdf/2305.06161.pdf)

HuggingFace: Open-source strikes back
HuggingFace (the democratizers of Transformer models) presented a couple of interesting new features: HuggingChat and HuggingFace Transformers Agents.

HuggingChat: an open-source alternative to ChatGPT. It is still in early development, with only one LLM option available, but it could be a great alternative in the future:

Link: HuggingChat, the first open source alternative to ChatGPT (huggingface.co)

HuggingChat screenshot.

HuggingFace Transformers Agents: we are getting closer to having something like Iron Man’s JARVIS AI assistant. HuggingFace presented an experimental API to build multimodal agents based on an LLM. These agents are able to orchestrate different generative AI models, so they can generate and interact with images, audio, and text in a multimodal way. It is worth taking a look:

Link: Transformers Agent (huggingface.co). Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents can…

[Figure: the instruction "Read out loud the content of the image" is placed in a prompt that asks the LLM for a series of simple Python commands, lists the available tools (image_generator, image_captioner, text_to_speech), and includes examples of tasks. The agent answers that it will use image_captioner to caption the image and text_to_speech to read it out loud, and generates the code caption = image_captioner(image); audio = text_to_speech(caption). A restricted Python interpreter runs this code with the tools; the example caption is "A river flowing through a frozen forest".]

HuggingFace Transformers Agents execution flow example.

HuggingFace Transformers Agents operate by deciding which tool to use depending on the user prompt. Here is the current list of tools integrated in the Transformers Agents framework:

Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)

Text question answering: given a long text and a question, answer the question in the text (Flan-T5)

Unconditional image captioning: caption the image! (BLIP)

Image question answering: given an image, answer a question on this image (ViLT)

Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)

Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)

Text to speech: convert text to speech (SpeechT5)

Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)

Text summarization: summarize a long text in one or a few sentences (BART)

Translation: translate the text into a given language (NLLB)
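
The tool-selection flow described above (describe the tools in a prompt, let the LLM write a few lines of Python, then run that code with only the tools in scope) can be sketched roughly as follows. This is a toy illustration and not the Transformers Agents API: `TOOLS`, `build_prompt`, `stub_llm`, and `run_generated_code` are hypothetical names, the tools are stubs that return fixed strings, and the "LLM" simply returns the two lines of code shown in the execution flow figure.

```python
def image_captioner(image):
    """Stub standing in for a captioning model such as BLIP."""
    return "a river flowing through a frozen forest"


def text_to_speech(text):
    """Stub standing in for a text-to-speech model such as SpeechT5."""
    return f"<audio: {text}>"


# Hypothetical tool registry: name -> (callable, description for the prompt).
TOOLS = {
    "image_captioner": (image_captioner, "captions an image"),
    "text_to_speech": (text_to_speech, "converts text to audio"),
}


def build_prompt(task):
    """List the tools and the task so the LLM can pick which tools to call."""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, (_, desc) in TOOLS.items())
    return (
        "Write simple Python commands that perform the task, "
        "using only these tools.\n"
        f"Tools:\n{tool_lines}\n"
        f"Task: {task}\n"
    )


def stub_llm(prompt):
    """Stand-in for a real LLM call; returns the code from the figure."""
    return "caption = image_captioner(image)\naudio = text_to_speech(caption)"


def run_generated_code(code, inputs):
    """Run generated code with only the tools and inputs in scope.

    Emptying __builtins__ only narrows the namespace; it is NOT a real
    sandbox and must not be used on untrusted code.
    """
    namespace = {"__builtins__": {}}
    namespace.update({name: fn for name, (fn, _) in TOOLS.items()})
    namespace.update(inputs)
    exec(code, namespace)
    return namespace


prompt = build_prompt("Read out loud the content of the image")
code = stub_llm(prompt)
result = run_generated_code(code, {"image": "river.png"})
print(result["audio"])
```

The point of the sketch is the division of labor: the LLM never runs a model itself, it only writes glue code that chains the tools it was told about.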

Claude LLMs released
Claude is a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. Here is a comparison between Claude, GPT-4, GPT-3.5, and PaLM 2 (chat-bison). Impressive inference speed and accuracy!

Source: https://github.com/kagisearch/pyllms

Prompt Engineering free course by DeepLearning.AI & OpenAI
A free course of about 2 hours, with interactive Python notebooks, on how to write good prompts, taught by Andrew Ng and Isa Fulford (OpenAI engineer). Highly recommended! Course link below:

Link: ChatGPT Prompt Engineering for Developers (www.deeplearning.ai). In ChatGPT Prompt Engineering for Developers, you will learn how to use a large language model (LLM) to quickly build…

Bing Chat Update: Images, videos & plugins!
Bing Chat now includes images and videos in replies and chat history, and in the near future it will add multimodal input and plugin support, among many other things. Take a look at the following link to see what is to come:

Link: Announcing the next wave of AI innovation with Microsoft Bing and Edge - The Official Microsoft… (blogs.microsoft.com). Just three months ago, we unveiled the new AI-powered Microsoft Bing and Edge to reinvent the future of search with…

Clone voices with the new ElevenLabs models
ElevenLabs has released a text-to-speech model (it converts text to synthesized speech) capable of generating professional-quality audio. It even lets you clone your own voice from a short recording and have it speak in different languages.
https://beta.elevenlabs.io/

Computer Vision
Midjourney 5.1
New version of the model released, with significant improvements in prompt understanding and image sharpness, as well as fewer borders and unwanted text artifacts.

Midjourney (@midjourney), 10:20 PM · May 3, 2023:
"V5.1 is now available! Images are more coherent, sharp, and beautiful. It's easier to use and should respond more precisely to instructions. We've also added a "RAW" mode for expert users to reduce the 'opinionatedness' of our model and give you more creative control."


Some examples of the results obtained with versions 5 and 5.1:

“close up of a child wearing swimming goggles” Midjourney 5 (left) vs 5.1 (right). Credit to Barry Collins: https://www.forbes.com/sites/barrycollins/2023/05/03/midjourney-51-arrivesand-its-another-leap-forward-for-ai-art/

DeepFloyd IF
Stability AI releases DeepFloyd IF, a
powerful text-to-image model that can
smartly integrate text into images.
Incorporating the large language model
T5-XXL-1.1 as a text encoder, DeepFloyd IF
generates coherent and clear text
alongside objects of different properties.

DeepFloyd IF image generated samples with text.

ImageBind by Meta: One embedding space to bind them all
Another open-source model from Meta AI! This time it is a multimodal foundation model capable of processing multimodal data (text, audio, images, videos, depth maps, thermal images, etc.) and generating embeddings for it in a single shared space. What can it be used for? It can generate images from audio, search for audio from text, search for audio from images, and so on, all with the same multimodal model. The demo they have prepared is the best way to understand it:
https://imagebind.metademolab.com/demo
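
Why a shared embedding space enables "search for audio from text" can be shown with a toy example. The sketch below does not use ImageBind: the three-dimensional vectors are made-up stand-ins for embeddings, and `cosine` and `search` are hypothetical helper names. The only assumption is the one the section describes, namely that every modality is mapped into the same vector space, so a nearest-neighbor search works across modalities.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "audio embeddings"; a real system would get these from the model.
audio_bank = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.1, 0.9, 0.1],
    "engine.wav": [0.0, 0.2, 0.9],
}

# Made-up "text embedding" for a query such as "a dog", in the same space.
text_query = [0.8, 0.2, 0.1]


def search(query, bank, k=1):
    """Return the k items whose embeddings are most similar to the query."""
    ranked = sorted(bank, key=lambda name: cosine(query, bank[name]), reverse=True)
    return ranked[:k]


print(search(text_query, audio_bank))
```

Because the query and the bank only need to live in the same space, the same `search` function would serve image-to-audio or audio-to-image lookups without modification.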

Yann LeCun (@ylecun), 1:08 AM · May 10, 2023:
"IMAGEBIND: One Embedding Space To Bind Them All. Learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. An open source project by Meta-FAIR.
Paper: dl.fbaipublicfiles.com/imagebind/imag…
Demo: imagebind.metademolab.com
Code:…"


Here’s an overview of the different data sources that ImageBind can process:

ImageBind: https://arxiv.org/abs/2305.05665

YOLO-NAS
After the recent release of YOLOv8 by Ultralytics, Deci.ai presents a new foundation model for object detection aimed at production-ready performance. According to Deci, it significantly improves inference speed while preserving detection accuracy.

Comparison of the accuracy and inference latency of different YOLO models.

Trending tools and repositories


Pandas AI: Making dataframes conversational
Pandas AI is a Python library that adds generative AI and LLM capabilities to Pandas, enabling conversational analysis and manipulation of dataframes. For example, you can ask PandasAI to run complex queries or to draw plots:

pandas_ai.run(
    df,
    "Plot the histogram of countries showi…",  # the prompt is cut off in the source
)

[Figure: bar chart "GDP by Country" generated by PandasAI. x-axis: Country (United States, United Kingdom, Japan, Australia, Spain, Canada, Italy, Germany, China, France); y-axis scaled by 1e13.]

https://github.com/gventuri/pandas-ai

Shap-E: Generating 3D objects conditioned on text or images
Official code repository and model release for "Shap-E: Generating Conditional 3D Implicit Functions" by OpenAI, a conditional generative model for 3D assets conditioned on text or images.

[Figure: 3D samples for the prompts "A chair that looks like an avocado", "An airplane that looks like a banana", "A spaceship", "A birthday cupcake", "A chair that looks like a tree", "A green boot", "A penguin", "Ube ice cream cone", and "A bowl of vegetables".]

3D examples generated by the text-conditional model.

Mojo
Mojo is a new programming language for all AI developers that combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models.

Performance benchmark on Mandelbrot algorithm.
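
For readers unfamiliar with the benchmark kernel, the following is a plain Python sketch of the escape-time Mandelbrot computation. It is not the code used in the Mojo benchmark: the function name `mandelbrot_escape` and the default `max_iter` of 200 are assumptions made for illustration. The inner loop is pure arithmetic repeated for every pixel, which is the kind of workload where a compiled language can gain a lot over interpreted Python.

```python
def mandelbrot_escape(c: complex, max_iter: int = 200) -> int:
    """Return the iteration at which z = z*z + c leaves the radius-2 disk,
    or max_iter if it never does within the budget."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width: int = 40, height: int = 20) -> str:
    """Draw a coarse ASCII view of the set over [-2, 1] x [-1, 1]."""
    rows = []
    for y in range(height):
        im = -1.0 + 2.0 * y / (height - 1)
        row = ""
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)
            inside = mandelbrot_escape(complex(re, im)) == 200
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


print(render())
```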

Mindblowing research
AI can read minds?
In recent months, a new method based on a diffusion model (DM) was proposed to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). More recently, researchers from the University of Texas at Austin introduced a non-invasive decoder that reconstructs continuous language from fMRI recordings as well. These findings demonstrate the viability of brain-computer interfaces to enhance human interactions with machines.

Image reconstruction from fMRI (left) and text reconstruction from fMRI (right)

Sources: "High-resolution image reconstruction with latent diffusion models from human brain activity"; "Semantic reconstruction of continuous language from non-invasive brain recordings".

That’s all for the May 2023 AI trends summary. Stay tuned to see what further breakthroughs and trends will come our way in the months and years ahead!
