Stable Diffusion
From Wikipedia, the free encyclopedia

Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.[3]

Stable Diffusion is a latent diffusion model, a variety of deep generative neural network developed by the CompVis group at LMU Munich.[4] The model has been released by a collaboration of Stability AI, CompVis LMU, and Runway with support from EleutherAI and LAION.[5][1][6] In October 2022, Stability AI raised $101 million in a round led by Lightspeed Venture Partners and Coatue.[7]

Stable Diffusion's code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services.[8][9]

[Image: An image generated by Stable Diffusion based on the text prompt "a photograph of an astronaut riding a horse"]

Developer(s): CompVis group LMU Munich; Runway; Stability AI[1]
Initial release: August 22, 2022
Stable release: 1.5 (model)[2] / August 31, 2022
Repository: github.com/CompVis/stable-diffusion
Written in: Python
Operating system: Any that support CUDA kernels
Type: Text-to-image model
License: Creative ML OpenRAIL-M
Website: stability.ai

Contents
1 Architecture
2 Usage
2.1 Text to image generation
2.2 Image modification
2.2.1 Inpainting and outpainting
3 License
4 Training
5 Societal impacts
6 See also
7 References
8 External links
Architecture

[Image: Diagram of the latent diffusion architecture used by Stable Diffusion]

Stable Diffusion is a form of diffusion model (DM). Introduced in 2015, diffusion models are trained with the objective of removing successive applications of Gaussian noise to training images, and can be thought of as a sequence of denoising autoencoders. Stable Diffusion uses a variant known as a "latent diffusion model" (LDM). Rather than learning to denoise image data (in "pixel space"), an autoencoder is trained to transform images into a lower-dimensional latent space. The process of adding and removing noise is applied to this latent representation, with the final denoised output then decoded into pixel space. Each denoising step is accomplished by a U-Net architecture. The researchers point to reduced computational requirements for training and generation as an advantage of LDMs.[5][4]

The denoising step may be conditioned on a string of text, an image, or some other data. An encoding of the conditioning data is
exposed to the denoising U-Nets via a cross-attention mechanism. For conditioning on text, a transformer language model was
trained to encode text prompts.[4]
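
The generation loop described above can be sketched with the Hugging Face diffusers and transformers libraries. This is a minimal, illustrative sketch rather than the reference implementation: the checkpoint name, the 50-step schedule, and the choice of scheduler are assumptions, and classifier-free guidance is omitted for brevity. It shows the three components named in the text: a text encoder producing the conditioning, a U-Net denoising a latent step by step, and a variational autoencoder decoding the final latent into pixel space.

import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

# Load the individual components of a Stable Diffusion checkpoint
# (illustrative model id; any v1.x checkpoint with the same layout would work).
model_id = "runwayml/stable-diffusion-v1-5"
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Encode the text prompt; these embeddings condition the U-Net via cross-attention.
prompt = ["a photograph of an astronaut riding a horse"]
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids)[0]

# Start from Gaussian noise in the latent space (64x64 latents decode to 512x512 pixels).
latents = torch.randn((1, unet.config.in_channels, 64, 64))
scheduler.set_timesteps(50)
latents = latents * scheduler.init_noise_sigma

# Iteratively denoise the latent; each step is one pass through the U-Net.
for t in scheduler.timesteps:
    model_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(model_input, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# Decode the final latent back into pixel space with the VAE decoder
# (0.18215 is the latent scaling factor used by the v1 checkpoints).
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample

In practice, higher-level classes such as StableDiffusionPipeline wrap these steps, together with classifier-free guidance, behind a single call; the examples in the Usage section below use that interface.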

Usage

The Stable Diffusion model supports generating new images from scratch using a text prompt that describes elements to be included or omitted from the output,[1] as well as redrawing existing images to incorporate new elements described in a text prompt (a process commonly known as guided image synthesis[10]) by way of its diffusion-denoising mechanism.[1] In addition, the model allows prompts to partially alter existing images via inpainting and outpainting, when used with one of the numerous open-source user interfaces that support such features.[11]

Stable Diffusion is recommended to be run with 10 GB or more of VRAM; however, users with less VRAM may opt to load the weights in float16 precision instead of the default float32 to lower VRAM usage.[12]
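
As a hedged illustration, loading the publicly released weights in half precision with the Hugging Face diffusers library (an assumed front end; the checkpoint name is illustrative) might look like this:

import torch
from diffusers import StableDiffusionPipeline

# Load the weights in float16 rather than the default float32,
# roughly halving the VRAM needed to hold the model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the model to the GPU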

Text to image generation

The text to image sampling script within Stable Diffusion, known as "txt2img", consumes a text
prompt, in addition to assorted option parameters covering sampling types, output image
dimensions, and seed values, and outputs an image file based on the model's interpretation of
the prompt.[1] Generated images are tagged with an invisible digital watermark to allow users
to identify an image as generated by Stable Diffusion,[1] although this watermark loses its
effectiveness if the image is resized or rotated.[13] The Stable Diffusion model is trained on a
dataset of 512×512 resolution images,[1][14] meaning that txt2img output is best generated
at 512×512 resolution as well; deviating from this size can result in poor-quality outputs.[12]
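
Continuing the diffusers-based sketch above (the parameter names follow that library rather than the txt2img script itself), a text-to-image call at the model's native resolution might look like:

# Generate an image at the 512x512 resolution the model was trained on.
image = pipe(
    "a photograph of an astronaut riding a horse",
    height=512,
    width=512,
).images[0]
image.save("astronaut.png")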

Each txt2img generation will involve a specific seed value which affects the output image; users may opt to randomize the seed in order to explore different generated outputs, or use the same seed to obtain the same image output as a previously generated image.[12] Users are also able to adjust the number of inference steps for the sampler; a higher value takes longer, while a smaller value may result in visual defects.[12] Another configurable option, the classifier-free guidance scale value, allows the user to adjust how closely the output image adheres to the prompt;[15] more experimental or creative use cases may opt for a lower value, while use cases aiming for more specific outputs may use a higher value.[12]
[Image: Demonstration of the effect of negative prompts on image generation. Top: no negative prompt; centre: "green trees"; bottom: "round stones, round rocks"]

Negative prompts are a feature included in some user interface implementations of Stable Diffusion which allow the user to specify prompts that the model should avoid during image generation, for use cases where undesirable image features would otherwise appear in the output because of the positive prompts provided by the user or because of how the model was originally trained.[11] Compared to emphasis markers, an alternative method used by some open-source implementations in which brackets are added to keywords to increase or reduce their weight, the use of negative prompts has a highly statistically significant effect on decreasing the frequency of unwanted outputs.[16]
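
In the diffusers-based sketch used above, this corresponds to the negative_prompt parameter (the prompts below are illustrative and echo the demonstration image):

# The negative prompt lists features the model should steer away from.
image = pipe(
    "a mountain path",
    negative_prompt="green trees, round stones, round rocks",
    guidance_scale=7.5,
).images[0]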

Image modification

Stable Diffusion includes another sampling script, "img2img", which consumes a text prompt, a path to an existing image, and a strength value between 0.0 and 1.0, and outputs a new image based on the original that also features elements provided in the text prompt; the strength value denotes the amount of noise added to the output image, with higher values producing images with more variation that may, however, not be semantically consistent with the prompt provided.[1] Image upscaling is one potential use case of img2img, among others.[1]
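
A hedged sketch of guided image-to-image generation with the diffusers library (an assumed front end rather than the original img2img script, assuming a recent library version; the checkpoint name, file names, and values are illustrative):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

# strength controls how much noise is added on top of the input image:
# higher values give more variation but may drift further from the original.
image = pipe(
    prompt="a detailed fantasy landscape, oil painting",
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
).images[0]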

Inpainting and outpainting

Additional use-cases for image modification via img2img are offered by numerous different front-end implementations of the
Stable Diffusion model. Inpainting involves selectively modifying a portion of an existing image delineated by a user-provided
mask, which fills the masked space with newly generated content based on the provided prompt.[11] Conversely, outpainting
extends an image beyond its original dimensions, filling the previously empty space with content generated based on the provided
prompt.[11]
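
As an illustrative sketch, inpainting with the diffusers library might look as follows (assuming its dedicated inpainting checkpoint and a recent library version; the file names are placeholders):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
# White areas of the mask are regenerated from the prompt; black areas are kept.
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="a vase of flowers on the table",
    image=init_image,
    mask_image=mask_image,
).images[0]

Outpainting is typically implemented by front ends in a similar way: the image and mask are padded beyond the original borders and the newly exposed area is then inpainted from the prompt.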

License

Unlike models like DALL-E, Stable Diffusion makes its source code available,[17][1] along with pre-trained weights. Its license
prohibits certain use cases, including crime, libel, harassment, doxxing, "exploiting ... minors", giving medical advice,
automatically creating legal obligations, producing legal evidence, and "discriminating against or harming individuals or groups
based on ... social behavior or ... personal or personality characteristics ... [or] legally protected characteristics or categories".[18]
[19]
The user owns the rights to their generated output images, and is free to use them commercially.[20]

Training

Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from
Common Crawl data scraped from the web. The dataset was created by LAION, a German non-profit which receives funding from
Stability AI.[14][21] The model was initially trained on a large subset of LAION-5B, with the final rounds of training done on "LAION-
Aesthetics v2 5+", a subset of 600 million captioned images which an AI predicted that humans would give a score of at least 5
out of 10 when asked to rate how much they liked them.[14][22] This final subset also excluded low-resolution images and images
which an AI identified as carrying a watermark.[14] A third-party analysis of the model's training data examined a smaller subset of 12 million images taken from the wider original dataset and found that approximately 47% of the sampled images came from 100 different domains, with Pinterest accounting for 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons.[23][14]

The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of
$600,000.[24][25][26]

Societal impacts

As visual styles and compositions are not subject to copyright, it is often interpreted that users of Stable Diffusion who generate images of artworks should not be considered to be infringing upon the copyright of visually similar works.[27] However, individuals depicted in generated images may still be protected by personality rights if their likeness is used,[27] and intellectual property such as recognizable brand logos remains protected by copyright. Nonetheless, visual artists have expressed concern that widespread usage of image synthesis software such as Stable Diffusion may eventually cause human artists, along with photographers, models, cinematographers and actors, to gradually lose commercial viability against AI-based competitors.[21]

Stable Diffusion is notably more permissive in the types of content users may generate, such as violent or sexually explicit imagery, than similar machine-learning image synthesis products from other companies.[28] Addressing concerns that the model may be used for abusive purposes, Stability AI CEO Emad Mostaque explains that "(it is) peoples' responsibility as to whether they are ethical, moral, and legal in how they operate this technology",[9] and that putting the capabilities of Stable Diffusion into the hands of the public would result in the technology providing a net benefit overall, despite the potential negative consequences.[9] In addition, Mostaque argues that the intention behind the open availability of Stable Diffusion is to end the control and dominance of corporations that have previously only developed closed AI systems for image synthesis.[9][28]

See also

15.ai
Artificial intelligence art
Craiyon
Imagen (Google Brain)
Loab

References

1. "Stable Diffusion Repository on GitHub". CompVis - Machine Vision and Learning Research Group, LMU Munich. 17 September 2022. Retrieved 17 September 2022.
2. RunwayML. "stable-diffusion-v1-5". Hugging Face.
3. "Diffuse The Rest - a Hugging Face Space by huggingface". huggingface.co. Archived from the original on 2022-09-05. Retrieved 2022-09-05.
4. Rombach; Blattmann; Lorenz; Esser; Ommer (June 2022). High-Resolution Image Synthesis with Latent Diffusion Models (PDF). International Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA. pp. 10684–10695. arXiv:2112.10752.
5. "Stable Diffusion Launch Announcement". Stability.Ai. Archived from the original on 2022-09-05. Retrieved 2022-09-06.
6. "Revolutionizing image generation by AI: Turning text into images". LMU Munich. Retrieved 17 September 2022.
7. Wiggers, Kyle. "Stability AI, the startup behind Stable Diffusion, raises $101M". TechCrunch. Retrieved 2022-10-17.
8. "The new killer app: Creating AI art will absolutely crush your PC". PCWorld. Archived from the original on 2022-08-31. Retrieved 2022-08-31.
9. Vincent, James (15 September 2022). "Anyone can use this AI art generator — that's the risk". The Verge.
10. Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun-Yan; Ermon, Stefano (August 2, 2021). "SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations". arXiv. doi:10.48550/arXiv.2108.01073.
11. "Stable Diffusion web UI". GitHub.
12. "Stable Diffusion with Diffusers". Hugging Face official blog. August 22, 2022.
13. "invisible-watermark README.md". GitHub.
14. Baio, Andy (30 August 2022). "Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion's Image Generator". Waxy.org.
15. Ho, Jonathan; Salimans, Tim (July 26, 2022). "Classifier-Free Diffusion Guidance". arXiv. doi:10.48550/arXiv.2207.12598.
16. Gaessler, Johannes (September 11, 2022). "Emphasis". GitHub.
17. "Stable Diffusion Public Release". Stability.Ai. Archived from the original on 2022-08-30. Retrieved 2022-08-31.
18. "Ready or not, mass video deepfakes are coming". The Washington Post. 2022-08-30. Archived from the original on 2022-08-31. Retrieved 2022-08-31.
19. "License - a Hugging Face Space by CompVis". huggingface.co. Archived from the original on 2022-09-04. Retrieved 2022-09-05.
20. Ishida, Katsuo (August 26, 2022). "An amazing AI that draws the images you describe in words: 'Stable Diffusion'; images may also be used commercially". Impress Corporation (in Japanese).
21. Heikkilä, Melissa (16 September 2022). "This artist is dominating AI-generated art. And he's not happy about it". MIT Technology Review.
22. "LAION-Aesthetics | LAION". laion.ai. Archived from the original on 2022-08-26. Retrieved 2022-09-02.
23. Ivanovs, Alex (September 8, 2022). "Stable Diffusion: Tutorials, Resources, and Tools". Stackdiary.
24. Mostaque, Emad (August 28, 2022). "Cost of construction". Twitter. Archived from the original on 2022-09-06. Retrieved 2022-09-06.
25. "Stable Diffusion v1-4 Model Card". huggingface.co. Retrieved 2022-09-20.
26. "This startup is setting a DALL-E 2-like AI free, consequences be damned". TechCrunch. Retrieved 2022-09-20.
27. "High-performance image-generation AI 'Stable Diffusion' released for free; an image-generation AI that understands and creates even 'kawaii'". Automaton Media (in Japanese). August 24, 2022.
28. Shimizu, Ryo (August 26, 2022). "Has it surpassed Midjourney? Why the free image-generation AI '#StableDiffusion' can be said to have 'democratized AI'". Business Insider Japan (in Japanese).

External links

Stable Diffusion Demo


Wikimedia Commons has media related to Stable Diffusion.

Categories: Deep learning software applications | Open-source artificial intelligence | Text-to-image generation
