A Stable Diffusion Img2img Tutorial
by Ascendant Stoic
A belated tutorial that I promised a while back but kept delaying due
to several circumstances (including an annoying cold), and also due to
developments in A.I. technology, which is advancing so fast that it
made the old tutorial I was working on obsolete before I could finish it.
I guess that's the price of being at the cutting edge of this
revolutionary tech, which is destined to change not just the art world
but also many other fields like physics, biology and architecture.
So without further delay, let's start. The key component of this
technique is the new Anything V-3.0 model, which is excellent at
anime-style art (with a bit of semi-realism depending on how you
use it) and works great for the image-to-image workflow. Note
that if the stylistic change is too strong, there are ways to tone it
down and keep a more balanced mix of styles.
-Here are the things you will need:-
1-Automatic1111 WebUI or InvokeAI 2.2 local install (or use Stable
Horde, which is a little slow but is online and free and has the
Anything V-3.0 model)
2-In case of a local install you need to download the model from
Hugging Face (download the safetensors version, which is less open
to malicious manipulation)
3-Obviously you will also need an Nvidia card (preferably a GeForce
RTX 20XX or 30XX or better; lower cards might or might not work)
4-You will also need one of these three things:-
An old art piece
A quick sketch
Old A.I. generations
Put any of them in the img2img tab and set (Denoising to
0.5-0.3 / Steps to 20-60 / Sampler to Euler-A / CFG 50%)
Create a prompt that describes the input image's details and add
any enhancements you like + a typical negative prompt.
-Few things to take into account:-
1-You might not get the results you want the first time, so
adjust the Denoising value (usually start high and lower it
gradually, from 0.60 down to 0.30) to test how well the A.I.
understands your inputs. Also adjust the prompt by adding or
removing words, and by adding brackets around words for emphasis
((insert word)) so the A.I. gives those words more attention.
2-Generate 2 to 4 variants each time; they can be used in many
ways, like cutting and pasting details from one variant into another
using photobashing techniques.
3-If needed, open the generated image in any image editor, fix
issues with eyes/hands/details, and adjust things like contrast
/gamma/etc.
4-You can take a variant you like (especially after editing it) and
put it back into img2img to get variants of it that are well
blended together and coherent, and if you want to enhance the
resolution and details of that variant you can use SD
Upscale for that (more on that later).
Prompt Example (Old Art Piece):-
"[…] wearing a crop top and carrying […], highly detailed, 4K
resolution, perspective, vanishing point, depth of field, bokeh,
masterpiece, trending on Artstation"
Negative Prompt Example:- […]
(SD Upscale)
Once you get a variant you like, you can use SD Upscale from the
Scripts drop-down menu in img2img.
Set Width x Height to 512 x 512 (because SD Upscale cuts the image
into smaller pieces, passes them through img2img, and then stitches
them back together, so these dimensions are for the pieces, not the
final image, which gets scaled by a x2 factor by default)
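As a rough illustration of that tiling step (not the actual SD Upscale code), here is a sketch using PIL; process_tile is a hypothetical stand-in for the img2img pass on each 512 x 512 piece, and the real script also overlaps tiles to hide the seams between them.

```python
# Illustration of how SD Upscale handles large images: upscale the
# whole image first, cut it into 512x512 tiles, run each tile through
# img2img (process_tile is a hypothetical stand-in for that pass),
# then stitch the results back together.
from PIL import Image

TILE = 512

def sd_upscale(image, process_tile, factor=2):
    # 1. Upscale the whole image (x2 by default)
    big = image.resize((image.width * factor, image.height * factor),
                       Image.LANCZOS)
    out = Image.new("RGB", big.size)
    # 2. Walk over the upscaled image in 512x512 tiles
    for top in range(0, big.height, TILE):
        for left in range(0, big.width, TILE):
            box = (left, top,
                   min(left + TILE, big.width),
                   min(top + TILE, big.height))
            tile = big.crop(box)
            # 3. Each tile goes through img2img, then is pasted back
            out.paste(process_tile(tile), box)
    return out
```

For example, sd_upscale(img, lambda t: t) on a 600 x 400 image returns a 1200 x 800 image assembled from its tiles.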
Choose an upscaler model from the ones available or download a
new one from upscale.wiki/wiki/Model_Database; my favourite is one
called “Foolhardy Remacri"
Set the Denoising value to between 0.15 and 0.25, leave CFG as it
is (50%), and reduce the prompt to only the enhancement parts.
Set the number of generations to 2, because SD Upscale generates
completely new details which weren't in the original image as it
upscales, so having two or more variants to pick from is a good idea.