With our Stable Diffusion (SD) cloud image, created in cooperation with AI-SP, you can instantly render stunning Stable Diffusion images on your own cloud server with great performance. Stable Diffusion is a machine learning model developed by Stability AI, in collaboration with EleutherAI and LAION, that generates digital images from natural language descriptions.
The neural network can also be used for other tasks, such as image-to-image translation guided by a text prompt. More background can be found under "What is Stable Diffusion". Please log in to Hugging Face, which hosts Stable Diffusion, and request access to stable-diffusion-v1-4 to comply with the license and access regulations.
Overview of Features (Detailed overview of features including examples)
- Original txt2img and img2img modes
- One-click install and run script (but you still need to install Python and git)
- Prompt Matrix
- Stable Diffusion Upscale
- Attention: specify parts of the text that the model should pay more attention to
- a man in a ((tuxedo)) – will pay more attention to tuxedo
- a man in a (tuxedo:1.21) – alternative syntax
- Loopback, run img2img processing multiple times
- X/Y plot, a way to draw a 2 dimensional plot of images with different parameters
- Textual Inversion
- have as many embeddings as you want and use any names you like for them
- use multiple embeddings with different numbers of vectors per token
- works with half precision floating point numbers
- Extras tab with:
- GFPGAN, neural network that fixes faces
- CodeFormer, face restoration tool as an alternative to GFPGAN
- RealESRGAN, neural network upscaler
- ESRGAN, neural network upscaler with a lot of third party models
- SwinIR, neural network upscaler
- LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Prompt length validation: get the length of the prompt in tokens as you type
- get a warning after generation if some text was truncated
- Generation parameters: the parameters you used to generate an image are saved with that image
- in PNG chunks for PNG, in EXIF for JPEG
- can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
- can be disabled in settings
- Settings page
- Running arbitrary Python code from the UI (must run with --allow-code to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/min/max/step values for UI elements via text config
- Random artist button
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
- Negative prompt, an extra text field that allows you to list what you don’t want to see in generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge two checkpoints into one
- Custom scripts with many extensions from community
Check out the Detailed overview of features including examples for more background and how-tos.
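The generation-parameters feature above (parameters saved in PNG text chunks) can be sketched with Pillow. The chunk key "parameters" is an assumption modeled on what the Automatic webui writes; this is an illustrative round-trip, not the webui's own code:

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Store generation parameters in a PNG text chunk, the same mechanism
# the "PNG Info" tab reads from (the key name "parameters" is assumed).
meta = PngInfo()
meta.add_text("parameters", "a man in a (tuxedo:1.21), Steps: 30, Seed: 42")

img = Image.new("RGB", (64, 64), "white")
img.save("sd_example.png", pnginfo=meta)

# Reopen the file and recover the prompt and settings from the chunk
restored = Image.open("sd_example.png")
params = restored.text["parameters"]
print(params)
```

Dragging such a PNG onto the "PNG Info" tab performs the same read step and copies the recovered parameters back into the UI.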
1. Start and Access the AWS Stable Diffusion GPU cloud server
In the AWS Marketplace, please subscribe to the Stable Diffusion Cloud Server and start the instance on an AWS GPU server (the Stable Diffusion neural network was itself trained on AWS GPU instances with NVIDIA A100 GPUs). The default server type is g4dn.xlarge, which gives you an NVIDIA T4 GPU with 16 GB of GPU memory and 16 GB of system memory, a great platform to start with Stable Diffusion.
For the remote desktop connection we will use the NICE DCV high-end remote desktop software and log in to our cloud server with the following steps (please also see our video: NICE DCV Remote 3D on AWS – NI SP DCV AMI):
- Connect to the Stable Diffusion server:
- Open a web browser window at
https://PUBLIC_DNS_NAME:8443 -or- https://EXTERNAL_IP_ADDRESS:8443
replacing the capitalized part with the address of your Stable Diffusion server
----- OR -----
- Download the NICE DCV client for your OS from https://download.nice-dcv.com/. E.g. for Windows download: DCV Windows Client
- Open the DCV client and enter the “public DNS name” of your server or “IP address”.
- Open a web browser window at
Accept the security warning related to the dynamically created certificate (the connection is still secure) by clicking “Trust and connect”, and log in with the user Administrator and the password retrieved from the AWS “Connect” information (please follow steps 1-5 in the AWS Windows login guide to retrieve the password):
2. How to Run the Stable Diffusion GUI in Your Server
When you log into your Stable Diffusion Cloud Server you will see the following icons on your desktop:
On the desktop you can find:
- Start command “SD – START” to start Stable Diffusion. Double-clicking opens a command shell which loads the Stable Diffusion model and starts the web server at http://localhost:7860.
- Stable Diffusion user interface “SD – GUI” created by Automatic. Double-clicking opens the SD GUI in the Firefox web browser.
- Link to the libraire.ai prompt library to get ideas for new prompts (next to it in the upper middle)
- 2 directories with examples of images created with txt2img (txt2img-images – output) and img2img (img2img-images – output)
How to start the Stable Diffusion GUI
- Double-click the SD – START icon and wait a couple of minutes until the command window looks similar to the screenshot below after loading the gigabytes of model data:
- Wait for the SD – GUI to open automatically in the Firefox browser after loading has completed, or double-click “SD – GUI”, and the Stable Diffusion user interface will open in the Firefox web browser:
3. Stable Diffusion: Text to Image
The Stable Diffusion GUI offers a number of controls including tooltips when hovering over the controls:
- Prompt: Here you enter the description of the image to be created by the Stable Diffusion neural network. Get ideas for prompts from the last chapter on this page or e.g. libraire.ai or lexica.art
- Sampling Steps: how many iterations Stable Diffusion performs. Values between 30 and 50 are a good starting point
- Batch count: how many images to create, each with a different seed. More images take longer to render
- Creativeness/CFG Scale: how closely Stable Diffusion should follow your prompt. Higher values match your prompt more closely but can make the generated images chaotic
- Seed: -1 means a random seed. The same seed will create the same image as long as resolution and sampler are identical
- Resolution: resolutions up to 512 x 512 render well on the 16 GB T4 GPU in g4 instances. Higher resolutions might lead to errors. Upscaling is available at factor 2 or 4
- Sampling method: how the image is retrieved from the neural network
- Tiling: Generate seamless tiled images
- Restore faces: Use additional steps leveraging GFPGAN
Enter your prompt in the prompt box and click “Generate” to the right. You can monitor the resource usage with Windows Task Manager.
Depending on the size of the image and the number of steps, rendering will take a few seconds (the first generation loads the data into memory and the GPU and takes a couple of minutes to finish). The generated image(s) will be shown after rendering. Progress is shown in the command window, and you can also enable a progress bar in the GUI under Settings.
Generated images are automatically stored in the linked directories on the desktop. You can download images via the DCV built-in download functionality located at the upper left of the DCV window (please copy the desired image to the Desktop first). If you want to retrieve the prompt used for a previous image, you can check the generated log files.
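Generation can also be scripted instead of clicking in the browser. Below is a minimal sketch against the webui's HTTP API; the /sdapi/v1/txt2img endpoint is only available when the webui is started with the --api flag, and the endpoint name and default port 7860 are assumptions based on the Automatic webui, not something this guide configures:

```python
import json
from urllib import request

def txt2img_payload(prompt, steps=30, seed=-1, width=512, height=512):
    """Minimal request body for a txt2img call (field names assumed
    to match the Automatic webui API)."""
    return {"prompt": prompt, "steps": steps, "seed": seed,
            "width": width, "height": height}

def generate(prompt, base_url="http://localhost:7860"):
    """POST the payload and return the base64-encoded result images."""
    body = json.dumps(txt2img_payload(prompt)).encode("utf-8")
    req = request.Request(base_url + "/sdapi/v1/txt2img", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["images"]

payload = txt2img_payload("a man in a (tuxedo:1.21)", steps=40)
```

Calling `generate(...)` on the server itself (where the webui listens on localhost) would return the rendered images as base64 strings.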
4. Stable Diffusion: Image to Image
Clicking the “img2img” tab in the upper tab row starts the “Image to Image” mode.
Select an image, e.g. from one of the folders on the desktop, and configure the “Denoising Strength” to control how much the start image constrains the output: lower values preserve more of the initial image, while higher values give Stable Diffusion more freedom to change it.
Click “Generate” and the new image will be rendered, stored into “img2img-images – output” on the Desktop and displayed as follows:
Initial image in the above example was:
The GUI offers a number of scripts for additional functionality:
The “Loopback” script, for example, automatically runs img2img repeatedly, using the latest generated image as the start image for the next run, “moving forward” from image to image. The “txt2img” tab also offers additional scripts.
The “Extras” tab supports upscaling and other image manipulation. The “PNG Info” tab allows you to retrieve information stored in the PNG image file, such as the prompt and other parameters.
Check out the “Settings” tab to control where the images are stored and control other parameters.
5. Stable Diffusion License, Tips and Tricks
More information on Stable Diffusion from the Stable Diffusion github page:
Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512×512 images from a subset of the LAION-5B database. Similar to Google’s Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See this section below and the model card.
The weights are available via the CompVis organization at Hugging Face under a license which contains specific use-based restrictions to prevent misuse and harm as informed by the model card, but otherwise remains permissive. While commercial use is permitted under the terms of the license, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations, since there are known limitations and biases of the weights, and research on safe and ethical deployment of general text-to-image models is an ongoing effort. The weights are research artifacts and should be treated as such.
The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
You can use the built-in file up/download functionality of the remote desktop DCV software to e.g. download rendered images. The button is located to the upper left in the DCV window.
Stable Diffusion Feature Guide with Examples
The Stable Diffusion Automatic WIKI has a Detailed overview of features including examples.
Negative prompt is a way to use Stable Diffusion that lets the user specify what they don’t want to see, without any extra load or requirements for the model. The feature has become extremely popular with users, who use it to remove the usual Stable Diffusion deformities such as extra limbs. Besides letting you exclude things that are sometimes hard or impossible to exclude via the regular prompt, it does so without using any of the 75 tokens the prompt consists of.
If you want to create beautiful people, you can try this negative prompt:
((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))
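The nested parentheses and brackets in this prompt use the attention syntax from the feature list: each extra ( raises a section's weight and each [ lowers it. As a rough illustration, a factor of 1.1 per bracket level is consistent with ((tuxedo)) being equivalent to (tuxedo:1.21); the helper below is a simplified sketch of that rule, not the webui's actual parser:

```python
def bracket_weight(token, factor=1.1):
    """Approximate emphasis weight: each leading '(' multiplies the
    weight by `factor`, each leading '[' divides by it (simplified
    model of the webui's attention syntax)."""
    up = len(token) - len(token.lstrip("("))     # count of '(' levels
    down = len(token) - len(token.lstrip("["))   # count of '[' levels
    word = token.strip("()[]")
    return word, round(factor ** up / factor ** down, 3)

print(bracket_weight("((tuxedo))"))      # ('tuxedo', 1.21)
print(bracket_weight("[out of frame]"))  # ('out of frame', 0.909)
```

So ((((ugly)))) in the prompt above is weighted roughly 1.1^4 ≈ 1.46 times the default attention, while [out of frame] is de-emphasized.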
You may weight different sections of the prompt to tell the sampler to attach different levels of priority to them, by adding :(number) to the end of the section you wish to up- or downweight. For example consider this prompt:
tabby cat:0.25 white duck:0.75 hybrid
This will tell the sampler to invest 25% of its effort on the tabby cat aspect of the image and 75% on the white duck aspect (surprisingly, this example actually works). The prompt weights can use any combination of integers and floating point numbers, and they do not need to add up to 1.
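How such :number markers split a prompt into weighted sections can be sketched as follows; this is an illustrative tokenizer, not the sampler's actual implementation, and giving unweighted trailing text a default weight of 1.0 is an assumption:

```python
import re

def parse_weighted_prompt(prompt, default=1.0):
    """Split a prompt into (text, weight) sections; a section ends at
    ':<number>', and trailing text without a marker gets `default`."""
    sections = []
    pos = 0
    for m in re.finditer(r"(.*?):(\d+(?:\.\d+)?)\s*", prompt):
        sections.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    rest = prompt[pos:].strip()
    if rest:  # section with no explicit weight
        sections.append((rest, default))
    return sections

print(parse_weighted_prompt("tabby cat:0.25 white duck:0.75 hybrid"))
# [('tabby cat', 0.25), ('white duck', 0.75), ('hybrid', 1.0)]
```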
Separate multiple prompts using the | character, and the system will produce an image for every combination of them. For example, the prompt a busy city street in a modern city|illustration|cinematic lighting yields four possible combinations (the first part of the prompt is always kept):
a busy city street in a modern city
a busy city street in a modern city, illustration
a busy city street in a modern city, cinematic lighting
a busy city street in a modern city, illustration, cinematic lighting
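The expansion rule above, keeping the first part and toggling each optional part on or off, can be sketched as follows (an illustrative re-implementation; the webui's actual ordering of the combinations may differ):

```python
from itertools import product

def prompt_matrix(prompt):
    """Expand 'base|opt1|opt2|...' into every combination of the
    optional parts, always keeping the base part."""
    base, *options = [part.strip() for part in prompt.split("|")]
    combos = []
    for mask in product([False, True], repeat=len(options)):
        chosen = [opt for opt, keep in zip(options, mask) if keep]
        combos.append(", ".join([base] + chosen))
    return combos

for p in prompt_matrix("a busy city street in a modern city|illustration|cinematic lighting"):
    print(p)
```

Two optional parts give 2^2 = 4 images; each additional | part doubles the number of combinations.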
Working with image 2 image (from reddit comment)
Take an output of txt2img that I like, pop it into GIMP, and do things like shop out extra arms, legs, fingers, and use clone tool to smooth out anything that looks weird. This is just to get a crude foundation to get img2img going.
Then I run img2img like 10 times with varying low strength values, like 0.2, 0.25, etc. up to 0.5. Strength is the key flag in img2img because it is the “creative liberty” knob for SD. Lots of the outputs look like crap but usually there is one or two that didn’t change the image too much and got it closer to what you’re going for.
So then you pick the best of those and pop it into GIMP again. Now you can overlay the best parts from the original image over the top of the second one and repeat the process. I do that until it’s good quality or I get bored, and then run the result through upscaler to make it bigger (I might do upscale with GFPGAN first).
6. Example prompts for Stable Diffusion
- A well-preserved library among hidden ruins, matte painting, trending on artstation
- A highly realistic, true to life portrait of a young woman, by karol bak, james jean, tom bagshaw, rococo, sharp focus, trending on artstation, cinematic lighting, hyper realism, octane render, 8 k, hyper detailed
- A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of a barren desert full of large dunes, Sun rays, Artstation, Dark sky full of stars with a shiny sun, Massive scale, Fog, Highly detailed, Cinematic, Colorful
- haunted house, trees in the distance, tree leaves on the ground, mischievous and gorgeous, vibrantcolors, awardwinning, intricate, insanely detailed, digitalpainting, conceptart, horrorvibes
- Ultra realistic photo, red hair girl working in a bar, beautiful face, intricate, highly detailed, smooth, sharp focus, art by artgerm and greg rutkowski and alphonse mucha — (try replacing red hair with ginger, fire hair, …. )
- A place in wales, tucked out of view magic happens, only seen by a few. for just one day, for only just one hour. The last summer’s day break at Gelli aur. there you must follow a winding trout stream. search all the oaks with a tiny light beam, inspired by ( greg rutkowski ) and charlie bowater
- Asian Brad Pitt, high quality, trending on Artstation
- A beautiful view of hogwarts school of witchcraft and wizardry and the great lake, concept art, by Thomas Kinkade, architecture, atmospheric, sense of awe and scale, artstation HQ
- Ice goddess with beautiful face with a glowing blue crystal on her forehead, frosty white eyes, winter mist around her, white plated armor, pale skin, white smoke:: photorealism, octane render, frostbite, 8k, cinematic, 35mm
- A levitating and floating beautiful detailed haussmannian palace villa by Neil Blevins and Gilles Beloeil, M C Escher and Lee Madgwick over a lake by Cyril Rolando, colorful, geometric, Vray, illuminated windows, Transcended Beyond Physics, Gravitational Anomaly, Detailed Realistic, Detailed Digital Painting, vibrantcolors, 5-Dimensional, Assassin’s Creed, Color Grading, Ektachrome
- Blonde-haired beautiful Warrior Queen, in fantasy armor, with Iron crown, cross symbolism, with a fit body, dark forest background, hopeful light, photorealistic, painted by artgerm, Akihiko yoshida, sakimichan, krenz cushart, low angle shot, digital painting
- Ultra realistic photo, princess peach in the mushroom kingdom, beautiful face, intricate, highly detailed, smooth, sharp focus, art by artgerm and greg rutkowski and alphonse mucha
- portrait of a woman made of cracked marble. high contrast. macro photography
- A beautiful neon cyberpunk city street at night, apartment, skyscrapers, by alphonse mucha caravaggio monet ,4K resolution, 8K resolution, a lot of Decoration and embellishments, sci-fi, photorealistic, highly detailed, sharp focus, clean 8k, volumetric lighting, octane render, ceramic
- A modern teenage girl’s bedroom, indirect sunlight, high detail, lush decor, realistic, photorealistic, 8k
- Ana de Armas as Red Sonja, portrait, detailed features, intricate, highly detailed, sharp focus
- (painting of girl from behind looking a fleet of imperial ships in the sky, in a meadow of flowers. ) by donato giancola and Eddie Mendoza, elegant, dynamic lighting, beautiful, poster, trending on artstation, poster, anato finnstark, wallpaper, 4 k, award winning, digital art, imperial colors, fascinate view
- A painting of a beautiful woman in the middle of a city street by Daniel F Gerhartz, William Adolphe Bouguereau, John William Waterhouse, and Thomas Kinkade. City skyline. Chiaroscuro. Volumetric Lighting
Other resources for prompts: