# Architecture

Stable Diffusion uses a kind of diffusion model (DM) called a latent diffusion model (LDM).\[1] Diffusion models were introduced in 2015. They are trained to remove successive applications of Gaussian noise from training images, and can be thought of as a sequence of denoising autoencoders.
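The training objective can be made concrete with the closed-form forward (noising) process used by DDPM-style diffusion models. This is an illustrative sketch only. It works on a flat list of numbers instead of an image tensor. It assumes a linear noise schedule with commonly cited DDPM endpoints (`beta_start=1e-4`, `beta_end=0.02`) rather than Stable Diffusion's exact configuration. It also omits the neural denoiser entirely. All function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Illustrative linear noise schedule (assumed values, not SD's exact config)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the share of signal left at step t."""
    result, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        result.append(product)
    return result


def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error a denoiser would minimize when predicting the added noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise)) / len(true_noise)


rng = random.Random(0)
alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.2, 0.1]                                   # stand-in for one tiny "image"
x_early, eps_early = add_noise(x0, 10, alpha_bar, rng)  # still close to x0
x_late, eps_late = add_noise(x0, 999, alpha_bar, rng)   # almost pure noise
```

At small `t` the sample stays close to the original. Near the last step almost nothing of `x0` survives, which is why the reverse process can be viewed as a chain of denoising steps. A trained model would supply its own estimate as `predicted_noise`; passing the true noise, as a perfect denoiser would, gives a loss of zero.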

<figure><img src="https://66508924-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJaRV9HsOrKmAQxWrp4OQ%2Fuploads%2Fwylsdf7LxzwyPxal9TMP%2FStable_Diffusion_architecture.png?alt=media&#x26;token=54a170d9-beb7-43ea-a70e-a8b1e443b726" alt=""><figcaption><p>Diagram of the latent diffusion architecture used by Stable Diffusion</p></figcaption></figure>

Stable Diffusion consists of three parts: a variational autoencoder (VAE), a U-Net, and an optional text encoder.\[11] The VAE encoder compresses the image from pixel space to a lower-dimensional latent space, capturing a more fundamental semantic meaning of the image.\[12] During forward diffusion, Gaussian noise is iteratively applied to this compressed latent representation.\[11] The U-Net block, composed of a ResNet backbone, then runs the process in reverse, denoising the output of forward diffusion to recover a latent representation.
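Shape arithmetic can illustrate the compression performed by the VAE encoder. The sketch assumes the values commonly associated with Stable Diffusion v1 checkpoints: a spatial downsampling factor of 8 and 4 latent channels. Other versions or custom VAEs may differ, so both are exposed as parameters. The function names are hypothetical.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image.

    downsample and latent_channels are assumptions matching common SD v1 settings.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, pixel_channels=3, downsample=8, latent_channels=4):
    """How many pixel-space values map onto each latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (pixel_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions a 512×512 RGB image (786,432 values) becomes a 4×64×64 latent (16,384 values). Forward diffusion and U-Net denoising therefore operate on 48 times fewer values than they would in pixel space.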

Finally, the VAE decoder generates the final image by converting the latent representation back into pixel space.\[11] The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to the denoising U-Net through a cross-attention mechanism.\[11] For text conditioning, the fixed, pretrained CLIP ViT-L/14 text encoder transforms text prompts into an embedding space.\[1] Researchers cite increased computational efficiency for both training and generation as an advantage of LDMs.\[13]\[14]
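Cross-attention can be sketched as single-head scaled dot-product attention. The queries come from the U-Net's latent features, and the keys and values come from the encoded conditioning, such as the CLIP text embeddings. This is a simplified illustration under stated assumptions. The real U-Net uses multi-head attention inside transformer blocks, with learned projection weights and much larger dimensions. For Stable Diffusion v1, the text encoder output is commonly described as 77 tokens of width 768. Here the projection matrices `w_q`, `w_k`, and `w_v` are tiny hypothetical stand-ins, and matrices are plain lists of rows.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention from latent features to conditioning tokens."""
    q = matmul(latent_tokens, w_q)    # (n_latent, d): queries from the U-Net side
    k = matmul(cond_tokens, w_k)      # (n_cond, d): keys from the conditioning
    v = matmul(cond_tokens, w_v)      # (n_cond, d_v): values from the conditioning
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (n_latent, n_cond)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)         # (n_latent, d_v): conditioning mixed per position


# Two latent positions (width 2) attending over three text tokens (width 2).
latent = [[0.3, -0.1], [1.0, 0.5]]
text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(latent, text, identity, identity, identity)
```

Each output row is a weighted average of the value vectors, so every latent position receives its own prompt-dependent mix of conditioning information. This is the route by which the encoded text can steer denoising.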


