Text-to-image model  

From The Art and Popular Culture Encyclopedia

Jump to: navigation, search

Related e

Wikipedia
Wiktionary
Shop


Featured:

A text-to-image model is a machine learning model which takes as input a natural language description and produces an image matching that description. Such models began to be developed in the mid-2010s, as a result of advances in deep neural networks. In 2022, the output of state of the art text-to-image models, such as OpenAI's DALL-E 2, Google Brain's Imagen and StabilityAI's Stable Diffusion began to approach the quality of real photographs and human-drawn art.

Text-to-image models generally combine a language model, which transforms the input text into a latent representation, and a generative image model, which produces an image conditioned on that representation. The most effective models have generally been trained on massive amounts of image and text data scraped from the web.Template:R

See also




Unless indicated otherwise, the text in this article is either based on Wikipedia article "Text-to-image model" or another language Wikipedia page thereof used under the terms of the GNU Free Documentation License; or on research by Jahsonic and friends. See Art and Popular Culture's copyright notice.

Personal tools