❤️ Become The AI Epiphany Patreon ❤️ ►
In this video I cover DALL-E, the “Zero-Shot Text-to-Image Generation” paper by the OpenAI team.
They train a VQ-VAE to learn compressed, discrete image representations, and then train an autoregressive transformer on top of that discrete latent space together with BPE-encoded text.
The model learns to combine distinct concepts in plausible ways, image-to-image capabilities emerge, and more.
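The two-stage setup above can be sketched in a few lines. This is a minimal illustration (not OpenAI's code): Stage 1 gives each image a 32×32 grid of tokens from an 8192-entry codebook, and Stage 2 concatenates up to 256 BPE text tokens with the 1024 image tokens into one sequence for a decoder-only transformer. The model width `D_MODEL` here is illustrative, not the paper's.

```python
import torch
import torch.nn as nn

TEXT_VOCAB, IMG_VOCAB = 16384, 8192   # vocab sizes from the paper
TEXT_LEN, IMG_LEN = 256, 32 * 32      # 256 text tokens + 1024 image tokens
D_MODEL = 512                          # model width (illustrative assumption)

# Separate embedding tables for the two vocabularies, but both map
# into the SAME model dimension, so the streams can be concatenated.
text_emb = nn.Embedding(TEXT_VOCAB, D_MODEL)
img_emb = nn.Embedding(IMG_VOCAB, D_MODEL)

# Dummy token ids standing in for BPE text and dVAE image codes.
text_tokens = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN))
img_tokens = torch.randint(0, IMG_VOCAB, (1, IMG_LEN))

# One joint sequence of 256 + 1024 = 1280 embeddings, modeled
# left to right by the autoregressive transformer in Stage 2.
seq = torch.cat([text_emb(text_tokens), img_emb(img_tokens)], dim=1)
```

Note that the 256 and 8192 in the description are sequence length and vocabulary size, not embedding widths; after the lookup, every token lives in the same `D_MODEL`-dimensional space.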
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper:
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 What is DALL-E?
03:25 VQ-VAE blur problems
05:15 transformers, transformers, transformers!
07:10 Stage 1 and Stage 2 explained
07:30 Stage 1 VQ-VAE recap
10:00 Stage 2 autoregressive transformer
10:45 Some notes on ELBO
13:05 VQ-VAE modifications
17:20 Stage 2 in-depth
23:00 Results
24:25 Engineering, engineering,…
Awesome videos!!!
Thank you! I have one point of confusion: the text token embedding size (256) is smaller than the image token embedding size (8192). How can we combine them to pass through the transformer? I always thought the embedding sizes should be the same.
In the video, you talk about VQ-VAE, but the paper mentions dVAE. Are those similar concepts, or is there a difference between them?
Can we also have a code-explained video for this, please?
Do check out the newly published DALL·E mini!
Live Demo: https://huggingface.co/spaces/flax-community/dalle-mini
Technical Report: https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini–Vmlldzo4NjIxODA
very well explained!
Are they planning to release the code for this? The GitHub repo they have up doesn't allow text input (I'm not sure what the point of having it without any form of input is). I'd like to try this for myself.
Awesome like always!
VQGAN + CLIP is better than DALL-E, and it's strapped together very easily.
Great video, love the content.
Great!! thank you so much!!
Look forward to watching this, great paper to tackle!