A month ago, Nathan Baschez’s piece on building Lex, a new word processor with AI, triggered a brain-exploding, face-melting eureka moment 🤯.
Scouring the web and trading messages with some of my founder and product friends surfaced a number of posts from investors like Elad Gil, NFX, and Sequoia (some of which I shared on LinkedIn).
One of those friends, Robert Bajor, chimed in:
“Sometimes, the ultimate test for determining the magic of a technology is that fleeting first impression, when you or someone you’re showing the technology goes in cold. I remember the first time I played with an iPhone and I knew it was a big deal; no one had to tell me. I’ve created a series of use cases in OpenAI and every 🤖 Single 🤖 Time. Minds are blown 🤯”
I think he captured it perfectly — the magic feeling you get from playing with this technology aligns with what I’ve heard many others say.
GPT-2 and DALL-E? Those felt like they had promise, but at the time seemed like a cool party trick.
GPT-3, DALL-E 2, et al.? Now it’s like, whoa, they can do WHAT?!
To really understand what was possible, I began playing with the tools and undertaking building/learning sprints with some of those friends.
Some Resources To Help You Get Started
FIRST: The resource I found most useful for onboarding rapidly into this space was this series of posts from fellow Substack writer Jon Stokes (former Ars Technica cofounder):
- Part 1 Machine Learning Basics
- Part 2 AI Content Generation: Tasks
- Part 3 On Getting Started With Stable Diffusion (for image generation)
- Part 4 AI Content Generation: What’s Next → this is fun to explore alongside the VC posts for different people’s frameworks for how this ecosystem will evolve and the idea spaces to play in as founders and product builders
If you want to just get into building, go to Part 3 where he talks about getting started with Stable Diffusion (both online via playground.ai and locally on a Mac using Diffusion Bee).
Reading Part 1 triggered a big mental-model unlock: given how these ML models work, it shifted my intuition from thinking of generation as creation to thinking of it as searching within a latent space.
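To make that “search, not creation” intuition concrete, here’s a toy Python sketch (entirely hypothetical; this is not how a real diffusion model decodes, just an analogy): a fixed deterministic “decoder” maps latent points to outputs, so generating is really picking a point in that space, and nearby points decode to similar outputs.

```python
import math

# Toy stand-in for a model's decoder: a fixed, deterministic map
# from a 2-D "latent" point to an output (here, one brightness value).
def decode(z):
    x, y = z
    return math.sin(x) * math.cos(y)

# Generation as search: every latent point already has an output;
# "creating" an image is just finding a point you like.
a = decode((1.00, 2.00))
b = decode((1.01, 2.00))  # a nearby latent point

print(abs(a - b))  # tiny: nearby latents decode to similar outputs
```

The real models are vastly higher-dimensional, but the framing carries over: the prompt and seed steer where in latent space you look, not what exists there.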
SECOND: Another resource that helped me significantly was this tweet thread from Mr. Pizza Later:
And then especially, the descriptive phrases from later in this Tweet thread:
Some Initial Insights
Nothing will help you more than playing with the technology. Here are some examples and insights I learned from using Playground AI (the UI/platform/online tool) and Stable Diffusion (an underlying image-generation model created and open-sourced by Stability AI):
1. Getting Started By Fooling Around With Silly Prompts
My partner and I started off playing with an initial loose prompt of:
“pig riding skateboard in an amusement park”
The biggest insights:
- We were not too strict with our prompt or seed yet, so we varied the text of the core subject somewhat (e.g., amusement vs. theme park; sometimes we added “in the sky”), as well as some of the descriptors.
- In other words: we had no idea what we were doing.
- Funny prompts that mash up atypical things don’t always work as well (at least at first).
- While a local setup is free monetarily, it takes a while for images to render.
2. Some Prompts (Seem) to Work Well Naturally
Then, I decided to go with a more literal prompt of:
“a lush landscape with a lake and mountains with the northern lights and the universe shining overhead”
The biggest insights:
- Again, we were not super strict with the prompt or the seed, and maybe we just got lucky, but…
- Scenery-based prompts, with sufficient detail, seem to work really well.
- These were also generated via the local setup. But at this point, the slow render times were holding back my progress in honing my intuition for how to get the most out of the tool, so I switched to Playground AI.
3. Once You Find A Prompt/Image Result You Like, The Seed Gives You Control
I wanted to see for myself how the seed works, so I started with random seeds and a specific mashup prompt, one I could visualize and had a sense of how it might look:
“cell cutaway as a futuristic city and organelle shaped buildings”
The biggest insights:
- I was much more careful about keeping the core prompt the same.
- After finding a base image I liked, I checked the seed, and then fixed the seed and kept it constant.
- I started altering the descriptors after the subject of the prompt above, whether that was blueprint, marble sculpture, or even Star Wars (yes, I thought that would be fun; can you guess which of the variations was that one?).
- In this case, the mashup of concepts and its description seemed to land well within the latent space: most of the images I generated, even before finding one I felt partial to, were quite strong (though I’m not sure whether the subject is simply better covered by the data and model, or the prompt itself was better).
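The role the seed plays above can be sketched in plain Python (an analogy using the standard library, not actual Stable Diffusion code): the seed just initializes a random number generator, so the same seed always reproduces the same starting noise, which is why fixing it pins down the base image while you vary descriptors.

```python
import random

# Analogy only: in Stable Diffusion, the seed initializes the random
# starting noise; same seed -> same noise -> same base image.
def starting_noise(seed, n=4):
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(n)]

print(starting_noise(42) == starting_noise(42))  # True: fully reproducible
print(starting_noise(42) == starting_noise(43))  # False: new seed, new noise
```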
4. Patience Is Important Because, As Far As I Can Tell, Seeds Work Randomly
I wanted to test mashing up more literal objects again, so I went with something I’m partial to, and I’d seen some really fun similar creations out in the wild, wild web:
“nike sneaker made of fire”
Side note: In Stable Diffusion, the seed is a random number from 0 to 4,294,967,295, or in other words 2^32 − 1.
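A quick sanity check on that range (the seed is just a 32-bit unsigned integer):

```python
# The seed fits in a 32-bit unsigned integer: 0 through 2**32 - 1.
max_seed = 2**32 - 1
print(max_seed)  # 4294967295
```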
The biggest insights:
- It can require patience. So far, the seed feels random and there’s no method to the madness (I tried adjusting seeds following power laws, in larger increments from 0 to 2^32 − 1, as well as one by one).
- Again, once I found a few images from a specific core prompt and seed combo, I kept those constant while changing the descriptors to generate variations.
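The workflow in those bullets can be sketched as a simple loop (the prompt strings are from this post, but the seed value and descriptor list are hypothetical, and the actual submission to the model is left out):

```python
# Keep the core prompt and seed fixed; vary only the descriptors.
base_prompt = "nike sneaker made of fire"
seed = 1234  # hypothetical; any fixed value in 0..2**32 - 1

descriptors = [
    "studio photography",
    "watercolor illustration",
    "baroque oil painting",
    "3d render",
]

# One full prompt per descriptor; each would be generated with the
# same fixed seed, so only the style changes between images.
variations = [f"{base_prompt}, {d}" for d in descriptors]
for v in variations:
    print(v)
```

This is exactly the controlled experiment that makes 4a vs. 4b below visible: with prompt and seed pinned, any difference between images comes from the descriptors alone.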
4a. Some base prompt + seed combinations produce low variance across varying descriptions.
(And yes, I did do 60 different variations with the same base prompt and seed)
This particular result from the core prompt and seed seemed to carry a very strong bias in the latent-space search results, and didn’t change very much despite significant alterations to the descriptive phrases and styles across photography, illustration, painting/artist, and 3D styles.
4b. Some base prompt + seed combinations produce higher variance across varying descriptions.
This particular result from the core prompt and seed showed much wider variation, in this case (I’d say) more accurately reflecting the various painting/artist and other style descriptors that I tried.
Again, it all depends on your goal, and I’d highly encourage you to go play around with the tools yourself.
But I’d wager that no matter what you find, you’ll feel some of that same magic.
I’m continuing to build and learn, with an eye toward applications in the future of education and work, as well as tech more broadly.
If interested, feel free to reach out!
🧰🔨 Some additional tools and resources:
- Playground.ai: lets you play with DALL-E 2 and Stable Diffusion
- Dream Studio is Stability AI’s own tool for playing with the model it created, Stable Diffusion
- Disco Diffusion
- DALL-E: what’s open and available to the public to play with from OpenAI, for now (you can also play with DALL-E 2 via Playground.ai and other tools, for instance)
- Just discovered this post comparing Stable Diffusion with Disco Diffusion; the takeaways at the bottom are very helpful.
For prompt inspirations,