Takeaways and Learning from 3 days of using MidJourney

Eve Weinberg
Dec 12, 2023
100 stunning high-res images produced in 2 days. Sheesh!

I’m in my third week of a great ‘Intro to AI product design’ course (which I highly recommend). The assignment this week was a fun one, so I thought I’d share it more publicly on Medium.

The assignment: Experiment with any open, free-to-access LLM, foundation model, or generative AI model. Share your experimentation. Experiment with these tools and try to accomplish something unique. What other value could this tool provide? Here are a few fun examples: https://medium.com/seeds-for-the-future/creative-ways-people-used-midjourney-ai-and-you-can-too-b2d920e34ec0; https://medium.com/illumination/14-creative-ways-to-use-chatgpt-you-probably-didnt-know-about-f0f5d2608c6d

Takeaways

I’ve been playing with MidJourney for months now, but this was my first time really trying to make something specific. My conclusions (based on the technology as of Dec 2023) are these:

  1. High-end image generation has become completely democratized. Anyone can generate incredible art via text input using MidJourney!
  2. The learning curve, from 0 to 50 let’s say, is almost nonexistent. Within 2 days, I was able to research enough prompt-engineering terms to generate gorgeous images with some amount of control.
  3. Executing on something specific still requires manipulation in a tool, like Figma, Adobe Illustrator, Blender, Cinema 4D, Maya, Photoshop, etc. If you’re a professional working for a client that wants something specific, you’ll still need these skills.

What I tried to accomplish

I just started working at Modular, which recently announced its programming language Mojo, often represented with a flame emoji logo.

Modular is already having a lot of fun with this flame logo. Check out what they did at ModCon! And, yes, that is my future boss dancing on stage 🤣🕺.

So I wanted to create a playful 3D emoji version of this fireball: dancing, smiling, ideally even working at a computer or presenting to an audience on stage. In the Modular Discord, I found this image, which really captured the image quality I was after. I loved its playfulness, simplicity, and 3D-ness.

Reference image that I used in several prompts

Prompt variables

My process was all about experimentation and just trying out the different ‘features’ of MidJourney. Because MidJourney really does spit out a different image every time, even if you give it the same prompt, it’s hard to be too scientific about the process. Here are some of the variables I experimented with (a sample prompt combining a few of them follows the list):

1. Including a reference image or not
2. Adjectives (Ex: cute, kawaii)
3. Descriptive rendering words (Ex: Pixar, Cinema 4D, clay, 3D)
4. Nouns (Ex: fire vs. fireball)
5. Levels of explicitness (Ex: ‘big smile’ vs. ‘happy’, or ‘has arms and legs’ vs. not)
6. Iterations: how many times to click V1, or to click U1 and then ‘slight variation’
7. Parameters (learn more here): there are so many parameters to try!
It’s impossible to fully control MidJourney or to replicate the same image twice
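
To make that concrete, a prompt combining several of these variables might look something like this (an illustrative sketch, not one of my actual prompts; --ar is MidJourney’s aspect-ratio parameter):

cute kawaii 3d fireball emoji with arms and legs, big smile, dancing, Pixar style, clay, cinema4d --ar 1:1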

My experiments

  1. I started simple. If you don’t add much to your prompt, MidJourney will default to its own style, so here’s what that looked like.

This was actually really spot on in terms of form factor, but not style. Not cute enough!

Next I tried adding a few more words, and then a process of fine-tuning with the lasso ‘vary region’ option:

Next I tried a ref image of the emoji and got something so freaking cool, yet SO SO off-brand and weird and unexpected. It wasn’t useful for my project, but it definitely inspired me!

I mean, how freaking cool is this?!

Next I tried using stronger adjectives (‘cutest’, ‘cuddliest’, ‘kawaii’) and explored ‘V1’ and ‘slight variation’ to see if I could have any amount of control over a fine-tuning process. Ultimately, I felt very out of control; MidJourney was not predictable enough.

Variations

I tried the same prompt multiple times throughout the day. Here are 2 different responses to this same prompt:

Next I decided to use my reference image, and tried this, which ultimately I think was the closest I was able to get to the vision that was in my head:

I tried a parameter called --niji 5, which pushes the render towards anime.
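
Parameters like this get appended to the end of the prompt text, roughly like so (an illustrative example, not my exact prompt):

cute 3d fireball emoji dancing on stage, big smile --niji 5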

Then I tried adding rendering-style suggestions to the prompt. Words like ‘Pixar trend’ and ‘Animation lighting’ helped a bit.

The final prompt I tried combined all of the variables into one: lots of description, my best-tested nouns, lots of rendering suggestions, and a reference image.

I did an alternate version of this, where I used the :: weight syntax, and it helped a lot! You can see in the prompt below that I added a weight of 5 to ‘presenting to a large human audience’ and a weight of 2 to ‘wide shot of lots of audience members.’ I think this helped to prioritize the environment. Here’s the final prompt:

Zoomed out
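
For anyone unfamiliar with multi-prompt weights: the number after :: applies to the phrase immediately before it, so a weighted prompt along these lines would look roughly like this (a sketch of the syntax, not my exact final prompt):

cute 3d fireball emoji presenting to a large human audience::5 wide shot of lots of audience members::2 Pixar style, animation lighting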

What I would change

Similar to the takeaways at the top, I ultimately felt like the tool’s capabilities are incredible, but uncontrollable. While it generates impressively rendered images incredibly fast, as a designer and art director, I can’t rely on any ability to control or manipulate it in a repeatable fashion.

If I could change one thing about the UX of this tool, it would be the ability to have a critique session per image, in a conversational way. Let’s take this last image for example. It’s OK, but not perfect.

A more ideal MidJourney UX

What if I could select certain areas of the image to discuss, give feedback on those areas via text or add reference images for that annotated area, and then go through rounds of fine-tuning based on my art direction? That would be amazing.

There’s something called a style tuner, but it doesn’t accomplish this level of control: https://docs.midjourney.com/docs/style-tuner

Other tools like Stable Diffusion or Runway potentially do this better, but I wasn’t able to get those up and running yet.

What else could MidJourney be used for?

While open-ended image creation is valuable on its own, what real ‘Jobs to be done’ could it solve for? Here are a few ideas.

  1. Storyboarding — Considering how great MidJourney is at image generation, I think it’d be interesting to use it to generate storyboards. One could feed it a script and it could break it into frames.
  2. Animation — Similar to what RunwayML can do, a user could feed it a text prompt or image and it could generate a moving image. Animation and motion graphics explain concepts with far more clarity than a single image.
  3. Diagramming — A user could describe a diagram and MidJourney could create the right diagram, flowchart, or graph. ChatGPT-4 has a few features/plugins that I think do this well, but I haven’t explored them yet.


Eve Weinberg

Present Futurist. Design Manager at Modular. Formerly at Frog design & HackerOne. 🎓: NYU’s ITP & WashU.