Tame Shapes™ - Story

Background

Over the past few years, I've spent time learning about AI and experimenting with different models for images, text, and sound. One of the areas I've been most interested in is what it could do for creative fields. It's not just about the ability to produce human-like images, text, or audio, but the possibility of a new kind of computer that enables entirely new ways of working creatively.

Early in my design studies, I came across GPT-2, a language model that produced text that sounded right but didn't hold much meaning if you looked closer. Because I followed those developments closely and signed up for the waitlist, I got early access to GPT-3, a predecessor of ChatGPT, which I used as the centerpiece of my graduation project. Compared to GPT-2, you could suddenly use it as a chatbot or have it write whole pages of text that kind of made sense. It was even able to do basic coding.

Coming from engineering and physical product design, I wanted to dedicate my graduation project to developing a physical form, an embodiment for GPT-3 that you could talk to. The goal of the conversation was that you might get inspired, either by reflecting on your own thoughts or because the AI asks questions that make you think. I found that human-machine relationship quite interesting: the AI has certain skills that you might not have (like really broad general knowledge), while you have deep knowledge in your field and a connection to the real world. That kind of symbiotic relationship really stuck with me, even though back then the conversations with GPT-3 were kind of random and often drifted away. Nevertheless, that's what guided me in future projects: finding use cases where the AI could widen your imagination to benefit your work instead of doing the work for you.

Idea

I was working on some AI ideas when GPT-4 came out in March 2023. One of the major changes from GPT-3 was that it was trained not just on language but also on images. In a paper from Microsoft Research ("Sparks of Artificial General Intelligence"), there was a section testing the visual understanding and intelligence of the AI: they gave it certain tasks to solve visually. Because GPT-4 can't generate images directly the way Midjourney or DALL-E do, they used a workaround, letting it express itself through code, which then gets translated into visuals.
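The workaround can be sketched in a few lines: ask a chat model for SVG markup and pull the first `<svg>…</svg>` block out of its reply, since models often wrap code in prose or markdown fences. The prompt wording and the sample reply below are my own illustration, not the paper's exact setup.

```python
import re

def extract_svg(reply):
    """Pull the first <svg>...</svg> block out of a model reply.

    Replies often mix markup with prose or code fences, so we search
    for the SVG element itself rather than trusting the whole reply.
    """
    match = re.search(r"<svg\b.*?</svg>", reply, re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None

# A made-up reply, standing in for what a chat model might return:
reply = (
    "Sure! Here is a simple sun icon:\n"
    "```svg\n"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">\n'
    '  <circle cx="50" cy="50" r="30" fill="orange"/>\n'
    "</svg>\n"
    "```"
)
svg = extract_svg(reply)
```

The extracted string can then be rendered directly by a browser or embedded in a page, which is how code becomes a visual.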

The designs from the paper were quite basic, and their purpose was mostly to match the prompt as closely as possible. However, I thought it might be more interesting to explore new shapes or compositions the AI could come up with. The results might be more abstract, but that's fine for certain uses. And if the goal is to get inspired by the AI, it's sometimes beneficial if the forms are a bit unconventional. Ultimately, it's the designer's job to take those ideas, edit them, and combine them to fit the right context.

Until that point, I'd spent a lot more time on the language side of AI, always looking for use cases or outcomes you could get from a chatbot. Instead of just chatting with an AI for the sake of conversation, I thought it might be interesting if the chat resulted in some tangible output. For example, if you're a designer tasked with coming up with several directions for a brand's logo, instead of keeping all the thoughts to yourself, you could talk to an AI and develop different directions in a conversation. Not necessarily because the AI has the most amazing ideas, but because it helps to structure your thoughts. And because you can write down your raw ideas without judgment, it might also spark new ideas simply through that process.

So, in the case of a logo, you might come up with different concepts, each consisting of descriptions of colors, shapes, iconography, etc. What could you do with all these descriptions? You could hand them to another designer, follow the guide yourself, or ask another AI to take the first step and draw out designs that fit those descriptions.

Development

I did some tests with my partner, who's a fashion designer and often works on new brands. The results were sometimes amusing, sometimes intriguing, but not very consistent, so I put the project aside for a while.

Then, a few months later, I started the Instagram account for Tame, so I needed to collect some content. I remembered that I had some AI-generated vector graphics and decided to make a post with them. I gathered the best ones I had created up to that point. Looking at them, I was inspired to create more, so I generated about 300, which I then filtered to select the most interesting ones. I could have just shared those designs, but I thought it would be even more engaging to bring the prototype to a level where others could try it too. This way, it becomes interactive, and I can learn what people might be interested in doing with it.

In a few days, I designed and built the user interface. While doing this, I constantly tested the app and made screen recordings of the design process to collect content for a video. I had around 200 designs that were interesting in their own way, and I used them for a little animation in 3D space, where a camera moves through a cloud of those images, showcasing the variety of images that were possible.

There was only one problem: The AI (essentially ChatGPT) didn't always respond in the right way. Sometimes it would respond with SVG code, and sometimes it wouldn't. At times, the SVG was quite useless or complete nonsense. So I had the idea to curate a small dataset and train the AI on it. This process, called fine-tuning, nudges the model in a specific direction. Here, it was about ensuring that it always responds with SVG code, and also about teaching it that some designs were better than others. (That's essentially how ChatGPT got so good - humans repeatedly rated which responses were better, and those ratings were used to fine-tune the model.)
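For the curation step, OpenAI-style fine-tuning expects a JSONL file where each line is one chat example: a prompt paired with the answer you want the model to imitate. A minimal sketch of building such a file (the system prompt, user prompt, and SVG below are placeholders I made up, and the filename is my own choice):

```python
import json

# Each training example pairs a user prompt with a curated SVG answer.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You design vector graphics. Always answer with SVG code only."},
            {"role": "user", "content": "A minimal logo of a rising sun"},
            {"role": "assistant", "content": '<svg viewBox="0 0 100 100"><circle cx="50" cy="60" r="25" fill="#f60"/></svg>'},
        ]
    },
]

# Write one JSON object per line - the JSONL format fine-tuning endpoints expect.
with open("shapes_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A real dataset would need many such examples, each one a design you've already judged to be good, which is exactly what the manual filtering produced.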

So I fine-tuned the model on the good SVG designs, and now the results were much better than before. It had fewer errors because it always responded in the right format. To be clear, the output was still not amazing in maybe 9 out of 10 cases. But that one case could be an interesting one, so I thought: why not? It doesn't have to be a real product yet, it's more of an experiment.

Creating vector graphics with a language model instead of an image model has some advantages. First, if you use an image model, you'd get a pixel image and have to vectorize it afterwards. That vectorized image can look pretty good but can also be hard to edit, as the underlying code is just a really long string of vector paths. On the other hand, if you were to design the same image in, say, Illustrator, you would use basic geometric shapes like circles, rectangles, etc., and style them as you wish. Sometimes you need more complex paths, but often, simple shapes are enough. And that allows you to easily change and develop your design further. If you get those complicated vectorized images, editing your design becomes much more tedious.
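To make the contrast concrete, here is the same filled circle twice: once as a traced path of the kind an auto-vectorizer might emit (hand-written and heavily shortened here), and once as a basic shape. Both strings are my own illustration:

```python
# A circle approximated with cubic Bezier curves, as a tracer might emit it.
traced = (
    '<svg viewBox="0 0 100 100">'
    '<path d="M50,20 C66.6,20 80,33.4 80,50 C80,66.6 66.6,80 50,80 '
    'C33.4,80 20,66.6 20,50 C20,33.4 33.4,20 50,20 Z" fill="teal"/>'
    "</svg>"
)

# The same circle as a primitive: one element, three numbers.
simple = '<svg viewBox="0 0 100 100"><circle cx="50" cy="50" r="30" fill="teal"/></svg>'

# Growing the simple circle is a one-attribute change; on the traced
# version you would have to recompute every curve control point.
bigger = simple.replace('r="30"', 'r="45"')
```

A real traced file would typically have far more control points than this, which is exactly why editing it by hand (or by AI) becomes tedious.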

When an AI edits a design by rewriting the underlying SVG code, it helps a lot if that code is simple. Otherwise, the AI also struggles to understand the meaning of those long paths - but it certainly understands a circle and its radius or border width. This allows the Shapes™ app not just to create vector graphics but also to change them in a conversational way. After you get a first draft, you can refine it just by telling the AI what to do. This is a completely new workflow for AI-generated images: you can develop something further instead of always starting from scratch like with Midjourney or DALL-E (they usually create a completely new image rather than editing a previous one - in/outpainting and image variations exist, but that's still quite different).
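The conversational workflow boils down to keeping the current SVG in the chat history, so each instruction edits the previous state instead of regenerating from scratch. A sketch of that loop, with the model call stubbed out (a real app would send the history to a chat API here; the stub just hard-codes one color change so the structure is runnable):

```python
def fake_model(history):
    """Stand-in for a chat model. It reads the latest SVG draft from
    the history and applies a hard-coded 'make it red' edit."""
    last_svg = next(m["content"] for m in reversed(history) if m["role"] == "assistant")
    return last_svg.replace('fill="teal"', 'fill="red"')

history = [
    {"role": "system", "content": "You design and edit SVG graphics."},
    {"role": "user", "content": "Draw a simple dot logo."},
    # First draft, as if returned by the model:
    {"role": "assistant", "content": '<svg viewBox="0 0 100 100"><circle cx="50" cy="50" r="30" fill="teal"/></svg>'},
]

# The user refines the draft in plain language; because the previous SVG
# stays in the history, the model edits it instead of starting over.
history.append({"role": "user", "content": "Make the dot red."})
history.append({"role": "assistant", "content": fake_model(history)})

revised = history[-1]["content"]
```

The key design point is that the state lives in the SVG code itself, so "edit" is just another turn of the same conversation.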

Launch

So after the whole development process, which happened over the course of maybe a week, I collected all that content and cut a little video showing the basic flow of what someone might do with the app. One thing I discovered along the way: you can also animate the SVGs so they move like a GIF. That turned out to be quite a cool feature, because you can just tell the AI to integrate some motion and it writes the code for it - something that's quite hard for non-professionals. And even for people who kind of know the tech, the technology can get in the way: you know exactly what movement you want to create, but you spend more time fixing errors in the program. In Shapes™, you just describe the overall movement, and ideally the AI translates it into code that then becomes visible in the design. I love that idea, because here AI is used to bring creators closer to their output instead of making them deal with all that technology in between. Right now, the animations are still quite simple and limited, but I can see huge potential if they get better.
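SVG supports declarative animation directly in the markup (SMIL), which is what makes "just add motion to the code" possible without any JavaScript. A minimal hand-written example - a dot orbiting the canvas center - written out from Python so the file can be opened in a browser:

```python
# A small dot orbiting the canvas center. The <animateTransform>
# element (SVG's built-in SMIL animation) rotates its parent group
# around the point (50, 50) forever.
animated = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    "<g>"
    '<circle cx="50" cy="20" r="8" fill="orange"/>'
    '<animateTransform attributeName="transform" type="rotate" '
    'from="0 50 50" to="360 50 50" dur="4s" repeatCount="indefinite"/>'
    "</g>"
    "</svg>"
)

with open("orbit.svg", "w") as f:
    f.write(animated)  # open this file in a browser to see the motion
```

Because the animation is just a few more attributes in the code, it's exactly the kind of thing a language model can add when you describe the movement you want.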

Future

After the release and some feedback, I thought it would be cool to develop the product further. The idea clearly resonated with people interested in graphic/motion design, and I got quite a few sign-ups in the app, some from one of the best design studios in Europe. But I could also see that after the first sign-ups, people played around with it and then discovered that the quality wasn't good enough to produce usable results in their work. That matched my own feeling: I had played around with it a lot, and good results were rare. Still, I knew there was definitely potential for such a tool. Maybe the technology isn't ready yet, but it might be soon, so it's still interesting to stick with the project for a bit - not full-time, but as a side project to see how far I can push the quality.

But I also knew it would take quite some time and resources to make progress, as the step from prototype to product is quite big. My conclusion was that I wouldn't do it on my own, because 1. it's a lot of work and I have other projects as well, and 2. I don't have the deep knowledge needed in machine learning, building datasets, and training/evaluating AI models. I understand the concepts, but applying them takes more experience than I have. So I wrote to someone I had met once online through Y Combinator's co-founder matching, asking if he would be interested in developing the idea further as a side project. He did his MA in mathematics and is now doing his PhD, but wanted to do some applied work on the side. While he focuses on the core technology and machine learning, I focus on the user-facing aspects and translating that technology into a product. He said yes, so we set up a collaboration agreement in case something grows out of the experiment, and now we work on it together.

There are many directions we could take - for example, focusing purely on AI-generated animations. But to get those animations to work really well, we'd first need to master the basics, the atomic level of AI-generated vector graphics, and then put the animation layer on top as an editing step. That's why we decided to focus on the foundation for now and train custom models to generate symbols and icons. As soon as those produce interesting designs, we can go one abstraction layer higher and focus on the editing part: compositions, coloring, typography, animation, etc.

And then use cases like generating logo variations, a whole symbol system, or custom animations might become possible at some point. Maybe we'll find a way that works really well, maybe not. It's still in a kind of research stage. But what's important, in my opinion, is that this research happens not just behind closed doors, but that we can test it in the real world by updating the app that already exists. And maybe at some point, the product we build will be a combination of many different models that are all experts in their own domain and get wired up to become something like a creative sidekick. Not to do your work, but to give you some directions and ideas that might inspire a new one.

Louie Gavin, January 10th 2024