Tue Apr 12 2022
Using Aleph Alpha’s Multimodal functionality “MAGMA” to create titles for abstract crypto art
Aleph Alpha’s multimodal model is a great tool for creating titles for abstract crypto art - by Werner Bogula, Artificial Intelligence Center Hamburg e.V. (ARIC)
TL;DR: Aleph Alpha’s multimodal model is a great tool for creating titles for abstract crypto art.
Since 2012, deep learning has significantly boosted the ability to detect objects in images. Today, object detection in live and still images is pretty good; however, it still relies heavily on manually labeled photos of objects. This can lead to symbolic misrepresentation, as a picture is more than the accumulation of its single elements. Thus, current technologies are very reliable at labeling photos of objects, but they lack the intuition needed to label abstract art.
Managing ~30k pieces of art with AI
Peezmo AI21 is a project by crypto artist Dario Gi. As an AI Enabler, I advise him on behalf of the Artificial Intelligence Center Hamburg (ARIC) throughout the entire AI value creation process, from creation to marketing. Peezmo AI21 uses a set of AI algorithms to create a collection of 4000 abstract artworks, which are placed for sale on opensea.io as NFTs. Curating up to a thousand artworks a day is one of the tasks we had to accomplish in the value creation process. We successfully used machine learning algorithms to cluster similar artworks and rule out repetitious and merely imitative works. A simple CNN model like VGG16 combined with KNN does a very good job of narrowing down the 30k cosmos to a manageable small asteroid belt of 4000 pieces. It can, for example, separate abstract works such as Jackson Pollock style drip paintings from pop art style works; see the example below.
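To give an idea of how such a similarity-based curation step can work, here is a minimal sketch. It is not our production pipeline: it assumes that each artwork has already been turned into a feature vector (for instance by a CNN such as VGG16) and replaces a full KNN clustering with a simple greedy nearest-neighbour check. The function names, the toy vectors and the threshold of 0.95 are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_near_duplicates(features, threshold=0.95):
    """Keep an artwork only if it is not too similar to one already kept.

    features:  dict mapping an artwork id to its feature vector
               (assumed to come from a CNN such as VGG16).
    threshold: hypothetical similarity cut-off; needs tuning on real data.
    Returns the ids of the kept artworks in input order.
    """
    kept = []
    for art_id, vector in features.items():
        if all(cosine_similarity(vector, features[k]) < threshold for k in kept):
            kept.append(art_id)
    return kept


# Toy example: "b" points in almost the same direction as "a" and is dropped.
toy_features = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.99, 0.05, 0.0],
    "c": [0.0, 1.0, 0.0],
}
print(filter_near_duplicates(toy_features))
```

The greedy pass compares every candidate with all kept pieces, which is fine for a sketch but would call for an indexed nearest-neighbour search on a 30k collection.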
For selling crypto art on OpenSea.io you have to fill in several slots on the selling page. Besides the visual (1), you need a price (2), a description of the artwork (3), the so-called properties (4), which distinguish the rarity of the individual pieces, and of course a good title (5).
Image 1: Sales page on OpenSea.io
We decided to set fixed prices and generic descriptions to handle the workload. Then, we automatically created the properties from the predominant art styles that the individual pieces exhibit. Here we could tap into the results of a 5-year-old Kaggle competition and the great contribution of Jaime Durán.
The biggest problem left: finding good titles for the artworks. After labeling 400 pieces by head and hand, we realized that we needed some ML technology to fulfill this task.
We applied several strategies, from aleatoric methods similar to the „movie title generators“ you find for free on the interwebs to more sophisticated methods such as machine learning based on wikiart.org or publicly available museum catalogues. First, we had to define what we’d regard as good titles. We agreed on four criteria.
Good art titles:
– more than randomly related to the visual
– creative (not merely descriptive)
– related to art history and/or an art genre (allusive or descriptive)
– embedded in the context of a certain (sub-) culture (here crypto art and NFT)
The Aleph Alpha Interfaces
Using a film title generator and shuffling in some words from existing artwork titles resulted in rather creative titles, but with only loose to random relations to the visuals. This can work, as modern art often lives from a tension between titles and visuals. Throwing in keywords like “crypto”, “bitcoin” etc. establishes a relation between the works as part of a series and relates them to crypto culture:
Buddha Burning some Crypto, Crypto Matrix reloaded
To relate the vocabulary more to the art world and genres we used the catalog of the Art Institute of Chicago and created titles like:
Still Life with Flowers Deep Dive, Portrait of a Lambo
The generated titles were somewhat creative. They even followed some implicit rules and vocabulary of a museum catalog with genre descriptions. But we soon recognized that the value of a title diminishes if it has no relation to the image.
We definitely had to look for something multimodal with enough contextual understanding to relate something meaningful to the artworks, yet not so realistic (read: dull) that it just describes what’s in the image. The challenge in finding creative titles is not describing what you see in the picture, which is difficult anyway with abstract art, but coming up with something imaginative.
Large language models, with their associative nature and the wide gamut of context they provide, seem to be the perfect fit for that. As I had experimented with OpenAI’s CLIP to generate some artworks before, I asked myself: why not use multimodal large language models the other way around, to generate the titles?
Image 2: Images generated with CLIP/VGAN from text ( https://opensea.io/YOHANBO )
Aleph Alpha gave us access to their LUMINOUS language model with its multimodal MAGMA functionality. Before we got started, they showed us how to use the interfaces to the models and how to use creative prompts to get imaginative results.
The Alchemy of Prompting
The good thing for creatives is that Aleph Alpha provides two separate interfaces to wield the powers of their models.
First, we have the playground, which is an interactive web interface. The playground is a good starting point for checking out potential strategies and the results one can expect. Especially for creatives without a programming background, the playground offers an intuitive interface, where you can insert your inputs and try tweaking the results by controlling the inference parameters of the engine.
Once you have „played“ with your material in the playground, you can move on to the API, which provides the same functionality as a REST interface that can be called from your scripts or apps.
First, let’s have a closer look at the playground, as this is a good entry point to learn about the parameters one can use to influence the results.
Image 3: Layout of the Aleph Alpha Playground
The large area on the right is the prompt area. You can select and combine two categories of prompts here: text-prompts and image-prompts. For our purpose, we uploaded the artwork and inserted a „description“ of the task we wanted to accomplish into a text prompt.
We can use the combined prompt for several creative inputs that will influence the generation of the answer or „Completion“ as it is called in the interface.
Let’s try out some prompts to get an idea of what is meant here and how prompts work.
Models such as the one used here by Aleph Alpha are generative transformer models that are trained to predict the next item or token for a given sequence. You don’t have to train models like this with thousands of samples. Instead, you make use of the implicit context that is sedimented in the weights of the models from pre-training.
Given our task and the multimodal capabilities of the model, we provide an image and a text prompt to get a completion which we want to use for our title. And this is where the „alchemy of prompting“ begins.
The first idea that naturally comes to mind is to prompt something like „What is the title of this art work?“ or „This abstract piece of art is titled:“ when presenting the image.
Image 4: Multimodal prompt: a combination of image and text prompt
As we can read in the completion fields in Images 3 and 4, the model somewhat „overdelivers“: it doesn’t just create a single title, but also adds a description. This leads us to the left column of the playground web interface. In the Maximum Tokens field, you can limit the number of tokens that are generated. With Stop Sequences you can make the output stop whenever certain characters or sequences occur in the completion stream.
Image 5: Advanced Playground Settings
The settings in Image 5 limit the number of tokens to 64 and stop the output at the first line break. As we want to generate titles for artworks, we might set the Maximum Tokens parameter to a concise amount, let’s say 12, and set all kinds of punctuation marks as stop sequences.
So, we get titles like „The last word on Super Mario Bros“ or „The Green Godess”, which are syntactically perfect art titles.
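The combined effect of a token limit and stop sequences can be mimicked in a few lines of Python. This is only a toy illustration, not how the model works internally: the "tokens" here are whole words, whereas a real model uses sub-word tokens, and the sketch assumes the stop sequence itself is dropped from the result. The function name and the sample token stream are made up.

```python
def apply_limits(tokens, maximum_tokens=12,
                 stop_sequences=(".", ",", "!", "?", ":", ";", "\n")):
    """Mimic Maximum Tokens and Stop Sequences on a stream of tokens.

    tokens: list of text fragments in the order they would be generated.
    Returns the text produced before a limit or a stop sequence is hit.
    """
    output = ""
    for token in tokens[:maximum_tokens]:
        candidate = output + token
        # Position of the earliest stop sequence in the text so far, if any.
        cut = min(
            (candidate.find(s) for s in stop_sequences if s in candidate),
            default=-1,
        )
        if cut != -1:
            return candidate[:cut].strip()
        output = candidate
    return output.strip()


# Toy stream: a title followed by the unwanted extra description.
stream = ["The", " Green", " Godess", ".", " This", " painting", " shows"]
print(apply_limits(stream))                    # stops at the full stop
print(apply_limits(stream, maximum_tokens=2))  # stops at the token limit
```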
Now, let’s turn to semantics: Every single word in the text prompt gives a slightly different context, which will influence the result. As the model is vast, it is unforeseeable what result you will get. But this property is very welcome here, as it provides creativity against the backdrop of an associative context.
Speaking of creativity, not only the prompt influences the nature of the completion. In the left column of the playground, you’ll find some sliders to set some „wiggle room“ for the generation of the results.
Let’s look at the Temperature slider. The Aleph Alpha documentation says:
„Tweaking the temperature allows you to encourage the model to be more or less “creative”. When model temperature is 0, the most likely token will be chosen every time. With temperature ≠ 0, the model samples from the probability distribution of tokens. At low temperatures, this distribution is heavily skewed towards the most likely result(s). However, as temperature increases towards 1, less likely results become more probable. In practice, this implies that a higher temperature makes unlikely words more likely to be generated, whereas a lower temperature means the model will tend to choose its most confident guess every time. Low temperatures are beneficial when you need the model to be robust, and higher temperatures may be better for things like poetry, or creative writing.”
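The behaviour described in the quote can be illustrated with the standard softmax-with-temperature formulation. We assume here that the model follows this common scheme; the documentation quoted above does not spell out the exact formula. The function names, the toy vocabulary and the scores are invented for the example.

```python
import math
import random


def temperature_probabilities(logits, temperature):
    """Turn raw token scores into sampling probabilities."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-adjusted distribution."""
    weights = temperature_probabilities(logits, temperature)
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token candidates and scores.
candidates = ["Green", "Crypto", "Lenin"]
scores = [2.0, 1.0, 0.0]

for t in (0, 0.2, 1.0):
    probs = temperature_probabilities(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0 the first candidate is always chosen; as the temperature rises towards 1, the less likely candidates receive a growing share of the probability.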
In our multimodal setting, the temperature influences whether we get a completion that approximates the visual more closely (see „The Green Godess“) or freer associations („The last word on Super Mario Bros“). To illustrate this, let’s compare the same prompts with different temperature parameter settings.
What’s that <PERSON> tag about? Aleph Alpha takes data protection and privacy very seriously. Therefore, texts generated with their models are meant to avoid unfair, risky or unethical assertions about persons. One way to mitigate these risks is to replace names, titles or descriptions of real people with the <PERSON> tag. This is a simple and effective way to avoid privacy issues without losing too much applicability for most tasks. For our task, this means we have to come up with a strategy to deal with the <PERSON> tag. We decided to simply exclude completions that contain it.
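Excluding such completions is a one-liner in post-processing. A minimal sketch, with a made-up function name and made-up sample completions:

```python
def drop_person_completions(completions, tag="<PERSON>"):
    """Return only the completions that do not contain the placeholder tag."""
    return [text for text in completions if tag not in text]


# Illustrative completions, not actual model output.
samples = ["Portrait of <PERSON>", "Crypto Sunrise", "<PERSON> in the Matrix"]
print(drop_person_completions(samples))
```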
We found that the combination of prompt wording and temperature settings enabled a wide gamut of possible results. We also noticed that Aleph Alpha’s LUMINOUS model with its multimodal capabilities is not a knowledge retrieval or a simple image recognition system but, for lack of a better concept, „an associative collective context memory“ which can get „very“ creative, almost going into random territory, or territory that is so highly sophisticated that no man has been there before. Or can you make sense of the Delhi or Lenin reference? 😉
For our purpose, we have to find combinations of text prompts and temperature settings that provide the right degree of creativity. By now, we already suspect: This is more alchemy than exact science. But it is definitely not random! As the wording and the structure of a prompt nudge the model in a certain direction, we should get creative in trying out different prompts to get an idea about the underlying model.
Think of the model as a large associative parallel memory of concepts which are interlinked in thousands of dimensions, a bit like the concepts in our mind. (This is not at all a claim that a language model has more than certain topological similarities to the human mind!) If I prime my private associative memory (a.k.a. my brain) with a „prompt“ like „holiday, palms, seashore“ and ask for a completion of „it looks like a sandy“, it is very likely that „beach“ will come to my mind. If I „prompt“ my associative memory with „it has not been raining for months in northern Australia“, then „everywhere it looks like a sandy“ will likely be completed with „desert“. In this manner, a complex prompt can activate the network of associations in a language model, which in turn influences the completion.
So here, the fourth criterion for good titles comes into play: the relation to a subculture. If we activate the context of a certain subculture (e.g., dark horror novels), we move the completion towards using notions, memes and expressions from this culture. (See the allusive use of „monster“ and „Halloween“ below.)
So let’s sum it up.
After experimenting a bit in the playground with Aleph Alpha’s multimodal functionality, we find three out of four criteria for good art titles met:
– related to the visual (via objects, color or morphology of the image)
– creative (not merely descriptive)
– embedded in the context of a certain (sub-) culture (stipulated by the prompt)
We disregard the remaining criterion for good titles:
– related to art history and/or an art genre (allusive or descriptive)
As we generate crypto art it is not necessary to relate to art tradition. But we can imagine that intelligent prompt design could bring in this aspect.
Let’s go into mass production with the Aleph Alpha API
To label a large number of images (like 4000+), we have to set up an algorithmic method, which leads us to the second interface to the model: the Aleph Alpha API.
The API exposes the input prompts and model parameters to modern programming languages like Python.
After importing the interface, we create a client object, which we can feed the inputs into, and authenticate our script against the Aleph Alpha API with an API token.
After specifying images and texts as prompts, we prepare an input object as a list and send it to a model with certain parameters set:
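The original listing is not reproduced here, so the following is only a sketch of what such an input object can look like when assembled by hand for the REST interface. It builds the request body but does not send it. The field names (prompt items with "type" and "data", "maximum_tokens", "temperature", "stop_sequences") mirror the playground parameters and are assumptions that should be checked against the current Aleph Alpha API reference; the model name, the default values and the prompt text are placeholders.

```python
import base64
import json


def build_completion_payload(image_bytes, text, model="luminous-extended",
                             maximum_tokens=12, temperature=0.7,
                             stop_sequences=(".", "\n")):
    """Assemble a multimodal completion request as a plain dictionary.

    The prompt is a list: first the image (base64-encoded), then the text.
    All field names and defaults are assumptions, see the note above.
    """
    return {
        "model": model,  # placeholder model name
        "prompt": [
            {"type": "image",
             "data": base64.b64encode(image_bytes).decode("ascii")},
            {"type": "text", "data": text},
        ],
        "maximum_tokens": maximum_tokens,
        "temperature": temperature,
        "stop_sequences": list(stop_sequences),
    }


# Dummy bytes stand in for the content of a real image file.
payload = build_completion_payload(b"fake-image-bytes",
                                   "hypothetical prompt text")
body = json.dumps(payload)  # JSON body; the API token would go in a header
print(len(body) > 0)
```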
Once we have set up the API, we can start with mass production.
All we need now is:
1. The images in a standard format (here 384×384 px)
2. Good settings for the model parameters: temperature, number of tokens, stop sequences…
3. Creative prompts that „nudge“ the model into the right direction for style and creativity
4. Some post-processing to iron out the quirks of the models
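Put together, the four ingredients above amount to a simple loop over the image files. The sketch below is an illustration under assumptions, not our actual script: `complete` is a hypothetical callable that wraps the API call (with the model parameters already set) and returns the completion text for one image and one prompt. The retry count and the post-processing rules are illustrative as well.

```python
from pathlib import Path


def generate_titles(image_paths, complete, prompt_text, max_attempts=3):
    """Generate one title per image, retrying when a completion is unusable.

    complete: hypothetical callable (image_path, prompt_text) -> str that
              wraps the API call with temperature, token limit etc. set.
    Returns a dict mapping file name to title; images for which no usable
    title was found within max_attempts are left out.
    """
    titles = {}
    for path in image_paths:
        for _ in range(max_attempts):
            completion = complete(path, prompt_text).strip()
            # Post-processing: skip empty results and <PERSON> placeholders.
            if completion and "<PERSON>" not in completion:
                titles[Path(path).name] = completion
                break
    return titles


# Stand-in for the real API call, returning canned answers for the demo.
canned = iter(["<PERSON> in Blue", " Crypto Dawn "])


def fake_complete(image_path, prompt_text):
    return next(canned)


print(generate_titles(["art/001.png"], fake_complete, "hypothetical prompt"))
```

Keeping the API call behind a plain callable makes it easy to test the loop offline and to swap prompts or parameter settings between runs.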
Some ideas for „creative prompts“ as suggested by Niklas Finken of Aleph Alpha whose ideas opened the treasure trove of prompting to us:
To make the prompt fit our needs, namely the relation to Peezmo AI21 and the embedding in crypto culture, we chose a prompt like this:
Aleph Alpha’s LUMINOUS language model with its multimodal functionality has turned out to be the perfect tool for us to name large numbers of abstract crypto art pieces. The model delivers a wide range of contexts that can be activated by creative prompts and managed with parameter settings such as temperature, number of tokens and stop sequences. All this, provided through a concise and fast API, gives us a performant method for fulfilling our task on a mass scale.
If you want to try out their model you can clone our script and experiment with it yourself.
A picture says more than a thousand words, so maybe you can think of ways to find another thousand. 😉
This article was written by Werner Bogula, LinkedIn: https://www.linkedin.com/in/wernerbogula