AI Generative Art is a form of art, in most cases visual, based on cooperation between a human being and an autonomous system. Here, an “autonomous system” means Artificial Intelligence software, an algorithm, or a model capable of performing complex operations without programmer intervention.
From the bizarre juxtapositions of images created by Dall-E Mini to the NFT market, images generated by AI algorithms are increasingly entering the mainstream imagination. Two important projects on the subject that deserve to be analyzed are Midjourney and DALL-E 2.
Of course, the news has also made its way to Twitter. Among those commenting on it is Charles Hoskinson, who wrote:
AI generated art. I was able to make this picture in just a few minutes. I can’t imagine how remarkable this technology will be in 3 years pic.twitter.com/jOToCZj7ki
— Charles Hoskinson (@IOHK_Charles) February 1, 2023
AI Generative Art: early experiments and features
Having understood what Generative Art is, it is important to emphasize one of its founding principles: randomness, a fundamental property of Generative Art.
In fact, depending on the type of software, the autonomous system can produce results that are different and unique each time the generate command is executed, or it can return a variable number of results in response to user input.
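That randomness can be pictured with a minimal sketch. Everything here is invented for illustration (the function name and its output shape come from no real generative system): an unseeded run yields a fresh result every time, while fixing a seed makes the “autonomous system” reproducible.

```python
import random

def generate_pattern(n_points=5, seed=None):
    """Toy 'autonomous system': emits a list of (x, y) coordinates.
    With no seed, every run almost surely yields a different result;
    with a fixed seed, the output is fully reproducible."""
    rng = random.Random(seed)
    return [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(n_points)]

# Two unseeded runs will (almost surely) differ;
# two runs with the same seed always match.
a = generate_pattern()
b = generate_pattern()
c = generate_pattern(seed=42)
d = generate_pattern(seed=42)
```

Real generative models behave analogously: the same prompt re-run with a different random seed produces a different image.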
The first experiments in Generative Art date back to the 1960s, with Harold Cohen and his AARON program. Cohen used stand-alone software to generate abstract artworks inspired by Pop Art silkscreens. Cohen’s works are now on display at the Tate Gallery in London.
Another attribute of Generative Art, though an increasingly less common one, is the repetition of patterns or abstract elements supplied by the programmer and implemented within the software’s code.
In addition, the development of increasingly complex neural networks operating on text-image association has enabled the development of generative models capable of creating increasingly realistic and accurate images. The best known example of this category of Generative Art is Dall-E.
Dall-E is a multimodal neural network based on the GPT-3 deep learning model from OpenAI, the same company that also recently developed ChatGPT, the chatbot launched in November 2022 and fine-tuned with supervised learning and reinforcement learning techniques.
Returning to Dall-E: this system is capable of generating images from a textual description, called a “prompt,” having been trained on a dataset of text-image pairs.
The first version of Dall-E, presented to the public in January 2021 and accessible only to a small number of professionals in the field, represented a real revolution for this type of generative model, going beyond the innovations of GPT-3 itself.
Also significant is the fact that Dall-E proved the perfect fit for another OpenAI solution: CLIP (Contrastive Language-Image Pre-training), an image classification and ranking neural network trained on text-image associations, such as captions found on the Internet. Thanks to CLIP’s intervention, which ranks the candidates and reduces the number of results proposed to the user per prompt to 32, Dall-E was found to return satisfactory images in most cases.
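CLIP’s reranking role can be sketched as follows. The vectors below are dummy embeddings standing in for the output of CLIP’s real text and image encoders, and `rank_candidates` is a hypothetical helper, not part of OpenAI’s API; the point is only the mechanism: score each candidate image against the prompt and keep the best matches.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(text_emb, image_embs, top_k=2):
    """CLIP-style reranking: score every candidate image embedding against
    the prompt embedding and return the indices of the top_k best matches."""
    scores = sorted(
        ((cosine(text_emb, emb), i) for i, emb in enumerate(image_embs)),
        reverse=True,
    )
    return [i for _, i in scores[:top_k]]

# Dummy 3-dimensional embeddings (real CLIP embeddings have hundreds of dims).
prompt_emb = [1.0, 0.0, 0.5]
candidate_embs = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0], [1.0, 0.0, 0.6]]
best = rank_candidates(prompt_emb, candidate_embs, top_k=2)
```

In the Dall-E pipeline the same idea runs at scale: many samples are generated per prompt and CLIP keeps only the highest-scoring ones.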
Midjourney: design, human infrastructure, and artificial intelligence
As anticipated, Midjourney is an important project within the emerging field of AI Generative Art. Specifically, Midjourney is an independent research laboratory that explores new mediums of thought and expands the imaginative powers of the human species.
Using it is simple: first, an account must be created on Discord, a platform that hosts various communities, Midjourney among them. Within the application are various chat rooms in which one can, optionally, take part in discussions.
It is important to point out that first-time users of the Artificial Intelligence must go to the “newbies” channels, where 25 free renders are available.
One render corresponds to the generation of four different variants from the same textual input.
Thus, the 25 renders correspond to 25 processing jobs performed by the Midjourney bot. To generate an image, the user interacts with the bot via a text message called a “prompt,” containing keywords that describe the image the user has in mind.
You can add as many details as you want; the important thing is to separate the keywords with commas. Once rendering is finished, the bot returns four different images based on the description, from which to choose.
In addition, once the program has finished rendering, you can indicate your preferred image and, if you wish, have four more variations generated from it.
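The comma-separated keyword convention described above can be captured in a small helper. This is a hypothetical utility for assembling the text you would paste into Discord; Midjourney itself only ever sees the final string.

```python
def build_prompt(subject, *details):
    """Join a subject and optional detail keywords into a single
    comma-separated prompt string, trimming stray whitespace."""
    parts = [subject.strip()] + [d.strip() for d in details if d.strip()]
    return ", ".join(parts)

prompt = build_prompt("a lighthouse at dusk", "oil painting", " dramatic lighting ")
# → "a lighthouse at dusk, oil painting, dramatic lighting"
```

The resulting string is what you would send to the bot as your prompt.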
DALL-E 2: the new AI system for artworks
In addition to Midjourney, DALL-E 2 is the new AI system that can create realistic images and artworks from a natural language description. Not only that: DALL-E 2 can also combine concepts, attributes, and styles.
The strength of the new AI system also lies in being able to expand images beyond the original canvas, creating new expansive compositions. In addition, it can make realistic edits to existing images from a natural language caption, adding and removing elements while taking shadows, reflections, and textures into account.
DALL-E 2’s capabilities also include taking an image and creating several variations of it inspired by the original, having learned the relationship between images and the text used to describe them.
It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern toward an image as it recognizes specific aspects of that image.
So, after OpenAI introduced DALL-E in January 2021, the newest system, DALL-E 2, now generates more realistic and accurate images with four times the resolution.
DALL-E 2 started as a research project and is now available as a beta. Safety mitigations, which OpenAI has developed and continues to improve, include limiting the system’s ability to generate violent, hateful, or adult images, and a phased deployment based on what is learned along the way.