
Challenges and opportunities of artificial intelligence in music

by Otso Björklund

The development of artificial intelligence has seen the introduction of several different paradigms, the latest often superseding its predecessor. In the current millennium, the improvements in the accuracy of artificial intelligence applications have rendered them usable for a range of tasks that were previously considered challenging for a machine, such as image and speech recognition.

In recent years, the enthusiasm surrounding artificial intelligence has intensified, bringing science fiction-like visions of the future of artificial intelligence back into discussion. The latest interest in artificial intelligence has been centered around generative artificial intelligence, which is capable of producing text, images, sound and, to some extent, music. But what can generative AI mean for music making?

Since its first iterations in the 1940s, the goal of artificial intelligence development has been to make a computer perform tasks that are thought to require human intelligence. As a field of research, artificial intelligence covers a wide range of different paradigms and techniques, most of which have generated great enthusiasm whenever they have managed to produce results superior to previous techniques. There have also been quieter periods in the history of AI, without major breakthroughs.  

In recent decades, machine learning has become the dominant paradigm of artificial intelligence to the extent that in the present day, artificial intelligence is effectively synonymous with machine learning. In machine learning, a learning algorithm identifies regularities from the training data, as opposed to a human trying to define exact rules for the operation of artificial intelligence. With the help of these regularities, a machine learning model is formed, on which the operation of the AI program is based. There are many different models, ranging from simple statistical regression models to artificial neural networks containing billions of parameters. 

In the current millennium, neural network models have typically achieved the most impressive results with images, sound and text. Although neural network models are inspired by early brain research, they do not actually imitate the human brain. Instead, they describe multidimensional mathematical functions. The concept of neural networks dates back to the 1940s, but only through the increase in computing power in the 2000s did it become one of the most significant methods of machine learning. The strength of neural networks lies in their extensibility and versatility. Many different structures, or architectures, can be created by combining artificial neurons for different applications.


Computational creativity and generative artificial intelligence 

Creativity is considered one of the forms of intelligence, and the possibilities of computational creativity were being researched long before the rise of modern generative artificial intelligence. A computer-generated output, such as a composition, is created as the result of a generation process. Most music-generating programs produce music as a sequence of note events.

Randomness is an essential part of this process, because by following a completely predetermined algorithm, the program would produce the same result every time with the same prompt. When generating music, a computer therefore produces aleatoric music of sorts. To avoid a completely random-sounding end result, the probabilities of note events and their properties can be adjusted, and event probabilities can be set to be dependent on both each other and the prompt given by the user. This gives the randomly generated music a certain regularity, which makes the end product seem less random.
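The idea of biasing random note selection can be illustrated with a small sketch. The pitch names and weights below are purely hypothetical, chosen for illustration; the point is only that weighting the random draw makes the output less uniformly random.

```python
import random

# Toy event space: a handful of pitches with hand-tuned weights.
# Higher weights make a pitch more likely to be selected.
PITCHES = ["C4", "D4", "E4", "F4", "G4"]
WEIGHTS = [4, 1, 3, 1, 4]  # favour C4, E4 and G4

def generate_melody(length, seed=None):
    """Draw a sequence of pitches; the weights bias each independent draw."""
    rng = random.Random(seed)
    return [rng.choices(PITCHES, weights=WEIGHTS, k=1)[0] for _ in range(length)]

melody = generate_melody(8, seed=42)
```

With a fixed seed the same "aleatoric" melody is reproduced every time, which is exactly why real generative systems deliberately vary the random state between runs.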

Although neural network models are inspired by early brain research, they do not actually imitate the human brain. Instead, they describe multidimensional mathematical functions.

Generative models fundamentally describe regularities using probabilities. The early use of Markov chains for music generation is a simple example of generative modelling. In a Markov chain, the previously selected pitch, for example, determines which pitches are the most likely candidates for the next one. The goal of the learning algorithm is to adjust the probabilities contained in the model so that its outputs mimic the training data as closely as possible. To avoid reducing all generated outputs to a mere average of the training data, the prompt given by the user also steers the generation process by affecting the probabilities.
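A first-order Markov chain of this kind can be sketched in a few lines. The training sequence below is an invented toy example; a real system would learn its transition probabilities from a large corpus of melodies.

```python
import random
from collections import defaultdict

def train_markov(sequence):
    """Count how often each pitch follows each other pitch in the data."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=None):
    """Walk the chain: the previous pitch weights the choice of the next."""
    rng = random.Random(seed)
    result = [start]
    for _ in range(length - 1):
        options = counts.get(result[-1])
        if not options:  # no known continuation from this pitch
            break
        pitches = list(options)
        weights = list(options.values())
        result.append(rng.choices(pitches, weights=weights, k=1)[0])
    return result
```

Every generated transition is one that occurred in the training data, which is the sense in which the model "mimics" its training material.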

One of the most significant inventions behind the new generative neural network models is the Transformer architecture introduced in 2017. This architecture is intended for situations where one sequence of tokens, also known as a string, needs to be transformed into another sequence of tokens. Originally, the Transformer architecture was designed for text translation, but it has also been found to work extremely well for generating text based on user prompts. For example, ChatGPT is based on the Transformer architecture. 

By considering music as a sequence of note events, the same architecture can also be applied to music generation. Generating music with this kind of neural network model works in a similar way to a Markov chain: the probability distribution used for randomly selecting the next note event depends on both the user's prompt and what has previously been generated. The generation therefore proceeds from beginning to end, and the selection of each note event is influenced by a segment of previously selected note events. The segment that the model retains in its memory is called the context. As the generation process progresses and the context is updated, the model gradually forgets what it generated at the beginning. Despite this limitation, the Transformer model is capable of taking into account more complex relationships between note events than any previous method.
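The autoregressive loop with a bounded context can be sketched as follows. The scoring function here is a crude stand-in, not a trained Transformer: in a real system the distribution over next events would be computed by a learned neural network from the prompt and the context window. The vocabulary and weighting rules are assumptions made purely for illustration.

```python
import random

CONTEXT_LEN = 4  # the model only "remembers" this many previous events
VOCAB = ["C4", "E4", "G4", "rest"]

def next_distribution(prompt, context):
    """Stand-in for a trained model: weights for each possible next event,
    conditioned on the prompt and on the truncated context."""
    weights = {}
    for event in VOCAB:
        w = 1.0
        if event == prompt:
            w += 2.0            # mildly favour the prompted pitch
        if context and event == context[-1]:
            w *= 0.5            # discourage immediate repetition
        weights[event] = w
    return weights

def generate(prompt, length, seed=None):
    """Autoregressive sampling: each step sees only the last CONTEXT_LEN events."""
    rng = random.Random(seed)
    output = []
    for _ in range(length):
        context = output[-CONTEXT_LEN:]  # older events are forgotten
        dist = next_distribution(prompt, context)
        events = list(dist)
        output.append(rng.choices(events, weights=list(dist.values()), k=1)[0])
    return output
```

The line `output[-CONTEXT_LEN:]` is where the "forgetting" described above happens: anything generated earlier than the context window can no longer influence the next choice.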


The challenges of autonomous artificial intelligence in music


The results achieved with the latest generative models for creating images and text have naturally inspired researchers and companies to use the same models for music generation. However, the results have not been as promising as they have been with text and images.  

According to the latest research, neural network models still face considerable technical challenges in generating music. In listening tests, the newest neural network models do not necessarily produce better results than the significantly simpler Markov chains. Modelling music as a sequence of events with a limited context also fails to enable the production of large-scale form, or even shorter structures such as coherent pop music melodies. Hence, music created with artificial intelligence applications is often presented only in short excerpts. Generative models are supposed to produce music that follows the regularities found in the training data. Sometimes those regularities are followed too closely, and as a result generative artificial intelligence directly plagiarises fragments of its training data.

An artificial intelligence model that autonomously composes music is an ambitious technical challenge for AI developers. Even if it were possible to achieve such a model, it might not be the most meaningful application for artificial intelligence in music.

Training a generative model requires versatile data in the right format. Audio recordings are much more readily available than symbolic data, such as machine-readable scores or MIDI files. However, learning musical features from recordings is significantly more challenging for a machine learning algorithm, compared to symbolic data. In addition to availability issues, there are ethical challenges associated with data acquisition. Copyright has not always been respected when acquiring training data. Companies that sell artificial intelligence applications also wish to collect more data for developing their models further, which is why they are keen to collect data produced by the application users. Even draft versions of unfinished work may end up becoming training data for artificial intelligence through the application used by the composer.


Artificial intelligence as a composer’s assistant


An artificial intelligence model that autonomously composes music is an ambitious technical challenge for AI developers. Even if it were possible to achieve such a model, it might not be the most meaningful application for artificial intelligence in music. Survey results indicate that composers still want to be in control of their creative decisions, even when using artificial intelligence.  

Artificial intelligence is therefore best suited to the role of a composer's assistant. The generation of music must be comprehensively controllable by the composer, with its role limited to experimenting with ideas rather than producing the final product. Many companies have attempted to create text-based applications for music generation, but describing the desired result with text prompts can be challenging if one wishes to influence the end result in any detail. In professional use, music, or sound in general, can be a more controllable prompt for AI music processes than text.

Survey results indicate that composers still want to be in control of their creative decisions, even when using artificial intelligence.

Automation is usually the preferred tool for repetitive and mechanical processes, and at its best, artificial intelligence is very well suited to these types of tasks. In addition to music generation, AI can be used for tasks that require the processing of recordings or sheet music. Artificial intelligence can also enable processes that would be incredibly tedious to manage manually, for example comparing certain motifs between thousands of scores. Especially in audio processing, AI applications have developed significantly in recent years. 

In music making, replacing humans with artificial intelligence is a meaningful goal mainly for commercial purposes. Generative models that repeat the regularities of the training data are only able to create something truly novel in very limited ways, for example by making new combinations of previously used styles. In the role of a tool or an assistant, artificial intelligence can have a lot to offer to music creators, as long as the artificial intelligence can be adequately controlled by humans, leaving them in control of the end result. 


Otso Björklund (University of Helsinki) holds an M.A. in musicology and M.Sc. in computer science. He is currently working on his doctoral dissertation on machine learning models for repeated pattern discovery in music.  

Featured picture: Nature by Alan Warburton. Source: Wikimedia Commons
Translation: Hanna-Mari Latham