Words and labels can be extremely abstract notions. When you think about music, what comes to mind? Is it a particular song? Or maybe you visualize a piano score? It could also remind you of that time you spent playing Donkey Konga with your brother.
Creating a concrete implementation of a concept as abstract as music is a daunting task, and I believe these thoughts are clues that can help us reach our objective. That is just what we’ll do here: write down some thoughts about the music generation process, while keeping in mind that we wish to create an automated system.
These examples help us grasp the different aspects that make up music:
A piano score is a representation of music: sound encoded as visuals. It packs a lot of information into little space.
The song you’re listening to is the closest thing to our target.
The Donkey Konga example shows us that there are different ways to enjoy music, and being an active part of it is one of them. This is not important for our current task, but it can be considered an extension of it.
So what do we get from all this? Three things:
We need to be able to easily represent musical elements and manipulate them. That is Representation.
These elements make little sense by themselves; they need to be organized and to relate to one another in a logical and pleasing fashion. This is the Composition part of music making.
Finally, music gets a major part of its magic from all the instruments that exist and the diversity of sounds they enable. Selecting or creating appropriate sounds for a song is what I call the Timbre part of music making.
Representation
As I began work on this project of creating an algorithm to generate music, I tried as hard as I could to remember my old Music Theory lessons, and implementing Western music notation seemed like the obvious route.
This is what we get from it:
Standardized, human-readable notation that makes intuitive sense, since Western music notation is widespread
A wealth of knowledge dating back to Ancient Greece, with well-studied patterns of sounds and rhythm
Building blocks for basic composition, such as Scales, Intervals and Chords (see the sketch after this list)
Separation between the music representation and the sounds produced.
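To make the Representation idea concrete, here is a minimal sketch of what such building blocks could look like in Python. The Note class, the semitone encoding and the major_scale helper are illustrative choices of mine, not a standard library or a final design:

```python
from dataclasses import dataclass

# Names of the 12 pitch classes of an octave in Western notation.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

@dataclass(frozen=True)
class Note:
    """A pitch encoded as a semitone count above C0, plus a duration in beats."""
    semitone: int     # e.g. A4 = 9 + 4 * 12 = 57
    duration: float   # in beats

    @property
    def name(self) -> str:
        octave, pitch_class = divmod(self.semitone, 12)
        return f"{PITCH_CLASSES[pitch_class]}{octave}"

def interval(a: Note, b: Note) -> int:
    """Interval between two notes, in semitones."""
    return abs(a.semitone - b.semitone)

# A major scale is a fixed pattern of whole and half steps above the tonic.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]

def major_scale(tonic: Note) -> list[Note]:
    notes, current = [tonic], tonic.semitone
    for step in MAJOR_STEPS[:-1]:
        current += step
        notes.append(Note(current, tonic.duration))
    return notes

# C major scale starting from C4 (quarter notes).
print([n.name for n in major_scale(Note(48, 1.0))])
# ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4']
```

Notice how this keeps the representation entirely separate from any actual sound: a Note here is just numbers with names.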
I feel compelled to mention that most computer-generated music implementations manipulate sound directly, by modifying frequencies, applying filters and so on. That approach is different from what I am trying to implement, as the separation between Representation, Composition and Timbre becomes blurred or even non-existent. I also would like to keep a human in the loop, with the algorithm being a supplement to the human.
Composition
We now have the basic building blocks for more elaborate pieces, and it is time to tackle the most complex part of such an algorithm: Composition.
There is no easy answer to this problem, as we need to handle both rhythm and melody; you may also want to handle polyphony. The most important question to ask yourself is therefore: What Do I Want To Do?
This is where our choice to base ourselves on Western Music theory is helpful: intervals between notes are classified by how pleasing they are to the ear, as perfect consonances, imperfect consonances and dissonances.
One possibility for creating a melodic line could therefore be to constrain the generation of the next note to allowed intervals, both horizontally (i.e. within the same melodic line) and vertically (against the bassline), and to notes of the scale. This is an example of a very simple yet extremely powerful system, and you can find a more involved system for Counterpoint described here.
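Here is a toy sketch of that constraint-based idea in Python. The consonance table, the C-major pitch-class set and the candidate range are illustrative assumptions of mine; real counterpoint rules are considerably stricter:

```python
import random

# Consonance classification of intervals (in semitones, reduced to one octave),
# loosely following Western music theory; treat this table as illustrative.
PERFECT_CONSONANCES = {0, 5, 7}       # unison/octave, perfect fourth, perfect fifth
IMPERFECT_CONSONANCES = {3, 4, 8, 9}  # minor/major thirds and sixths
ALLOWED = PERFECT_CONSONANCES | IMPERFECT_CONSONANCES

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale

def next_note(previous: int, bass: int, candidates: range) -> int:
    """Pick a note consonant with the previous melody note (horizontal)
    and with the bass note sounding at the same time (vertical)."""
    legal = [
        n for n in candidates
        if n % 12 in C_MAJOR                   # stay on the scale
        and abs(n - previous) % 12 in ALLOWED  # horizontal interval
        and abs(n - bass) % 12 in ALLOWED      # vertical interval
    ]
    return random.choice(legal) if legal else previous

# Generate an 8-note line over a static C bass (MIDI-style numbers, C4 = 60).
melody, bass = [60], 48
for _ in range(7):
    melody.append(next_note(melody[-1], bass, range(55, 80)))
print(melody)
```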
Creating a Composition system will be up to you, but you can see some examples in my upcoming article on Music Composition as an Imitation Game, as well as the one on Generating Simple Melodies from Chord Progressions.
Timbre
As you may already know, sounds are vibrations in the air and are modelled with periodic functions (such as a sine function). A note such as A5 is therefore a periodic function repeating at a certain frequency for a given amount of time, with the pitch determined by that frequency.
This is important to know because it allows us to create some auditory effects (such as binaural beats), and directly manipulating frequencies is usually how computer-generated music algorithms are implemented.
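As a small illustration, here is how an A5 could be rendered directly as a sine wave and written to a WAV file using only Python’s standard library; the sine_note helper and its parameters are my own illustrative choices:

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second

def sine_note(frequency: float, seconds: float, amplitude: float = 0.5) -> bytes:
    """Render a pure tone as 16-bit mono PCM frames."""
    frames = bytearray()
    for i in range(int(SAMPLE_RATE * seconds)):
        sample = amplitude * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(sample * 32767))
    return bytes(frames)

# A5 sits one octave above A4 (440 Hz), i.e. 880 Hz.
with wave.open("a5.wav", "wb") as f:
    f.setnchannels(1)    # mono
    f.setsampwidth(2)    # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    f.writeframes(sine_note(880.0, 2.0))
```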
Knowing how sounds are generated opens up a fork in the road:
Do we create our own sounds?
Or do we use samples?
What’s more, the different sounds used must complement each other rather than clash with one another.
This isn’t easy, which is why I recommend outputting your generated music as a MIDI file. This allows you either to generate a WAV or MP3 file using a SoundFont, or to use the MIDI file for further processing in a DAW.
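As a sketch of that workflow, here is how a short phrase could be written to a MIDI file using the third-party mido library (pip install mido); the phrase and timing values are illustrative:

```python
from mido import Message, MidiFile, MidiTrack

mid = MidiFile()  # default resolution: 480 ticks per beat
track = MidiTrack()
mid.tracks.append(track)

# A short C major phrase; MIDI note numbers, with C4 = 60.
for note in [60, 62, 64, 65, 67, 65, 64, 62, 60]:
    track.append(Message("note_on", note=note, velocity=64, time=0))
    track.append(Message("note_off", note=note, velocity=64, time=480))  # one beat later

mid.save("melody.mid")

# The resulting file can be rendered to audio with a SoundFont
# (e.g. via FluidSynth) or imported into any DAW.
```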
Final Words
This post got longer and less precise than I initially hoped, but I suppose the scope of the project is too big for a single article. What’s more, I believe this kind of work should be tied to practice, so stay tuned for more articles on the topic.
Still, a few things stand out for me from working on this project. Some are pretty straightforward, such as the necessity of doing research and taking the time to specify the task at hand.
But in the end, what struck me the most about all this is the degree of freedom I have. I have worked as a Quant in Finance and I also make games in my spare time, two activities that require the developer to keep an eye on performance. Here there is no need for optimised code, so I can focus on expressiveness, which is the best part of programming as far as I’m concerned. I hope you enjoy your freedom too!