What is DADABOTS?
Not sure what DADABOTS is. We're a cross between a band, a hackathon team, and an ephemeral research lab. We're musicians seduced by math. We do the science, we engineer the software, we make the music. All in one project. Don't need nobody else. Except we do, because we're standing on the shoulders of giants, and because the whole point is to collaborate with more artists.
And in the future, if musicians lose their jobs, we're a scapegoat. jk. Please don't burn us to death. We'll fight for the right side of history.. We swear..
How did you get started working on DADABOTS?
First day we met in 2012, CJ said, "Zack, I feel like I've known you my whole life." We formed a hackathon team called Dadabots. This was Music Hack Day MIT. We were intrigued by the pointlessness of machines generating crappy art. We announced that we had set out to "destroy SoundCloud" by creating an army of remix bots, spidering SoundCloud for music to remix, posting hundreds of songs an hour. They kept banning us. We kept working around it. That was fun.
What inspired you to program an AI to create/replicate music?
We saw the potential of deep neural networks when image style transfer was released. It was amazing to watch photographs transform into impressionist oil paintings. We had been researching ways to model musical style and generate music with a target timbre. Once WaveNet was able to synthesize human voices in multiple languages, it seemed that deep learning might be the tool we were looking for.
How does creating your music with the NSynth algorithm work?
We don't use NSynth. NSynth generates very short samples of monophonic instruments. We used SampleRNN for our Bandcamp albums.
How does creating music with SampleRNN work?
We started with the original SampleRNN research code in Theano. It's a hierarchical LSTM network. LSTMs can be trained to generate sequences. Sequences of whatever. Could be text. Could be weather. We train it on the raw acoustic waveforms of metal albums. As it listens, it tries to guess the next fraction of a millisecond. It plays this game millions of times over a few days. After training, we ask it to come up with its own music, similar to how a weather forecast machine can be asked to invent centuries of seemingly plausible weather patterns.
It hallucinates 10 hours of music this way. That's way too much. So we built another tool to explore and curate it. We find the bits we like and arrange them into an album for human consumption.
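For readers who want to see the shape of that guessing game in code, here is a heavily simplified sketch in PyTorch. It is not the SampleRNN architecture or our Theano training code; the single-tier model, the 8-bit quantization, and every hyperparameter below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A toy next-sample predictor: quantized audio in, a probability distribution
# over the next quantized sample out. (Illustrative only; SampleRNN itself is
# a multi-tier hierarchy trained on much longer sequences.)
class NextSamplePredictor(nn.Module):
    def __init__(self, quant_levels=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(quant_levels, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, quant_levels)

    def forward(self, x):                      # x: (batch, time) of sample indices
        h, _ = self.lstm(self.embed(x))
        return self.head(h)                    # (batch, time, quant_levels)

model = NextSamplePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder "album": in practice this would be real waveforms quantized to 256 levels.
waveform = torch.randint(0, 256, (1, 16001))
inputs, targets = waveform[:, :-1], waveform[:, 1:]   # the "guess the next sample" game

opt.zero_grad()
loss = loss_fn(model(inputs).reshape(-1, 256), targets.reshape(-1))
loss.backward()
opt.step()

# After (much more) training, hallucinate audio one sample at a time.
with torch.no_grad():
    seq = torch.randint(0, 256, (1, 1))
    for _ in range(16000):                     # ~1 second at an assumed 16 kHz
        probs = torch.softmax(model(seq)[:, -1], dim=-1)
        seq = torch.cat([seq, torch.multinomial(probs, 1)], dim=1)
```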
It's a challenge to train nets. There's all these hyperparameters to try. How big is it? What's the learning rate? How many tiers of the hierarchy? Which gradient descent optimizer? How does it sample from the distribution? If you get it wrong, it sounds like white noise, silence, or barely anything. It's like brewing beer. How much yeast? How much sugar? You set the parameters early on, and you don't know if it's going to taste good until way later.
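To make the brewing analogy a little more concrete, here is a hedged illustration of the kind of search this implies. The knob names and values are examples made up for this sketch, not our published settings.

```python
import itertools

# Each knob corresponds to a question above: how big is it, what's the learning
# rate, how many tiers, which optimizer, how does it sample from the distribution.
search_space = {
    "frame_sizes": [(16, 4), (8, 2)],        # tiers of the hierarchy
    "rnn_dim": [512, 1024],                  # how big the net is
    "learning_rate": [1e-3, 1e-4],
    "optimizer": ["adam", "rmsprop"],
    "sample_temperature": [0.9, 0.95, 1.0],  # how it samples from the distribution
}

# Every combination is one multi-day training run -- one batch of beer.
runs = [dict(zip(search_space, combo))
        for combo in itertools.product(*search_space.values())]
print(f"{len(runs)} candidate brews")        # 2*2*2*2*3 = 48
```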
We trained 100s of nets until we found good hyperparameters and we published it for the world to use.
Why have you done this?
Mischief.
Do you plan on using this software for retail purposes?
Neural synthesis will inevitably be a part of next generation audio software. There's no question. Right now the hardware is prohibitively expensive (we use V100 GPUs). But that is changing as we make the algorithms more efficient (clever parallelization, weight pruning, etc). Eventually people can ask their DAW "please generate 5 hours of zombified animal calls produced by a Mike Patton / Jennifer Walshe hybrid".
Do you think music generated by neural networks will have the potential to reach mainstream success?
Becoming mainstream has been important for subcultures that were underrepresented and needed a voice. Teenagers. African-Americans. Etc. Whereas tech culture already dominates the world. It's swallowing the music industry whole. What does it have to gain by making mainstream music?
Is there any specific reason why you are focusing on math rock and black metal to generate, instead of other, more mainstream genres?
For some reason, other a.i. music people are trying to do mainstream. Mainstream music is dead. Solid. Not alive. Rigor Mortis. Any new music idea it gets has been harvested from the underground. The underground has always been home for the real explorers, cartographers, and scientists of music. The mainstream finds these ideas and beats them like a dead horse until they're distasteful. Why should a musician set out to do mainstream music? Because they want to be famous while they're alive?
Math Rock and Black Metal are the music we love. They have a special place with us. Whereas many new black metal bands sound like an imitation of early-90s black metal, albums like Krallice's "Ygg Hurr" push it to new places I've never felt before. The research is fresh. Rehashing old sounds is like publishing scientific papers on the same old experiments. Keep music alive.
When did it feel like something you could pull off as independent musicians, or that the music you were making was worth sharing publicly on Bandcamp, etc.?
We were in bands, we liked extreme music, we wanted to make music forever, but we were faced with the existential problem of our inevitable deaths.
A-ha moments: early on, we read Tristan Jehan's Creating Music By Listening and used his EchoNest Remix library to make SoundCloud bots. In 2015 seeing Gene Kogan's art.. We quit our jobs and self-studied deep learning. In 2016 hearing SampleRNN and WaveNet..
Our first SampleRNN experiment was an imitation of Kurt Cobain. It screamed Jesus. That was the first thing it did. We heard that, we knew we were onto something..
We trained 100s of nets, different genres, different architectures, parameters... Eventually discovered what sounded best to us.. neural death metal, neural mathrock, neural skatepunk, neural free jazz, neural beatbox...
What were those initial teaching sessions like? How did you guys break down duties as producers and programmers?
Bizarre.
The first experiment we tried was with Kurt Cobain's acapellas. When it produced its first output, we were expecting to hear silence or noise because of an error we made, or else some semblance of singing. But no. The first thing it did was scream about Jesus. We looked at each other: "wtf..?" In a moment of suspended disbelief, it seemed like a technomancy séance.
We both do everything: writing code, reading arXiv, music production, art.
What’s the difference about your approach to generating music and other methods of computer generated music like randomizing MIDI inputs?
We train it completely unsupervised. There's no knowledge of music theory. There's no MIDI. There's nothing. It's just raw audio. It's surprising that it works in the first place.
What we love about unsupervised learning is that it gives hints into how brains self-organize raw data from the senses.
What drew you to working with audio rather than these other pretty popular kinds of MIDI approaches?
MIDI is only 2% of what there is to love about music.. You can't have Merzbow as MIDI. Nor the atmosphere of a black metal record. You can't have the timbre of Jimi Hendrix's guitar, nor Coltrane's sax, nor MC Ride. Pure MIDI is ersatz.
Most music+ML hackers are generating MIDI because it's cheaper and easier to control. Raw audio is unruly and became tractable only recently.
In the future we want to generate genomes. And languages. And states of mind.
Why is it harder?
Raw audio has 44100 time steps a second. These are huuuge sequences. Up to 10,000x larger than sheet music. We needed heavy-duty hardware only recently available to us plebes. Also we needed cleverer algorithms. DeepMind and MILA paved the way by publishing their research into neural text-to-speech engines in 2016. We ran with it and brought it to extreme music.
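Back-of-the-envelope arithmetic behind that claim (the symbolic event rate below is an assumed ballpark, not a measured figure):

```python
sample_rate = 44100              # raw audio time steps per second
score_events_per_sec = 5         # assumed density of a MIDI / sheet-music representation
print(sample_rate / score_events_per_sec)   # 8820.0 -- on the order of 10,000x more steps
```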
How would giving the network more time to process the source material affect the outcome?
The trend is: it first learns short timescale patterns (a snare drum hit, the timbre of a scream), then longer (a sloppy guitar riff), then longer (a constant tempo). The more it trains, the more it makes longer timescale patterns. But there's diminishing returns. Also the more it trains the more it memorizes, so some of the most interesting sounds come from when it's only half-trained.
The tagline on the Bandcamp page is 'We write programs to develop artificial artists'; what's the plan for the algorithm in the long run?
The tagline says 'artificial artists', but our aim is human augmentation.
Few people write music, but almost everybody has a music aesthetic. Imagine a music production tool where you simply feed it music influences, like a Furby. It starts generating new music. You sculpt it to your aesthetic. Imagine hearing everyone's crazy weird music aesthetic come out of their Furby.
Really this is just meta-music: instead of playing the music, we are playing the musician.
Which artists are you considering to push this process further?
We think prolific avant musicians like Mike Patton, who relentlessly take music where it hasn't been, are right for the job. And artists like Drumcorps, Jennifer Walshe, Inzane Johnny, Igorrr, Venetian Snares, Zack Hill (Hella, Death Grips), Colin Marston, Mick Barr, Lightning Bolt, Oneohtrix Point Never, Daveed Diggs (clipping.), Yamantaka Eye, or anyone who's played at The Stone -- we want to give them artistic superweapons and see what fires out of their brains. But really, if we can make it accessible, there will be kids taking it places no one's ever dreamed.
We wanna see Sander Dieleman (one of the inventors of wavenet, admin at got-djent) make neural metal. We all know he would make amazing neural metal.
Do you feel like there’s a need to decolonize machine learning?
What kind of world do you have when the top AI companies are more powerful than most countries and are led by a small knowledgeable elite of PhDs deciding what projects are worth their mindshare? It’s non-participatory.
We were just having a conversation with Samim about this, and we agreed that.. beyond open source, what we need is open comprehension. Linux is open source. TensorFlow is open source. The research is published for free on arXiv. Even so, how do we make the path to comprehension easy?
I think music/art is a great way for anybody to start playing with AI. We want to see more young people starting AI bands.
There’s a lot of terror wrapped in A.I work, do you think A.I.-assisted art can quell those fears or does "music created by robots" still scare some folks?
Yes and yes. For example, we just did a collaboration with UK champion beatboxer Reeps ONE. For Reeps, hearing his essence distilled and replicated by a machine was met initially with fear. It's creepy hearing your own voice like this. (And it makes you wonder what's possible with fake news.) That fear turned into excitement as he saw his bot doppelgänger more as a collaborator. It produces strange beatbox patterns he's never made before, inspiring him to push his craft further.
Do human creators need to fear for their jobs?
If designed with people in mind, A.I. is a creative tool. But the greater possibility of full automation means some creative professionals will be replaced. Especially any easily commoditizable work. This is why we say "automate your job, don’t tell your boss". Be the one running the machine.
How significant is it for your work to assist and/or replace music makers?
100% assist.
We have several new collaborations in the works. Bands like Lightning Bolt, Artificial Brain, Krallice, and more. After listening to so much purely-computer-generated music, at this point it's more surprising to us to hear what humans will do with it.
Everything Dadabots is doing is one big scheme just to collaborate with bands we love. It's a trick and it's working. Pay no attention.
With companies like Spotify, Apple, and Google getting deep into the music-A.I game, how do you think work like yours fits into the cosmology?
We've talked with many people in the cosmology.
Many advances are coming out of academia (Université de Montréal, Queen Mary University, ISMIR, etc), in github repos, in the blogs of PhD students (Dmitry Ulyanov, etc), and in papers published on arXiv. Academics are mostly interested in publishing algorithmic breakthroughs, but only some of that work crosses over into music production.
IBM Watson seems to be doing music just to market their other AI products (but we like what Janani did with Watson Beats, and are stoked to see what Krishna does next).
Amper is interested in automating film-scoring.
Most music-AI projects are based on MIDI / sheet music. MIDI is fine for Bach or film scores. Sheet music is fine if you get humans to play it. But not if we want to imitate someone's singing voice, or create modern styles of music, or make one band cover another band's song. Dadabots works with raw audio. Raw audio is significantly more challenging.
Most groups working with raw audio neural synthesis (Google DeepMind, Baidu) are primarily focused on text-to-speech. But DeepMind's Sander Dieleman is a big metalhead, he runs Got-Djent, he digs our album "Inorganimate", so we'd love to hear what he does with neural music.
Google Magenta does neural music synthesis with NSynth, though they haven't yet generated full songs this way. Their projects are open source and seem to have really good support. Magenta has been doing some artist outreach (we participated in one of them), but they mostly wish to focus their time on the research rather than the music. Doug Eck says they want to be like Les Paul, building the electric guitar, so that a Jimi Hendrix can come along and bend the rules of music. Dadabots fits closely with that mission. But we're different from Magenta in that we're focused primarily on the music and on artist collaborations.
Indie creators are making stuff faster but what happens when big names start to stand behind music they’ve created using their machine-learning programs?
Commercial groups seem to be incentivized to generate pop songs for marketing reasons; academics seem to be incentivized to generate classical music to stay within a tradition; but what's really exciting is making music that has never existed before. That's likely going to come from the indie creators. But deep learning experiments with raw audio are expensive, and commercial groups have the most resources. As long as big names publish their research, indie creators benefit. As long as the breakthroughs flow into arXiv and github, indie creators benefit. Donated GPU credits also help indie creators tremendously.
Was there anything historically you were looking to for inspiration with the project beyond SampleRNN?
We interrupt these questions to tip our hats to culture jamming, poetic terrorism, graffiti art, and the satire of Sacha Baron Cohen.
In 1992, Mike Patton said while eating a sandwich: "computers should take over music.. computers are more fucked up than people.. the more messed up and further away music gets from music, the healthier it will be for music". We agree.
At the same time, do not let computer music amputate your musical ability. Marshall McLuhan once wrote, "every extension of mankind, especially technological extensions, has the effect of amputating.. automobiles amputate the need for a highly developed walking culture…" No machine gives you the new friends you just made because you hopped into a spontaneous beatbox cipher on the street at 2am.
Can you describe the process of creating Coditany of Timeness?
At first, we had a very difficult time getting good-sounding audio to generate. We extended the size and complexity of our network until we hit the limits of our GPU memory. Then we found a middle ground between performance and sound quality. It was a lot of trial and error. As we improved our results, we began to notice that the hazy, atmospheric quality of the SampleRNN-generated audio lent itself nicely to the lo-fi black metal style. We remade models based on a few different albums. Since we were not conditioning on musical sections, we had no control over the content of the output. Some models would overfit to learn the details of one part while ignoring the rest of the music in the dataset. This led us to gradually discover the optimal process after about a dozen attempts.
How long did it take to get it to its final state?
We spent months experimenting with different datasets. Our final model trained in a little over three days. We then had to curate the output by picking the best audio. If everything works as expected, an album can be fully made in about four days.
For Coditany of Timeness specifically, what do you think made this album particularly successful? Did you just run it more times in the program? How much had it learned by then?
We had been tweaking hyperparameters for months to get a speech model to make music. It struggled with percussion. CoT was the first experiment where it successfully generated a constant rhythmic pulse. It was a eureka moment.
The second reason is that Zack actually curated the album flow. He listened through the 10 hours of output audio, found sections that sounded like complete songs, and arranged them in an order. On earlier records, like Calculating Calculating Infinity, the songs were random and repetitive, which, as one reviewer said, made them "listenable but unlistenable".
Did you need to request permission from the original artist?
We did not ask permission. The focus was on scientific research and we were not selling any of the generated music. However, we did contact the band after finishing it. They were intrigued by the project and had suggestions for us. In an ideal world, researchers and artists will collaborate, but that shouldn't stop people from experimenting with well-known music. We have found that using a dataset the audience is familiar with greatly helps them intuitively understand what the process is doing.
It seems like this approach is premised on a kind of remix logic, where new work is always materially connected to sonic works of the past. Maybe this is indicative of a lot of AI/ML more generally, but I wonder if this ever feels like a limitation to the project's potential trajectory? Do you ever worry about falling into a cultural feedback loop?
Good point! Most generative ML is about maximizing the likelihood of the training data, a.k.a. imitation. The folks at ICCC wonder what's beyond..
But why did music evolve in nature? Birds optimize their songs for getting laid. Once we can get AI systems to optimize this kind of loss function, it’ll be a breakthrough for creativity.
For now, our aim is to make art that is closer to the essence of a band than their own music... fall terribly short of this... and laugh at the result.
SampleRNN was used to recreate existing music, so the issue of authorship does not really come up. At what point can it be said that the AI has created something truly new?
The lines of ownership become blurred when multiple artistic sources are brought into the training dataset and the generated output is a generalized blend of each. This is very similar to how humans learn to be original in a new medium: first learning to imitate the masters, then hybridizing styles.
Will your AI be able to generate new music purely on its own at some stage? Is it just a case of feeding it enough data to teach it?
We would need to add more functionality for it to choose its own music to generate. Having access to more kinds of music, and a way to condition between artists or styles, would allow something that resembles a self-evolving musical taste.
At what point can the AI be said to be an author?
No idea. Maybe, to call AI an author, it should be able to freely explore a wide variety of styles and have a way to improve based on the reaction to its work via sensory input. In other words, AI would need to pick what music to create on its own. This could be based on audience or critics' feedback. Or based on maximizing novelty/curiosity.
Who would be able to claim the copyright over the work?
No idea. Probably it's got to be case by case. It might be the owner of the computer if the code was sold to that user or company. If the code was made public (open-source) its output might have no copyright.
Do you think it’s a good or bad thing to recognise an AI as a creator?
Bad.
Credit assignment is ok. Just like "made with Ableton Live" is good to recognize.
Beyond that, the belief in autonomy is insidious.
There is no such thing as true autonomy. Everyone is interconnected. Everything is systems within systems. The recognition of this leads to societies built on transformative justice. The failure to recognize this leads to more prisons.
To legally accept a (strong or narrow) AI system as autonomous is to relinquish accountability and responsibility for its behavior. This will be exploited. You thought corporate personhood was bad? AI personhood will be worse. I fear AI creator copyright will be the foot in the door. People just can't contain their scifi fetish. As they say on The X-Files.. I want to believe.
Do you think there will be a point where the artificial intelligence can incorporate real words and coherent sentences into the generated song?
As of at least 2016, this was possible. Did anyone try it? Realistic end-to-end text-to-speech is achievable with Tacotron 2 and others. Applying the same idea to singing is possible. Aligned lyrics-music datasets exist. Has anyone tried to train this net? It's expensive to do. You need $100,000s worth of GPU hours. Give us the resources and we'll do it.
How do you think artificial intelligence will influence music in the years to come?
Think cartography -- mapping the deep space between all the songs, all the artists, all the genres.
Think super-expressive instruments -- think beatboxers creating full symphonies with their mouths.
Think autistic children, and others in music therapy, making expressive music, gaining a cultural voice.
What’s next for the project?
We want to build a spaceship for navigating the cosmos of all possible music. Mix bands together. Blend anything with anything. Discover music never before imagined. Make it so easy for kids to invent a new genre and set up bots that make music in that genre forever.
Isn’t it fucked up to generate music using the sound of dead musicians or the voice of dead singers?
If you're shocked by it, just think of yourself in the future, no longer shocked by it.
Your grandkids will find it normal and make music this way. It's like an effects pedal. It’s like the Jimi Hendrix wah-wah pedal. Today you can walk into a Guitar Center, hook up to a pedal, and have Hendrix’s tone. Tomorrow you’ll walk in and play the Kurt Cobain screaming RNN neural synth.
The quicker you get over it the more fun you’re gonna have.
Why does the audio quality suck? Why don’t you generate in high fidelity?
The desire for high fidelity is a never-ending, chase-the-dragon sensory hedonism. It encourages the centralization of media in the hands of only the groups with the largest budgets. That kind of world is non-participatory and disempowers DIY media. Whereas the most important voices come from those who are disenfranchised.
How did things evolve from the death metal generator to the free jazz one, and what were the new challenges?
As these nets learn, their weights change. Every iteration of weights makes slightly different music. Not all of it good music. Some iterations sound annoying, some are boring. With outerhelios we curated a variety of iterations, and the stream randomly switches among them. Some sound like a massacre of baby elephants. Some have these quick start-stops. Some make melodies close to the original album. Some do long drum solos. Some sound like angry geese. The curated variety makes it better.
(Whereas with our death metal generator, there's no curation, you hear everything it makes. It's rare luck that it all sounds good to listen to.)
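Here is a minimal, hypothetical sketch of that outerhelios-style switching: keep a handful of curated checkpoints and hop among them at random. The checkpoint names and selection logic are made up for illustration; they are not the actual stream code.

```python
import random

# Curated weight iterations, each with its own personality (names are hypothetical).
curated_checkpoints = [
    "iter_080k.ckpt",   # melodies close to the original album
    "iter_120k.ckpt",   # long drum solos
    "iter_150k.ckpt",   # quick start-stops / angry geese
]

def next_checkpoint(current=None):
    """Pick a different curated checkpoint to generate the next chunk of audio from."""
    return random.choice([c for c in curated_checkpoints if c != current])

current = None
for _ in range(5):
    current = next_checkpoint(current)
    print("generating next segment with", current)
```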