yhancik:

There’s something slightly sinister about it

(tagging @procedural-generation)

Neural Network Learns to Generate Voice

This is cool, not least because it lets you watch the process of training a neural network with torch-rnn.

If you’ve never done it before, I recommend training a neural net just so you can watch it learn. (Another reason why it’s important to periodically save the state and settings: the intermediate checkpoints let you go back and compare each stage of the learning.) 
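The video uses the Lua-based torch-rnn, but as a rough sketch of what "watching it learn" looks like, here's an equivalent idea in PyTorch rather than the original tool: a tiny character-level RNN whose training loop saves a checkpoint every few hundred steps, so you can sample from the intermediate states later. The corpus file and the hyperparameters are placeholders, not anything from the video.

```python
import torch
import torch.nn as nn

text = open("corpus.txt").read()             # placeholder training corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.head(h), state

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=2e-3)
seq_len, checkpoint_every = 64, 500          # assumed settings

for step in range(5001):
    # Grab a random slice of the corpus and predict each next character.
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.squeeze(0), y.squeeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % checkpoint_every == 0:
        # Save the state *and* the settings needed to rebuild it and sample later.
        torch.save({"step": step, "chars": chars, "model": model.state_dict()},
                   f"checkpoint_{step}.pt")
```

Sampling from checkpoint_0, checkpoint_500, checkpoint_1000 and so on is what gives you that time-lapse feel of the network figuring things out.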

I suspect, based on how much data is typically needed for text generation, that producing words from scratch would take a lot more training data. And existing open-source speech synthesis already does a decent job if all you need is intelligible spoken words. Still, if you wanted to create something that could produce Simlish-like dialog for your next project, this would be an interesting way to do it.

Since torch-rnn works on characters without any context, the audio was converted into UTF-8 text, fed into torch-rnn, and the generated text was then converted back into audio. A sound-aware implementation or a better encoding of the audio data would probably improve the results, but there’s something uncanny about the totally contextless learning.
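For anyone curious what that round trip could look like (the exact encoding used in the video isn't spelled out here, so treat this as a guess): one simple approach is to read 8-bit PCM samples, map each sample value to a character, train on the resulting text file, and then map the sampled characters back to audio samples. File names here are placeholders.

```python
import wave
import numpy as np

def audio_to_text(path):
    """Read 8-bit mono PCM audio and turn each sample into one character."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = np.frombuffer(raw, dtype=np.uint8)   # assumes 8-bit mono WAV
    # Shift into code points 256-511 so every sample becomes a distinct
    # character that survives a UTF-8 round trip.
    return "".join(chr(int(s) + 256) for s in samples)

def text_to_audio(text, path, rate=8000):
    """Map characters back to 8-bit samples and write them out as a WAV."""
    samples = np.array([max(0, min(255, ord(c) - 256)) for c in text],
                       dtype=np.uint8)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)                          # 8-bit samples
        w.setframerate(rate)
        w.writeframes(samples.tobytes())

# Encode a recording as text, train torch-rnn on that file, then decode a sample:
with open("voice_corpus.txt", "w", encoding="utf-8") as f:
    f.write(audio_to_text("voice.wav"))
# text_to_audio(sampled_text, "generated.wav")
```

Something like mu-law companding, or packing several samples per character, would be one version of the “better encoding” mentioned above.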