There’s something slightly sinister about it
(tagging @procedural-generation)
Neural Network Learns to Generate Voice
This is cool, not least because it lets you see the process of training a neural network using torch-rnn.
If you’ve never done it before, I recommend training a neural net just so you can watch it learn. (It’s also another reason why it’s important to periodically save the network’s state and settings.)
I suspect, based on how much data text generation typically needs, that producing words from scratch would take a lot more training data. And existing open-source speech synthesis already does a decent job if you just need intelligible spoken words. Still, if you wanted to create something that could produce Simlish-like dialogue for your next project, this would be an interesting way to do it.
Since torch-rnn works on characters without any context, the audio was converted into UTF-8 text, fed into torch-rnn, and the resulting text converted back into audio. A sound-aware implementation or a better encoding of the audio data would probably improve the results, but there’s something uncanny about the totally contextless learning.
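The post doesn’t say exactly how the audio was turned into text, so here’s a minimal sketch of one plausible encoding: treat each 8-bit PCM sample as one character via Latin-1, which maps bytes 0–255 directly to code points 0–255. The resulting string can be saved as a UTF-8 file for torch-rnn, and generated text mapped back to samples the same way. The function names and the choice of 8-bit PCM are my assumptions, not the original author’s method.

```python
# Hypothetical round-trip encoding between raw 8-bit PCM audio and text.
# Assumption: one byte per sample, mapped through Latin-1 so every byte
# value 0-255 becomes exactly one character (a lossless, reversible map).

def audio_to_text(samples: bytes) -> str:
    # Latin-1 decoding turns each byte into the character with the same
    # code point; the string can then be written to disk as UTF-8.
    return samples.decode("latin-1")

def text_to_audio(text: str) -> bytes:
    # Invert the mapping; characters above code point 255 (which a model
    # could in principle emit) are simply dropped here for robustness.
    return bytes(ord(c) for c in text if ord(c) < 256)

pcm = bytes([0, 127, 128, 255])
assert text_to_audio(audio_to_text(pcm)) == pcm  # lossless round trip
```

One design note: because Latin-1 is a bijection on single bytes, this keeps the text length equal to the sample count, whereas something like base64 would inflate the data by a third and split each sample across characters.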