Data Generation
This week, Patreon had a data breach. Hopefully this won’t harm the crowd-patronage of the artists and creators too severely; the funding model has been doing great things for the creative community, letting people earn a steady income from producing things that are hard to monetize on the 21st-century internet.
Where my interest comes in, though, is that this problem could have been avoided if they had used procedural generation. This isn’t my crazy idea: it is the recommendation of security expert Troy Hunt, who runs “have i been pwned?”, a website that gives users a reliable and trustworthy way to discover whether their information was stolen in a particular hack.
Based on what we know so far, it looks like Patreon was using the actual data from their site on their test server. Having a lot of data that looks like your real data is vital for testing how software will behave under load, but using the real data itself is a bad idea.
The answer? Generate fake data!
Some developers write scripts themselves to create fake data, while others use products like SQL Data Generator. Using procedurally generated data means there’s no privacy or security risk if the information is stolen, and it allows the developers to test how the system will behave with millions of users before they need to do it for real.
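To make the idea concrete, here is a minimal sketch of the kind of script a developer might write. Everything in it is my own assumption for illustration: the function name generate_users, the field names, and the tiny name lists are all hypothetical, and none of it reflects Patreon's actual schema or how SQL Data Generator works. It uses only Python's standard library and a seeded random generator, so the same seed always produces the same fake users.

```python
import random

# Hypothetical sample values; a real script would use much larger lists.
FIRST_NAMES = ["alex", "blair", "casey", "devon", "emery", "finley"]
LAST_NAMES = ["archer", "baker", "carter", "dyer", "fisher", "mason"]


def generate_users(count, seed=0):
    """Return `count` fake user records; the same seed gives the same data."""
    rng = random.Random(seed)  # private generator, so results are repeatable
    users = []
    for user_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        users.append({
            "id": user_id,
            "name": f"{first.title()} {last.title()}",
            # The numeric suffix keeps addresses unique, and example.com is
            # a domain reserved for documentation, not a real mail provider.
            "email": f"{first}.{last}{user_id}@example.com",
            # Made-up pledge amount: 100 to 5000 cents in steps of 100.
            "pledge_cents": rng.randrange(100, 5001, 100),
        })
    return users


if __name__ == "__main__":
    for user in generate_users(5):
        print(user)
```

Raising the count to a few million gives a test database of roughly production size without a single real person's details in it, and keeping the seed fixed means a bug found in one test run can be reproduced in the next.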
Though I usually focus on the artistic uses of procedural generation, there are also practical applications, like this one. And there are probably many more uses that are yet to be discovered.
(If you suspect your account might have been involved in a data breach, I recommend checking https://haveibeenpwned.com/ to find out. It is safe to use, as it only stores user names and email addresses, not passwords or other data.)