DeepTingle

I’m tempted to just leave this without comment. But there’s a serious point here too:

There’s no denying that many of these systems can provide real benefits to us, such as faster text entry, useful suggestions for new music to listen to, or the correct spelling for Massachusetts. However, they can also constrain us. Many of us have experienced trying to write an uncommon word, a neologism, or a profanity on a mobile device, only to have it “corrected” to a more common or acceptable word. Word’s grammar-checker will underline, in aggressive red, grammatical constructions that are used by Nobel prize-winning authors and are completely readable if you actually read the text instead of just scanning it. These algorithms are all too happy to shave off any text that offers the reader resistance and unpredictability. And the suggestions you get from Amazon for new books to buy are rarely the truly left-field ones: the basic principle of a recommender system is to recommend things that many others also liked.

What we experience is an algorithmic enforcement of norms. These norms are derived from the (usually massive) datasets the algorithms are trained on. In order to ensure that the datasets do not encode biases, “neutral” datasets are used, such as dictionaries and Wikipedia. (Some creativity support tools, such as Sentient Sketchbook (Liapis, Yannakakis, and Togelius 2013), are not explicitly based on training on massive datasets, but the constraints and evaluation functions they encode are chosen so as to agree with “standard” content artifacts.) However, all datasets and models embody biases and norms. In the case of everyday predictive text systems, recommender systems, and so on, the model embodies the biases and norms of the majority.

It is not always easy to see biases and norms when they are taken for granted and pervade your reality. Fortunately, for many of the computational assistance tools based on massive datasets, there is a way to drastically highlight or foreground the biases in the dataset: train the models on a completely different dataset. In this paper we explore the role of biases inherent in training data in predictive text algorithms by creating a system trained not on “neutral” text but on the works of Chuck Tingle.
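To make the quoted point concrete, here is a deliberately minimal sketch of the idea, not the actual DeepTingle system: a word-level bigram predictor in plain Python whose “suggestions” are nothing more than counts over its training corpus. The corpora and function names below are made up for illustration; swap in a “neutral” text dump or Tingle’s prose and the same code produces very different completions.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str):
    """Count, for every word, which words follow it in the training text."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def suggest(model, word: str, k: int = 3):
    """Return the k most frequent continuations seen in the training data."""
    return [w for w, _ in model[word.lower()].most_common(k)]

# Toy stand-ins for two training corpora; in practice these would be large text files.
neutral_corpus = "the cat sat on the mat and the dog sat on the rug"
tingle_corpus = "the dinosaur proved love is real and love is a plan"

for name, corpus in [("neutral", neutral_corpus), ("tingle", tingle_corpus)]:
    model = train_bigram_model(corpus)
    print(name, "->", suggest(model, "the"))
```

The same prompt word gets entirely different completions depending on the corpus, which is the whole point: whatever an autocomplete nudges you toward is just the statistics of whatever it was trained on.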

In a world where recommender systems try to sell us things we already own and AI projects are trying to revive phrenology and sell it to police departments, it is worth remembering that no dataset is truly neutral.

http://www.deeptingle.net/index.html