Chernoff faces showing multi-dimensional data in 2D: Google Quick Draw meets t-SNE, but from the 1970s. Via Tufte’s The Visual Display of Quantitative Information.
This month, the curatorial collaborative project Drift Station, which I'm a part of along with Angeles Cossio, released an online project titled Empty Apartments. We pulled nearly 125,000 photographs of apartments and houses for rent on Craigslist that were completely empty because of a removal service, and presented them as an interactive online exhibition. The project took nearly two years of work, much of it manual (Angeles triple-checked every single image by hand to remove ones that showed common spaces or non-apartments), though we also used several automated processes and machine learning to sort the photos.
Word2Vec is cool. So is t-SNE. But figuring out how to train a model and reduce its vector space can feel really, really complicated. While working on a sprint-residency at Bell Labs, Cambridge last fall, a residency that has since morphed into a project where live wind data blows a text through Word2Vec space, I wrote a set of Python scripts to make using these tools easier.
This tutorial is not meant to cover the ins and outs of how Word2Vec and t-SNE work, or machine learning more generally. Instead, it walks you through the basics of training a model and reducing its vector space so you can move on and make cool stuff with it. (If you do make something awesome from this tutorial, please let me know!)
Above: a Word2Vec model trained on a large language dataset, showing the telltale swirls and blobs of a t-SNE reduction.