Chernoff faces showing multi-dimensional data in 2D: Google Quick Draw meets t-SNE, but from the 1970s. Via Tufte’s The Visual Display of Quantitative Information.
UPDATE: Tech problems with the site, so it’s not working quite right. Consider this a glimpse into what is going to launch in just a few days.
This month, the curatorial collaborative project Drift Station, which I’m a part of along with Angeles Cossio, released an online project titled Empty Apartments. We pulled nearly 125,000 photographs of apartments and houses for rent on Craigslist that were completely empty because of a removal service, and presented them as an interactive online exhibition. The project took nearly two years of work, and much of it was manual (Angeles triple-checking every single image by hand to remove ones that included common spaces or non-apartments), but we also used several automated processes and machine learning to sort the photos.
Word2Vec is cool. So is t-SNE. But trying to figure out how to train a model and reduce the vector space can feel really, really complicated. While working on a sprint-residency at Bell Labs, Cambridge last fall, which has morphed into a project where live wind data blows a text through Word2Vec space, I wrote a set of Python scripts to make using these tools easier.
This tutorial is not meant to cover the ins and outs of how Word2Vec and t-SNE work, or machine learning more generally. Instead, it walks you through the basics of how to train a model and reduce its vector space so you can move on and make cool stuff with it. (If you do make something awesome from this tutorial, please let me know!)
Above: a Word2Vec model trained on a large language dataset, showing the telltale swirls and blobs from the t-SNE reduction.