Word2Vec is cool. So is t-SNE. But trying to figure out how to train a model and reduce the vector space can feel really, really complicated. While working on a sprint-residency at Bell Labs, Cambridge last fall (which has since morphed into a project where live wind data blows a text through Word2Vec space), I wrote a set of Python scripts to make using these tools easier.
This tutorial is not meant to cover the ins and outs of how Word2Vec and t-SNE work, or machine learning more generally. Instead, it walks you through the basics of training a model and reducing its vector space so you can move on and make cool stuff with it. (If you do make something awesome from this tutorial, please let me know!)
Above: a Word2Vec model trained on a large language dataset, showing the telltale swirls and blobs from the t-SNE reduction.
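Here's a minimal sketch of the core pipeline, using gensim and scikit-learn (my reconstruction, not necessarily the scripts from the post): train a Word2Vec model on tokenized sentences, then flatten its vectors to two dimensions with t-SNE.

```python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

# Hypothetical toy corpus: a list of tokenized sentences.
# In practice you'd tokenize a real text.
sentences = [
    ["the", "wind", "blows", "a", "text", "through", "space"],
    ["word", "vectors", "live", "in", "high", "dimensions"],
    ["the", "wind", "blows", "in", "high", "space"],
]

# Train the model; 100-dimensional vectors are a common default.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, seed=1)

# Every word in the vocabulary, and its learned vector.
words = list(model.wv.index_to_key)
vectors = model.wv[words]

# Reduce to two dimensions -- this is where the telltale
# swirls and blobs come from when you plot the output.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)

for word, (x, y) in zip(words, coords):
    print(f"{word}\t{x:.3f}\t{y:.3f}")
```

With a real corpus you'd plot the coordinates (with matplotlib, say) rather than printing them, and tune the perplexity to the vocabulary size.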
Weird blur/glitching in Mac OS X when zoomed in really close on the cursor.
A model of the penicillin molecule by Dorothy Hodgkin from 1945. Sculpture + data visualization + scientific work.
The amazing Seiko UC-2100 watch from 1984.
Some WIP for an upcoming performance: 1,047 syllables from H.G. Wells' The Time Machine fed into Word2Vec space, then reduced from 50 dimensions to two. View a much larger version here.
A detail of one of the syllable swirls.
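For the curious, a rough sketch of how the syllable version might work (the tokenization and parameters here are my assumptions, not the actual performance scripts): feed syllables to Word2Vec as if they were words, ask for 50-dimensional vectors, then reduce to two with t-SNE as before.

```python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

# Hypothetical syllable tokenization of the novel's opening line:
# each "sentence" is a list of syllables, so the vocabulary
# Word2Vec learns is made of syllables rather than words.
syllable_sentences = [
    ["the", "time", "tra", "vell", "er"],
    ["for", "so", "it", "will", "be", "con", "ven", "ient"],
    ["to", "speak", "of", "him"],
]

# 50-dimensional vectors, matching the post.
model = Word2Vec(syllable_sentences, vector_size=50, window=5, min_count=1)

# Collapse the 50-dimensional syllable space to two dimensions.
syllables = list(model.wv.index_to_key)
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(
    model.wv[syllables]
)
```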
This is a wonderful poem (via English vowels and diphthongs).
In the box of my new monitor: a report on color and grayscale brightness. Since I got two monitors, I can verify that this isn't canned but actually unique to each monitor (for nerds like me who really enjoy algorithmically generated objects like this).
Two library checkout punch cards, found in books in the Bell Labs library.
Folded and burned, found on the street.