For an upcoming Drift Station project, we’ve been considering how to curatorially sort a massive number of images (about 100k) for presentation. Chronological? Random order? Some other logical scheme? But a more computational approach seemed to make sense: some way of parsing the images that took into account a variety of visual factors in each image, something that would be impossible to do manually.
Neural networks are the obvious answer here, so I found some very helpful sample code from Gene Kogan and Kyle McDonald and wrote some Python and Processing code that loads a folder of images and extracts a vector representation of each one. Then, using t-SNE and Rasterfairy, the images are organized into a 2D grid.
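For concreteness, here's a minimal sketch of that pipeline in Python. It assumes Keras's pretrained VGG16 (taking the fc2 layer as the image vector) and RasterFairy's transformPointCloud2D for the grid step; the model, layer, folder path, and parameters here are my assumptions, not necessarily what the original sample code uses.

```python
# Minimal sketch: CNN features -> t-SNE -> RasterFairy grid.
# Model choice (VGG16 / fc2) and all parameters are example values.
import glob
import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image
from keras.models import Model
from sklearn.manifold import TSNE
import rasterfairy

# Pretrained VGG16, cut off at the fc2 layer to get a 4096-d vector per image.
base = vgg16.VGG16(weights='imagenet', include_top=True)
extractor = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

def features_for(path):
    img = image.load_img(path, target_size=(224, 224))
    x = vgg16.preprocess_input(np.expand_dims(image.img_to_array(img), 0))
    return extractor.predict(x)[0]

paths = sorted(glob.glob('images/*.jpg'))       # hypothetical folder
vectors = np.array([features_for(p) for p in paths])

# Squash the high-dimensional vectors down to 2D with t-SNE...
xy = TSNE(n_components=2, perplexity=30).fit_transform(vectors)

# ...then snap the scattered points onto a rectangular grid.
# (As I understand it, transformPointCloud2D returns the grid positions
# plus the grid dimensions.)
grid_xy, (grid_w, grid_h) = rasterfairy.transformPointCloud2D(xy)
```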
I’ve spent the last few days playing with settings in the code, and found there is an interesting balance to be struck between preserving local color similarity and preserving object similarity. (Note: this post is more of a quick note than a deep-dive analysis.)
Above: a version with blurred images, showing a pretty clear separation by color with fairly smooth transitions. Click on images for a higher-res version.
The first tests used the images untransformed (except shrunk down and made square). The results weren’t great, so I tried applying various levels of blur: just a little to smooth things out, a medium amount that obscured all but the most major details, and finally so much that the images were just washes of color. This definitely improved the layout, and, most noticeably, it did a great job of putting images of similar colors near each other. Above is a composite of about 2,000 images, each given a heavy blur; a detail is below.
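As a rough illustration of that preprocessing, here's a sketch using Pillow; the center-crop logic, the size, and the blur radius are my assumptions rather than the code I actually ran.

```python
# Hypothetical preprocessing: center-crop to a square, shrink, and blur.
# The size and blur radius are example values only.
from PIL import Image, ImageFilter

def prep(path, size=224, blur_radius=8):
    img = Image.open(path).convert('RGB')
    side = min(img.size)                                   # shortest edge
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))   # square crop
    img = img.resize((size, size))                         # shrink
    return img.filter(ImageFilter.GaussianBlur(blur_radius))
```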
But Kyle’s code combined each non-blurred image with several versions of itself at varying blur levels before extracting the vectors. I made a few modifications to the code and tried it. At first, I was disappointed: the color wasn’t nearly as continuous as in my early tests, as you can see here:
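I can only guess at exactly how that combination works, but one plausible reading is to run the same feature extractor over several blurred copies of each image and concatenate the resulting vectors. In the sketch below, `prep` and `extractor` are the hypothetical helpers from the earlier snippets, and the blur radii are example values.

```python
# One guess at "combining blur levels": extract features from several
# blurred copies of an image and concatenate them into one long vector,
# so both the color washes and the sharp detail survive.
# `prep` and `extractor` are the hypothetical helpers sketched above.
import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image

BLUR_RADII = [0, 4, 16]   # sharp, medium, heavy -- example values only

def combined_features(path):
    parts = []
    for r in BLUR_RADII:
        img = prep(path, blur_radius=r)       # 224x224 PIL image
        x = vgg16.preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        parts.append(extractor.predict(x)[0])
    return np.concatenate(parts)
```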
But on a closer look, something interesting appeared: while the image-to-image color wasn’t as smooth, similar object types were grouped together! Dining rooms with dark wood furniture landed near other, similar dining rooms; toilets near other toilets.
I’m definitely a machine-learning pirate, grabbing tools when they’re helpful and only understanding a small amount about what’s going on. But my guess is that blurring the images evens out color at the expense of detail, so objects get lost; by combining various blur levels, both kinds of data are retained. In this case, I think the detailed, object-specific data outweighs the general color info, which is why the combined versions are not as smooth in color, but it’s still an interesting way of thinking about this kind of dimensionality reduction.