This month, Drift Station, the curatorial collaborative I’m a part of along with Angeles Cossio, released an online project titled Empty Apartments. We pulled nearly 125,000 photographs of completely empty apartments and houses for rent on Craigslist, and presented them as an interactive online exhibition. The project took nearly two years of work, and much of it was manual (Angeles triple-checked every single image by hand to remove ones that included common spaces or non-apartments), but we also used several automated processes and machine learning to sort the photos.
Arranging By Color And Objects With t-SNE
For an upcoming Drift Station project, we’ve been considering how to curatorially sort a massive number of images (about 100k) for presentation. Chronological? Random order? Some other logical scheme? But a more computational approach seemed to make sense: some way of parsing the images that took into account a variety of visual factors in each image, something that would be impossible to do manually.
Neural networks are the obvious answer here, so I found some very helpful sample code from Gene Kogan and Kyle McDonald and wrote some Python and Processing code that loads a folder of images and extracts a vector representation from each one. Then, using t-SNE and RasterFairy, the images were organized into a 2D grid.
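The pipeline above can be sketched in a few lines. This is a minimal, hedged version: random vectors stand in for the CNN features the real code extracts (e.g. from a network with its classification layer removed), and a naive sort-and-fill stands in for RasterFairy's proper grid assignment.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for CNN feature vectors: in the real pipeline, each image is
# passed through a neural network to get a high-dimensional vector; here
# we fake 64 "images" with random 128-dim features.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 128))

# t-SNE projects the high-dimensional vectors down to 2D, keeping
# visually similar images near each other.
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)

# RasterFairy snaps the scattered 2D points onto a clean grid; as a
# naive stand-in, sort by y then x and fill an 8x8 grid row by row.
side = 8
order = np.lexsort((xy[:, 0], xy[:, 1]))  # primary key: y, secondary: x
grid = {int(idx): (i % side, i // side) for i, idx in enumerate(order)}
# grid[n] is now the (col, row) cell where image n lands
```

The interesting part is all in the feature extraction: which layer of the network you take the vector from is what shifts the balance between color similarity and object similarity.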
I’ve spent the last few days playing with settings in the code, and found there is an interesting balance to be struck between locally preserving color similarity and object similarity. (Note: this post is more of a quick note than a deep-dive analysis.)
Above: a version with blurred images, showing a pretty clear separation by color with fairly smooth transitions. Click on images for a higher-res version.
WIP: Hard-Drive Visualizations
Some work-in-progress visualizations of the physical location of specific files on my hard-drive, being made as part of my residency at Bell Labs. The above two images are details, showing files (in red) and empty space (in gray); each little square is one 512-byte sector.
This current version is visualizing an 8GB thumb drive. The prints are approximately 36×60″ (91×152 cm) so each individual sector can be seen. The hope is to do a version of my laptop’s hard-drive, which will require a much larger print or set of prints.
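The rendering step is straightforward once you have a per-sector allocation map. This is a toy sketch, not the actual forensics pipeline: a fake used/empty bitmap stands in for the real per-sector data, and Pillow draws one small square per 512-byte sector, red for file data and gray for empty space.

```python
from PIL import Image

# A toy allocation bitmap: True = sector holds file data, False = empty.
# (The real version extracts this per 512-byte sector with forensics tools.)
used = [i % 7 == 0 or 100 < i < 160 for i in range(64 * 64)]

cols = 64   # sectors per row in the image
cell = 4    # pixels per sector square
rows = len(used) // cols
img = Image.new("RGB", (cols * cell, rows * cell), (128, 128, 128))  # gray base

for i, u in enumerate(used):
    if u:  # paint occupied sectors red, leave empty ones gray
        x, y = (i % cols) * cell, (i // cols) * cell
        img.paste((200, 30, 30), (x, y, x + cell, y + cell))

img.save("sector_map.png")
```

At print scale the same logic just runs with a much larger canvas; an 8GB drive is about 15.6 million 512-byte sectors, which is why each sector square has to stay tiny even on a 36×60″ print.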
The goal of these visualizations is not analytics or troubleshooting – instead, I’m very interested in the abstraction between us, our computers, and our data. We take a photograph of something in the world and store it on a physical drive (even the end-point of the cloud is still a physical object). Once stored on a hard-drive, digital objects, however ephemeral-seeming, continue to exist as physical ones and zeroes, magnetic charges of a specific size and intensity. These visualizations are about trying to unpack and see my personal data on that level.
But there is a significant disconnect between myself and my computer, and the actual data on my hard-drive. It turns out that drives contain a specialized chip that handles requests to read and write data; the algorithms that actually retrieve and set data are a closely guarded industrial secret. These chips act as a black box between the user and their data, rendering a detailed picture of the exact location of the bits that make up our digital lives almost impossible. This obscurity is especially pronounced in SSDs, due to the way they store data. This post on Aleratec puts a finer point on it:
…unlike traditional hard-drives where the physical location of each bit of data is known and constant, the physical location of data in an SSD is highly abstracted from the outside world. Whereas each Logical Block Address (LBA) on a [hard-drive] always points to the same physical location, the physical location to which an SSD LBA points changes often.
My images, then, are a best guess at where the actual data is stored, extracted using digital forensics tools. Below is the full image of the drive – click on it for a full-sized view.
Tutorial: Node on Raspberry Pi (for Bots)
Almost all my bots have been written in Python, but I’ve been meaning to try Node.js for more interactive bots for some time. Daniel Shiffman’s excellent new tutorials were enough to get me jump-started, and I created @BotSuggestion, a bot whose only activity is following accounts suggested by Twitter, slowly conforming to their algorithm.
I run all my bots on a Raspberry Pi under my desk (see my tutorial on how to get that set up), but getting an ongoing Node server running took a little more work.
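One common way to keep a Node process alive on a Pi (not necessarily the exact approach in the tutorial) is a process manager like pm2, which restarts the bot if it crashes and relaunches it on boot. The filename `bot.js` here is a hypothetical placeholder:

```shell
# install the pm2 process manager globally (assumes Node/npm are already set up)
sudo npm install -g pm2

# start the bot and keep it alive, restarting it on crashes
pm2 start bot.js --name suggestion-bot

# generate and register a startup script so pm2 relaunches everything on boot
pm2 startup
pm2 save
```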
All My Blank Tracking Pixels
Emails often contain 1×1-pixel, transparent, or tiny hidden images used to track whether the email has been opened. Using a Python script, I gathered all 12,383 of them from my inbox and deleted-mail folder into the image above. The black pixels at the bottom fill out the remainder of the final row.
The above version is scaled up; see the pixel-accurate 111x112px version here.
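The compositing step is simple to sketch. This version skips the IMAP-fetching part of the script and just shows the mosaic logic: one pixel per tracking image, tiled into a grid a fixed width across, with the final row padded in black as described above. The colors here are placeholder white pixels, not real data.

```python
import math
from PIL import Image

def composite(pixels, width):
    """Tile one RGB pixel per tracking image into a grid `width` across,
    padding the final row with black, as in the original mosaic."""
    height = math.ceil(len(pixels) / width)
    img = Image.new("RGB", (width, height), (0, 0, 0))  # black background
    for i, color in enumerate(pixels):
        img.putpixel((i % width, i // width), color)
    return img

# e.g. 12,383 pixels at 111 across -> a 111x112 image, with a black
# remainder at the end of the last row
pixels = [(255, 255, 255)] * 12383
mosaic = composite(pixels, 111)
```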
You And I Are The Same
Automating OpenCV Training Tests
Automating the process of testing OpenCV training settings using a little Python script.
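A sketch of what such an automation might look like (my assumption of the approach, not the script itself): sweep a grid of settings and build an `opencv_traincascade` command line for each combination, writing each run's output to its own folder. The parameter values here are illustrative.

```python
import itertools

# parameter grid to sweep; values are illustrative, not tuned
stages = [10, 15, 20]
max_false_alarm = [0.4, 0.5]

commands = []
for num_stages, fa in itertools.product(stages, max_false_alarm):
    # opencv_traincascade is OpenCV's cascade-training CLI; -data gets a
    # separate output folder per run so results don't clobber each other
    cmd = [
        "opencv_traincascade",
        "-data", f"cascade_s{num_stages}_fa{fa}",
        "-vec", "positives.vec",
        "-bg", "negatives.txt",
        "-numPos", "1000",
        "-numNeg", "600",
        "-numStages", str(num_stages),
        "-maxFalseAlarmRate", str(fa),
    ]
    commands.append(cmd)
    # to actually run each training pass:
    # subprocess.run(cmd, check=True)
```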
OCR From Photographs
A quick preview of a longer process: running Optical Character Recognition (OCR, essentially converting images of text into actual text) on photographs, many of which do not include any text at all. Below is the result, run on about 25 images:
L 2w §§.§;§s~5 ,L_§;L
wvmﬁgxb we, «usmswm E ugxwﬁm
L L .L L . L. L .%L.v§RwEf% ﬁgs»:
gﬁﬁz 2&3 3L,..aL.,: a%L.$ §a§£ L
L .L L L LL . ,5 wxmacﬁaswmm mmow.
. LL. LEwz9,.§EE»:§mmnw &¢2.§..E-,~cauu§ Lao
..m L . $L§.3.x;%m£ §. a§,a.i...L«b aw
.L.w.»w§L «Q 30.. a,E8L.w3 aﬁvuf in. my QZL
L . , L .n.?2a.:L3w.mm..S6wn.:s.w L
L . L .. a L.:.&£._.aﬁ.w§.%éamaaxv _
L wmwr.y.,§:..v L
as an w..nSHea£u..nc£a.T8aﬁa£ul1wulDamu£w§uaau£iaJ1ulz., .
A w...w...1».w..wa.. «.;«.....a... .. E...m.Lm, Mun.» a4.rduﬁmvN.mM m .«¢¢.¢.wm.mZ3§.«l%_.4~MoEGuumAuﬁ.nb
mumyuuh my mﬂwwmvwqm ._wrAB..7 ﬁt, uﬂ W/EH9” ﬂ..A..mm.,..m wuauunm mﬂumv .u...m«.X?m Bkaamumﬂm
mm mm... uudmﬁ huﬁunnw Wu wwmm5nwWﬁaamﬂmAnunbum..cu«mu¢a.E—onEou
,..¢Hw~.wwL«wwzz....M./J Nﬁwm 4 1/umwwwu ,._F..11 «mu «my Gcdmwuﬂ mu emu
mama.“ E_§, 3.0 ,.£mT.o$. ..~
,. wnox, WM ,.»..um WW1, ~u:N5.t»Eu.§.tuan~
omaswé mow aﬂuawwu mam
_.,ﬁZ-,u..£..m .5 Ouwnvmm
Iv: V; ‘« .
*9? . - I
staﬂio 3:. 2.. IL . .
‘.22 . 3 3,33-..».. .
». .=oi_:: 3 .::_..,x..
M ...mm.~.n\8.u<: <.:_.:3; .
3<u:E. 5» fr.
.|..t'|.\:.c|..s .1 1 . ,
3%. 32 535 .4 5.3
_ 3.”. 3.
a......? an ._oE.;
311%. :..«:S. :2
an n 8 .<uu..2§ .2
.. . ,. ,, ~
Source code to come shortly, though it’s a pretty simple automation written in Python and using Tesseract.
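Until then, a minimal sketch of that kind of automation (my guess at the shape of it, not the forthcoming source): walk a folder of images and run each through Tesseract via the pytesseract wrapper, appending the output to one text file.

```python
from pathlib import Path

def find_images(folder):
    """Collect the image files to feed to Tesseract."""
    exts = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in exts)

def ocr_folder(folder, out_file="ocr_output.txt"):
    # imported lazily so find_images() works without Tesseract installed
    import pytesseract
    from PIL import Image

    with open(out_file, "w") as out:
        for path in find_images(folder):
            # image_to_string runs Tesseract on the image; on photographs
            # with no text, it happily returns noise like the output above
            out.write(pytesseract.image_to_string(Image.open(path)))
            out.write("\n")
```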