Glossary from “Old Mortality”

The “glossary” section from Sir Walter Scott’s book “Old Mortality, Volume 2”, which appears to be the oldest book in the Internet Archive’s Project Gutenberg collection. I’m not sure what it is supposed to be a glossary of, but it is a weird collection, to be sure:

A’, all.
A’body, everybody.
Aboon, abune, above.
Ae, one.
Aff, off.
Afore, before.
Again, against, until.
Ahint, behind.
Ain, own.
Ajee, awry.
Amaist, almost.
Amna, am not.

Every Unique Word in ‘usenet-com’

A list of every unique word in the ‘usenet-com’ Usenet archive from Internet Archive. Warning: very long post!

_
___
_____________________________
_______________________________________
___________________________________________________
_______________________________________________________
________________________________________________________
__________________________________________________________________
__________eavesdropping____________________
_mn_main
?
?
??
?99
?berweisen
?berweisungstr?
?e
?gen
?ger
?line
?mchen

Most Frequent Word Search

For the past few months I’ve been working on a curatorial project with the Internet Archive, to be released on their Tumblr account early next year. One of the experiments for this project searches the Internet Archive for a given term, downloads the first result, finds the most frequent word in it, and uses that word as the seed for the next search. For example:

seed > plants > leaves > chinese > heaven > minerva > questo

An interesting result: the process moves from general terms to more and more specific ones until a search returns no results. This is the opposite of my Wikipedia Loops project, where a similar algorithmic path goes from specific to general, eventually falling into a meta-loop.

The code for this experiment is available here: https://gist.github.com/jeffThompson/6718129
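
The linked gist is the actual code for the experiment. As a rough illustration of the loop described above, here is a minimal sketch, not a copy of that code. The function names, the stopword list, and the `fetch_text` callback (which stands in for "search the Internet Archive and download the first result") are all assumptions made for the example, and whether the original filters common words at all is a guess.

```python
import re
from collections import Counter

# Assumed stopword list; without something like it, "the" tends to win every round.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def most_frequent_word(text, stopwords=STOPWORDS):
    """Return the most common non-stopword in text, or None if there is none."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    counts = Counter(w for w in words if w not in stopwords)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def follow_chain(seed, fetch_text, max_steps=20):
    """Follow seed > word > word ... until a search comes back empty.

    fetch_text is a hypothetical callback: given a search term, it returns
    the text of the first result, or None when nothing is found.
    """
    chain = [seed]
    seen = {seed}
    term = seed
    for _ in range(max_steps):
        text = fetch_text(term)
        if text is None:  # no search results: the chain ends here
            break
        term = most_frequent_word(text)
        if term is None or term in seen:  # empty text, or we started looping
            break
        chain.append(term)
        seen.add(term)
    return chain
```

Passing the search step in as a function keeps the sketch runnable without a network connection: a dictionary lookup such as `{"seed": "plants plants grow"}.get` works as a stand-in for the real search.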

PDFs of Brightness, Sorted by Brightness

[Images: Brightest, Darkest]

As an early experiment for a curatorial residency with the Internet Archive, I wrote some software that searches the site for all PDF texts containing the word “brightness” in their title or description, downloads the files (approximately 900 PDFs), analyzes them, and sorts them by overall brightness.

Above are the first pages of the brightest and darkest PDFs – a table with all the files and URLs is after the break. Download the source code here.
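
For a sense of what the sorting step might look like, here is a small sketch. It assumes each PDF has already been rendered to a list of RGB pixels (the rendering step is left out), and it measures "overall brightness" as the mean of a standard luma weighting. Both choices are assumptions for illustration, as are the function names; the actual source code may measure brightness differently.

```python
def mean_brightness(pixels):
    """Average brightness (0-255) of an iterable of (r, g, b) pixels.

    Uses the common 0.299/0.587/0.114 luma weights; this is an assumed
    measure, not necessarily the one used in the original software.
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels to measure")
    return total / count


def sort_by_brightness(pages):
    """Return names from a {name: pixels} dict, brightest first."""
    return sorted(pages, key=lambda name: mean_brightness(pages[name]), reverse=True)
```

The first and last names in the returned list correspond to the brightest and darkest files.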
