This software was built with Processing during the Spring 2006 semester at ITP in two of Dan Shiffman's courses: Programming from A to Z and The Nature of Code. The parsing, spidering, and Bayesian filtering code comes directly from Dan's weekly exercises. I would never have been able to figure any of this out if it weren't for Dan's unparalleled ability to explain advanced programming concepts to people like me.


We've been told that we are in the midst of an "information revolution" and that "aggregate human knowledge" is growing at an exponential rate. But what and where is all of this information? Much of it is simply our own conversations with each other. We gossip about each other, complain about popular culture, and argue about politics as we always have, except now we're using computers to do it. We are filling databases with terabytes of vernacular language. We generally consider the majority of this language to be noise because it's not relevant to whatever task may be at hand. We're letting the software we use frame our conception of the data. However, if we make new software, we can begin to form new conceptions. If we conceive of these databases not only as information to be searched and sorted through, but also as a vast new landscape of recorded human experience, we can begin to render them as such. This software looks for conversational language on the web, based on a search query, and generates large-format social landscapes using a set of programmatic drawing assets.

Looking For Conversations

The software accepts a search query on the command line and uses the Google API to search the web for matching content. As it searches, certain kinds of web pages are prioritized (social-software sites like MySpace, online forums, blogs), and others are skipped (news sites, Wikipedia entries). The content is collected, parsed, and broken down into small snippets of one or two sentences. A Bayesian filter then checks each snippet to see whether it qualifies as vernacular language. The filter works the same way a spam filter does: it compares the snippet to a collection of "good" text and to a collection of "bad" text, and discards the snippets that look more like the "bad" (in this case the "good" is vernacular language and the "bad" is a collection of articles, essays, and other expository writing samples). The result is an array of small conversational blurbs.
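
The filtering step can be illustrated with a small naive Bayes classifier. This is only a sketch of the general technique, not the code used in the project: the class name SnippetFilter, the tokenizer, the add-one smoothing, and the assumption of equal prior odds for "good" and "bad" text are all choices made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a naive Bayes snippet filter (not the project's code).
class SnippetFilter {
    private final Map<String, Integer> goodCounts = new HashMap<>();
    private final Map<String, Integer> badCounts = new HashMap<>();
    private int goodTotal = 0; // number of word tokens seen in "good" text
    private int badTotal = 0;  // number of word tokens seen in "bad" text

    // Lowercase the text and split on anything that is not a letter or apostrophe.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z']+");
    }

    // Add one training sample; vernacular == true means "good" text.
    void train(String text, boolean vernacular) {
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            if (vernacular) {
                goodCounts.merge(word, 1, Integer::sum);
                goodTotal++;
            } else {
                badCounts.merge(word, 1, Integer::sum);
                badTotal++;
            }
        }
    }

    // Log-odds that the snippet is vernacular, assuming equal priors.
    // Positive values lean "good", negative values lean "bad".
    double score(String snippet) {
        Set<String> vocabulary = new HashSet<>(goodCounts.keySet());
        vocabulary.addAll(badCounts.keySet());
        int v = vocabulary.size() + 1; // +1 leaves room for unseen words
        double logOdds = 0.0;
        for (String word : tokenize(snippet)) {
            if (word.isEmpty()) continue;
            // Add-one (Laplace) smoothing keeps unseen words from zeroing a probability.
            double pGood = (goodCounts.getOrDefault(word, 0) + 1.0) / (goodTotal + v);
            double pBad = (badCounts.getOrDefault(word, 0) + 1.0) / (badTotal + v);
            logOdds += Math.log(pGood / pBad);
        }
        return logOdds;
    }

    boolean isVernacular(String snippet) {
        return score(snippet) > 0.0;
    }
}
```

In use, the filter would be trained once on both collections with train(text, true) and train(text, false), and each parsed snippet would be kept only if isVernacular(snippet) returns true. Summing log-odds rather than multiplying raw probabilities avoids numeric underflow on longer snippets.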

Generating A Landscape

Originally, my intent was to explore algorithms for displaying these language blurbs spatially by letting them self-organize into conversation clusters. I devised a silly and contrived method of assigning each blurb a personality value based on its content. My hope was that I could generate new and interesting conversations with bits of dialog culled from entirely different online spaces. When I implemented this algorithm I was pleased with the compositions, but the software took too long to run and the resulting conversation clusters seemed arbitrary. In the end, I created an algorithm that simply generates the conversation clusters randomly. The effect was the same and it was much more efficient.
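
Random clustering of this kind can be sketched in a few lines. The version below is an assumption about how it might be done, not the project's actual algorithm: it shuffles the blurbs and cuts the shuffled list into groups of random size. The class name RandomClusters and the minSize and maxSize parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: group blurbs into randomly sized conversation clusters.
class RandomClusters {
    static List<List<String>> cluster(List<String> blurbs, int minSize, int maxSize, Random rng) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<String> shuffled = new ArrayList<>(blurbs);
        Collections.shuffle(shuffled, rng);

        List<List<String>> clusters = new ArrayList<>();
        int start = 0;
        while (start < shuffled.size()) {
            // Pick a size in [minSize, maxSize]; the final cluster may be
            // smaller than minSize if too few blurbs remain.
            int size = minSize + rng.nextInt(maxSize - minSize + 1);
            int end = Math.min(start + size, shuffled.size());
            clusters.add(new ArrayList<>(shuffled.subList(start, end)));
            start = end;
        }
        return clusters;
    }
}
```

Passing a seeded Random (for example new Random(42)) makes a given composition reproducible, which can be handy when rendering the same landscape at different sizes.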

Drawing Programmatically

The blurbs are rendered using a set of programmatic drawing assets. For each blurb, a figurative representation and a block of text are rendered. The diversity of the figurative representations is achieved by assigning probabilities to the different elements that make up the figure (for example, there's a 30% chance that any given figure's shoes will have heels). The text blocks are rendered using a programmatic typeface which simply draws each blurb a little differently each time. At this point, the probabilities and algorithms used to generate the drawings are not tied in any way to the content of the language they are rendering. I'm currently exploring new ways to connect the drawing algorithms to the content of the drawings.
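
The probability-driven variation described above can be sketched as follows. Only the 30% chance of heels comes from the text; the class name FigureTraits, the hat trait, its 15% chance, and the height range are hypothetical stand-ins for whatever elements the real drawing assets use.

```java
import java.util.Random;

// Hypothetical sketch: choose a figure's traits by independent random draws.
class FigureTraits {
    final boolean heels;
    final boolean hat;
    final double height;

    FigureTraits(boolean heels, boolean hat, double height) {
        this.heels = heels;
        this.hat = hat;
        this.height = height;
    }

    static FigureTraits random(Random rng) {
        // 30% chance of heels, as stated in the text.
        boolean heels = rng.nextDouble() < 0.30;
        // Assumed values for illustration only.
        boolean hat = rng.nextDouble() < 0.15;
        // Relative height between 0.8 and 1.2 of a base figure (assumed range).
        double height = 0.8 + rng.nextDouble() * 0.4;
        return new FigureTraits(heels, hat, height);
    }
}
```

A drawing routine would then branch on these fields when rendering each figure. Tying the drawings to the content of a blurb could, in principle, mean replacing the fixed probabilities with values derived from the blurb's text.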