Excited about Text Mining

I’m getting excited because there’s any number of projects I’ve been imagining on this topic since we started.

Right now I’m studying a technical text. It’s Sunday. I’ve been studying all week with the exception of Saturday night when I watched a few DVDs. And I’m frustrated, because I can’t concentrate, can’t focus. I put on some music (lately iPod genre selection: classical). I’ve switched back to my Linux partition and I’m using VIM (I could have stayed in Windows using Cygwin, but it’s buggy and annoying). And the image which pops into my mind (yeah, I have a photographic memory) is a scene from The Social Network (2010). I look it up on Google (still distracted) and find the exact phrase: ‘wired in’. I wish I was ‘wired in’.

Some movies define a generation. More specifically, they give life to cultural tropes through the cinematic form–sound, motion, image, context. What’s more, for generations to come, any youth who learns to program and has access to this film may walk away with this image in their mind too, and recall the phrase “wired in” whenever they think about being focused on their work to the exclusion of all else. In fact, this youth may learn this commitment trope even before they feel committed to anything.

Similarly, some applications are particularly conducive to flow states. VIM is one of them. VIM is, as the man page suggests, a programmer’s text editor. It’s designed to work without the use of a mouse, which has a great influence on the sensation of ‘flow’ as far as physical movement is concerned–you never need to lift your hands from the keyboard. An inhibitor of flow is any stimulus which takes you out of your current frame of mind. For example, a mistyped word. When I misspell a word in VIM, most of the time I only need to type this sequence: ESC [ s 1 z = SHIFT+4 a (jump back to the misspelled word, accept the first suggestion, return to the end of the line), and the error is fixed and I’m right back to typing. Now some people might say, Microsoft Word will auto-correct misspellings–you don’t need to type anything. True, but I don’t like Word. I love VIM. Just like the youth who learns the idea of focused work from a film, some youth who grows up with networked software (no CD required) will wonder why you ever needed to own a copy of your software in the first place.


An interesting text analysis might be searching documents and linking specific phrases back to particular cultural references. Combined with other markers, analysis could determine what ideas a student has, or what norms they might be exploring at a given moment in time. I’m thinking of the developing ego of children and their capacity to latch onto particular ideas to the exclusion of all else. I’m interested in the tendency of the young and the old to prefer specific ideas (or exclude options) wrapped in the fuzzy glow of nostalgia for knowledge which arrived at the exact time when we needed it. Brands work in similar ways, too.
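
To make the phrase-linking idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny `CULTURAL_REFERENCES` table is a hypothetical, hand-built stand-in for a real collection, and `find_references` is a name I made up. It only does literal phrase matching (ignoring case and allowing hyphens between words), which is far cruder than what a real analysis would need.

```python
import re

# Hypothetical, hand-built collection of cultural references (illustrative only).
CULTURAL_REFERENCES = {
    "wired in": "The Social Network (2010)",
    "the truth is out there": "The X-Files",
}


def find_references(text, references=CULTURAL_REFERENCES):
    """Return a list of (phrase, source, offset) for known phrases found in text."""
    hits = []
    for phrase, source in references.items():
        # Match the words of the phrase separated by whitespace or hyphens,
        # so "The-Truth-Is-Out-There" matches "the truth is out there".
        words = [re.escape(word) for word in phrase.split()]
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, source, match.start()))
    # Report hits in the order they appear in the document.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sample = "I wish I was 'wired in'. The-Truth-Is-Out-There"
    for phrase, source, offset in find_references(sample):
        print(f"{offset}: '{phrase}' may point to {source}")
```

A match here only says that a phrase appears, not that the writer had the film or the show in mind; that is where the “other markers” would have to come in.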

This is an idea I’ve been toying with for a while. A project idea. Maybe something which could start with a WordNet analysis. Or grow out of my own collection of cultural references.


Some ideas we have, some questions, have answers, but we simply can’t find them (or we believe there must be an obvious answer, because otherwise others would ask the question so often that we could easily Google it). That may be because we’re experiencing something uncommon, or because we formulate the question uncommonly, or because there is one particular and dominant answer which we reject or suspect.

The-Truth-Is-Out-There

I don’t subscribe to the dorm-room-entrepreneur mythos of the Millennial generation. (If Net Neutrality fails then a great software idea might not be enough to take over the world anymore.) Is it your dream? Are you wired in? I do wish to be wired in right now, because I’m completely distracted.


Footnotes

Any number of these production companies might own part or all of the rights to the image from The Social Network: Columbia Pictures, Relativity Media, Scott Rudin Productions, Michael De Luca Productions, Trigger Street Productions.

And the same with The X-Files television show and all of their hard-working folks: Ten Thirteen Productions, 20th Century Fox Television, X-F Productions.