Week 7: Reflecting on Text Mining Opportunities

What experience do you already have working with textual data in a learning context, either as part of your research or as part of your teaching or administrative practice in a learning space? How would you characterize your own expertise and focus in this area: Education, Psychology, Linguistics, or Computation? Have you made use of any tools for automated analysis in this past experience? In what ways would you like to extend the skills in this area that you are bringing with you? What kinds of expertise would you like to find in the other students with whom you will interact and collaborate in this unit?

Starting from Communication

You’ll recall that my professional experience began in graphic design; I started my career in print marketing. In that context my work involved starting with what we called ‘hamburger text’ and working with writers to develop the meta-language of sidebars, pull-quotes, tables, data visualizations, illustrations, and photography to deliver brand-consistent, professional-level marketing collateral. For a few years I developed presentation graphics for corporate clients and small-business websites for other designers. For the last six years my primary focus has been research and writing on semiotic systems.

This work has led to my interest in data analysis and text mining. I have followed DALMOOC’s instruction on collecting, conditioning, and classifying data with great interest. Other MOOCs I’ve taken include Michael Collins’s Natural Language Processing on Coursera; Collins’s presentation of ML concepts is a very clear introduction. An arguably more advanced course on Machine Learning algorithms is Yaser S. Abu-Mostafa’s Learning From Data on edX. However, both courses focus entirely on mathematical analysis to the detriment of practical implementation: there is not a single code snippet using Python’s NLTK to be found. DALMOOC has exceeded my expectations by providing both conceptual understanding and practical tools, so that I can apply course instruction and begin exploring Machine Learning problems right away.

How does Text Mining relate to your own goals?

The results of my investigations have led only to more questions about Machine Learning (ML). I wish to find answers to questions such as:

  • In what ways is the problem of self-reference managed in ML?
  • How can we characterize the behavior of our algorithms to reflect our encounters with groups of texts (from sentence to paragraph to section to paper to corpus)?
  • In what ways does ML identify and characterize systematic organization or structures?
  • In what ways can we, as researchers, update our models to reflect our changing understanding of our subjects of analysis (similar to the reader–text relationship)?
  • In what ways can we develop ML to reflect different models of language analysis?

It’s unthinkable to manipulate links in Social Networks, but can we say the same of ML text analysis? A yearly New Year’s holiday news feature is the report from dictionary professionals on recent additions to prestigious dictionaries. An open-source dictionary of sorts is Princeton’s lexical database WordNet. Its home page describes the project as a “network of meaningfully related words and concepts”. Although language changes only very slowly, it is nevertheless under constant pressure to change through everyday use. ML text interpretation involves the same difficulties we experience as human readers, such as the infinite recursion of language. Literary criticism offers many models for approaching text interpretation. How can we apply the Many Models approach to ML? George Box’s remark that “…all models are wrong, but some are useful” [1] seems very appropriate in ML contexts.

Areas of Research where I hope to apply Machine Learning techniques

One of my ongoing projects involves personal research aimed at improving my domain knowledge and writing [3]. The real work of design grows out of experiencing people’s goals and dilemmas. Design is one pathway to innovation: innovation occurs when the design process reveals an opportunity to improve the lives of others. Some areas of personal interest to me include:

  • employment precarity
  • food insecurity & access to healthy foods
  • small business areas including:
    • branding systems
    • developing channels for your products
    • accounting

My Technological Ecosystem

I’ve explained my communications background, my interests and ideas in ML, and the research topics to which I have devoted my free time [3]. Before I can speak of making a significant contribution in any of these areas, it’s time to get technical. For visual design in print and desktop publishing I use Adobe’s Creative Suite on an Apple computer.

For data projects I use a Lenovo laptop running Arch Linux, a rolling-release distribution, and an IBM RAID server running CentOS 6.5, an enterprise Linux distribution. I purchased, installed, and configured both of these machines, and I continue to maintain them. I strive to know enough to use Python, PHP/MySQL, and HTML/CSS. When I need to code I use Vim for programming and Git for version control.

For documenting my work and creating reports I prefer IPython and libraries such as matplotlib to R, because I can; I’m not bound by convention. Python has a large, stable community of users and developers. It’s well documented, and many support resources are available. By preferring Python I’ve also reduced the number of scripting languages I need to learn, at least for the time being.

Ultimately I approach problems from a Design Perspective

To understand the design perspective it helps to keep three ideas in mind. First, as I mentioned earlier, a designer’s primary focus is to develop ideas that improve human experience. Second, when working with designers you should expect a strong contribution of visualizations: expressions using multiple sign systems, non-linear texts, and representations. Third, design culture is project-based. Like an artist’s series or a scientific experiment, the design cycle is defined by projects. These projects start with goals focused on users or an audience. Typical phases of the design process include investigation, conceptualization, iteration, feedback, and presentation. A lot more could be said, and there are always variations, but it’s safe to say you can expect these consistent features in the design perspective.

As a contrast, let me conclude with this quote from a Tableau webinar. IT Business Intelligence Manager Neil Randall describes the challenge of ‘context switches’ in developing business information projects at Southampton Solent University in the UK:

“IT is responsible for providing the Business Information Infrastructure. We’re not responsible for understanding every little business rule…one of the issues…with the first generation…Data Warehousing tools and BI tools. Because IT had to build everything, in every project, IT had to port every bit of business knowledge from the business in order to build the…models in the right way. Hugely demanding. You’re talking about massive context switches every few months with the IT staff having to take on Planning, Admissions…[right now] I’m trying to get my head around Clearing.” (17:40) [2]

The activities Neil describes in developing IT solutions are precisely those of the design perspective.


[1] Box, George E. P., and Norman R. Draper. Empirical Model-Building and Response Surfaces (1987). Quoted via Wikiquote.
[2] Tableau Software webinar with Neil Randall, Business Intelligence Manager, Southampton Solent University. “Higher Education Gets Smarter When IT Empowers Everyone with Data”.
[3] From 9/13 to 4/14 I volunteered with City Harvest of NYC and Cornell University to research produce purchasing in NYC supermarkets. Participating in this study involved on-site visits and face-to-face interviews with supermarket customers about their produce purchasing habits. The supermarkets in the study were located throughout the five boroughs; my assigned store was in the Bedford-Stuyvesant community of Brooklyn, NY.