Week 7: Reflecting on Text Mining Opportunities
Starting from Communication
You’ll recall my professional experience began in graphic design. I started my career in print marketing. In this context my work involved starting with what we called ‘hamburger text’ and working with writers to develop the meta-language of sidebars, pull-quotes, tables, data visualizations, illustrations and photography to deliver brand-consistent, professional-level marketing collateral. For a few years I developed presentation graphics for corporate clients and small-business web sites for other designers. For the last six years my primary focus has been research and writing in semiotic systems.
This work has led to my interest in data analysis and text mining. I have followed DALMOOC’s instruction in collecting, conditioning, and classifying data with great interest. Other MOOCs I’ve taken include Michael Collins’ Natural Language Processing at Coursera. Collins’ presentation of ML concepts is a very clear introduction. An arguably more advanced course in Machine Learning algorithms is Yaser S. Abu-Mostafa’s Learning From Data at edX. However, both courses focus entirely on mathematical analysis to the detriment of practical implementation (not even one snippet of Python’s NLTK to be found). DALMOOC has exceeded my expectations by providing both conceptual understanding and practical tools to apply course instruction and begin exploring Machine Learning problems right away.
How does Text Mining relate to your own goals?
The results of my investigations have led only to more questions about Machine Learning (ML). I wish to find answers to questions such as:
It’s unthinkable to manipulate links in Social Networks, but can we say the same of ML text analysis? A yearly New Year holiday news feature is the report from dictionary professionals about recent additions to prestigious dictionaries. An Open Source dictionary of a sort is the Princeton word database WordNet. Its home page describes the project as a “network of meaningfully related words and concepts”. While language changes only very slowly, it is nevertheless under constant pressure to change from our everyday use. ML text interpretation involves the same difficulties we experience as human readers, such as the infinite recursion of language. The critique of literature offers many models for approaching text interpretation. How can we apply the Many Models approach to ML? George Box’s quote, “…all models are wrong, but some are useful,” seems very appropriate in ML contexts.
Areas of Research where I hope to apply Machine Learning techniques
One of my ongoing projects involves personal research for the purpose of improving my domain knowledge and writing. The real work of design comes from experiencing the goals and dilemmas of people. Design is one pathway to innovation, and innovation occurs when the design process reveals an opportunity to make improvements in the lives of others. Some areas of my personal interest include:
My Technological Ecosystem
I’ve explained my communications background, my interests and ideas in ML, as well as the research topics to which I have devoted my free time. To speak of making a significant contribution in any of these areas, it’s time to get technical. For visual design in print and desktop publishing I use Adobe’s Creative Suite on an Apple computer.
For data projects I use a Lenovo laptop running the rolling-release distribution Arch Linux and an IBM RAID Server running the Enterprise Linux distribution CentOS 6.5. I purchased, installed, and configured both of these machines, and I continue to maintain them. I strive to know enough to use Python, PHP/MySQL and HTML/CSS. When I need to code I use Vim for programming and Git for version management.
For documenting my work and creating reports I prefer IPython and libraries such as Matplotlib to R, because I can; I’m not tied by convention. Python has a large, stable community of users and developers. It’s well documented. There are many support resources available. By preferring Python I’ve also reduced the number of scripting languages I need to learn, for the time being.
Ultimately I approach problems from a Design Perspective
To understand the design perspective it’s helpful to keep three ideas in mind. First, as I mentioned earlier, a designer’s primary focus is to develop ideas to improve human experience. Second, when working with designers you should expect a strong contribution of visualizations: expressions using multiple sign systems, non-linear texts, and representations. Third, design culture is project-based. Similar to an artist’s series or a scientific experiment, the design cycle is defined by projects. These projects start with goals focused on users or an audience. Typical phases of the design process include investigation, conceptualization, iteration, feedback, and presentation. A lot more can be said. There are always variations, but it’s safe to say you can expect these consistent features in the design perspective.
As a contrast let me conclude with this quote from a Tableau webinar. IT Business Intelligence Manager Neil Randall describes the challenge of ‘context switches’ when developing business information projects at Southampton Solent University in the UK:
The activities Neil describes in developing IT solutions are precisely those of the design perspective.
 Box, George. Empirical Model-Building and Response Surfaces (1987) [Wikiquote]
 Tableau Software Webinar with Neil Randall, Business Intelligence Manager, Southampton Solent University. “Higher Education Gets Smarter When IT Empowers Everyone with Data”
 From 9/13 – 4/14 I volunteered with City Harvest of NYC and Cornell University to research produce purchasing in NYC supermarkets. Participating in this study involved on-site visits and face-to-face interviews with supermarket customers about their produce purchasing habits. The supermarkets in this study were located throughout the five boroughs. My store was located in the Bedford–Stuyvesant community of Brooklyn, NY.