Data scientist is now officially the hip, cool job. I remember laughing at the scene a couple of seasons ago in House of Cards where the genius data scientist stood shirtless in a soundproof room, blared music, and screamed as he came up with deep insights to help the Underwoods use data to win the election.
The reality is a bit more prosaic. Surveys show most data scientists spend most of their time cleaning and preparing data. They aren’t data scientists. They are data janitors. Of course, a lot of science is like that (talk to anyone who has worked in a lab).
Anyway, I’ve been hanging around people doing big data for over a decade now. I don’t write code and can’t do much math. So what am I?
There was a terrific article from Deloitte about the light quant. That is someone who knows enough data science and is a skilled enough communicator to translate between the data science team and the C-suite decision-makers.
But I play a much broader role in the process than communicating results (not that that isn’t really important!). I help conceptualize the project. What are the problems people are trying to solve? This is sometimes not obvious. People don’t know what they don’t know. In discussing the surface problem, bigger problems can emerge.
I think a lot about the format and nature of the data. In many, many cases the really interesting information is unstructured or qualitative. What does it really mean and how do we best incorporate it? In some projects I have played a key role in collecting the data, but in the case of terrorism research much of the information is narrative. How do we meaningfully describe it numerically?
Then of course, when we do have results, we have to describe them and balance them against what is already known or believed.
We understand the world through stories. In addressing abstract problems, I often find myself asking for examples. At the core of the process I have described are stories. There is the story of the client. What are they saying about their workflow and challenges?
On so many interesting problems, the challenge is turning a story with fascinating details into a number or group of numbers. In my decade at UMIACS, I worked on projects modeling terrorist groups. Some things, like terrorist attacks, were relatively easy to turn into numerical data (how many killed, codes for targets, etc.). Other things, like information on the terrorist group’s public statements or their internal dynamics, were a bit more challenging to quantify. It is worth noting that computer scientists (and data scientists) tend to be interested in type. The analysts and SMEs tend to be interested in instance. The instance is a story. Assigning the instance to a type without losing too much critical information is a hard challenge. The decisions about how this is done will shape the results – that too is part of the story.
When there are results, they must be considered in light of what is already known. What stories do we tell ourselves about this issue? How does the analysis inform these stories? Does it upend them, modify them, or confirm them? How confident can we be in the new findings? What are the broader organizational impacts of these findings for the client?
Making policy involves a combination of facts and values. Values are expressed through stories. Relying only on cold hard facts will not result in acceptable policies – values have to be part of the equation.
Answering any and all of these questions requires listening to and telling stories.
I guess I am a Data Storyteller.