Data scientist is now officially the hip cool job. I
remember laughing at the scene a couple seasons ago in House of Cards where the genius data scientist stood shirtless in a soundproof room, blared music and screamed as he came up with deep insights to help the Underwoods use data to
win the election.
The reality is a bit more prosaic. Surveys show most data
scientists spend most of their time cleaning and preparing data. They aren’t
data scientists. They are data janitors. Of course a lot of science is like
that (talk to anyone who has worked in a lab).
Anyway, I’ve been hanging around people doing big data for
over a decade now. I don’t write code and can’t do much math. So what am I?
There was a terrific article from Deloitte about the light quant: someone who knows enough data science and is a skilled enough
communicator to translate between the data science team and the C-suite decision-makers.
But I play a much broader role in the process than
communicating results (not that that isn’t really important!) I help
conceptualize the project. What are the problems people are trying to solve?
This is sometimes not obvious. People don’t know what they don’t know. In
discussing the surface problem, bigger problems can emerge.
I think a lot about the format and nature of the data. In
many, many cases the really interesting information is unstructured or
qualitative. What does it really mean and how do we best incorporate it? In
some projects I have played a key role in collecting the data, but in the case
of terrorism research much of the information is narrative – how do we meaningfully
describe this numerically?
Then of course, when we do have results, we have to describe
them and balance them against what is already known or believed.
We understand the world through stories. In addressing
abstract problems, I often find myself asking for examples. At the core of the
process I have described are stories. There is the story of the client. What
are they saying about their workflow and challenges?
On so many interesting problems, the challenge is turning a
story with fascinating details into a number or group of numbers. In my decade
at UMIACS, I worked on projects modeling terrorist groups. Some things, like
terrorist attacks, were relatively easy to turn into numerical data (how many
killed, codes for targets, etc.). Other things, like information on the terrorist
group’s public statements or their internal dynamics, were a bit more
challenging to quantify. It is worth noting that computer scientists (and data
scientists) tend to be interested in type. The analysts and SMEs tend to be
interested in instance. The instance is a story. Categorizing the instance
as a type without losing too much critical information is a hard challenge. The
decisions about how this is done will shape the results – that too is part of
the story.
When there are results, they must be considered in light of
what is already known. What stories do we tell ourselves about this issue? How
does the analysis inform these stories? Does it upend them, modify them, or
confirm them? How confident can we be in the new findings? What are the broader
organizational impacts of these findings for the client?
Making policy involves a combination of facts and
values. Values are expressed through stories. Relying only on cold hard facts
will not result in acceptable policies – values have to be part of the equation.
Answering any and all of these questions requires listening
to and telling stories.
I guess I am a Data Storyteller.