If I take a hammer to my laptop to see if it survives a few taps, does that make me a physicist?
My friend works with ghost hearts.
A pig's heart, washed translucent with detergent, is injected with a patient's stem cells, enabling the ghost heart to be transplanted into that patient without their body rejecting the foreign organ, at least theoretically. A brief Google search tells me the technique hasn't worked in practice yet, but the fact that we're even at the point where we can bioengineer hearts blows my mind. Science like this makes my work feel minuscule. As far as I'm concerned, ghost hearts, if they work, are nothing short of a miracle.
"Yeah, but I wish I knew more women scientists," my friend tells me over beers. "So I can talk about my work without someone hitting on me or worried that I'm hitting on them." She's new in town, and the struggle to make friends as a working adult is real.
"I work in science…kind of," I say, with beer-backed confidence. After all, I'd recently completed my certification in an A/B testing tool and was running experiments to see which landing page variation would perform better. It wasn't saving lives, sure, but it was empirical, right? Technically, when push comes to shove, I'm a woman in STEM.
She stares at me point-blank and deadpans: "You work in math."
I take another sip and nod humbly. She's right.
The other, more questionable scientific method
Although I was now "certified" to run "content experiments," and had answered questions about "hypotheses" and "variables" on the company's certification test, the whole kit and caboodle of certification and experimentation was just another marketing technique concocted to sell software.
That conversation was the first time it registered that "computer science," once you get past the hardware and exit the university physics department, doesn't much resemble science, at least not the kind that directly relates to life on Earth. "Data science" and the 12-week professional certificate programs that fuel its popularity are even less scientific.
From my perspective, learning to code is more like learning another language. Or attending logic class (historically stationed in the philosophy department). Teasing out the organic structures of proteins in a laboratory environment is something else entirely. Understanding predictive modeling can help you play moneyball temporarily, but it doesn’t scale as anticipated. And the scientific method — the process of isolating a variable in a controlled setting to prove or disprove a hypothesis — completely falls apart when applied to the chaos of the internet.
But it’s called science because, well, STEM gets funded. The humanities do not. I can attest that when I shifted my career from pure creative production to the discipline known as digital strategy, I made significantly more money and had far more job security.
What are data science and predictive analytics?
Because I work in digital strategy, intelligent people on the creative production side occasionally imply that my work is somewhat scientific. When I collaborate with editorial professionals, some say, "You're the science and I'm the art!" I always gently correct them: We are neither. We are in business.
Technically, my work in digital research relates to the practice of data science, the mostly 21st-century discipline of using statistics to understand human behavior via unstructured data. "Data science" is the university-enabled brand for what's more accurately known as "predictive analytics," or the idea that we can calculate ourselves into knowing the future. You can read the Wikipedia entry, but the gist behind common applications of data science is this: if you look at enough data about how people have behaved in the past, you can predict how most people will behave in most situations.
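To see that gist in miniature, here's a toy sketch, with entirely invented visitor features and labels, of how a model trained on past behavior gets asked to "know the future":

```python
# A toy sketch of the "past behavior predicts future behavior" premise.
# The features, visitors, and labels here are entirely made up.
from sklearn.linear_model import LogisticRegression

# Past behavior: [pages_viewed, prior_purchases] per visitor (hypothetical)
past_behavior = [[12, 3], [2, 0], [8, 1], [1, 0], [15, 5], [3, 0]]
# What each visitor did next: 1 = bought again, 0 = didn't
what_they_did_next = [1, 0, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(past_behavior, what_they_did_next)

# "Predict the future" for a new visitor: 10 pages viewed, 2 prior purchases
print(model.predict_proba([[10, 2]])[0][1])  # probability they buy again
```

That's the whole trick, scaled up to billions of rows: yesterday's clicks become tomorrow's certainty.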
Predictive analytics rests on the assumption that humans will act predictably in every situation, whether moved by logic or emotion. If we can simply measure all the behaviors, data science says, we can statistically determine what humans will do next. It maths aside the concept of free will, resembles the Calvinist doctrine of predestination, and deeply contradicts my own lived experience, which is that whether in business or in life, people rarely act as one expects.*
The assumptions of data science work most convincingly in homogeneous situations and populations. Data science can predict what will likely work for most people in the population of “everybody online,” but not for every individual. From personalization to autocorrect to recommender systems, most applications calculate, quite literally, the lowest common denominator of online behavior.
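Here's what that lowest common denominator looks like in practice, sketched with a hypothetical viewing log: a popularity baseline that hands every individual whatever the crowd already chose.

```python
# A minimal sketch of the "lowest common denominator" at work:
# a popularity baseline that recommends whatever most people already chose.
# The users and viewing data are hypothetical.
from collections import Counter

viewing_history = {
    "ana":   ["action_movie", "indie_drama", "action_movie"],
    "ben":   ["action_movie", "documentary"],
    "chloe": ["action_movie", "indie_drama"],
    "dev":   ["documentary", "action_movie"],
}

# Count every choice across the whole population...
counts = Counter(title for titles in viewing_history.values() for title in titles)

# ...and recommend the single most common item to every individual.
most_popular, _ = counts.most_common(1)[0]
for user in viewing_history:
    print(f"{user} -> {most_popular}")  # everyone gets "action_movie"
```

Real recommender systems are more sophisticated than this, but the gravitational pull is the same: toward whatever most people already did.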
The assumptions of data science — that most people who do one thing are likely to do another — replicate what’s called the Hypodermic Needle theory in communication studies.
Similar to the concept of technological determinism, the Hypodermic Needle theory assumes that audiences have no agency or critical thinking, that if they see a message on television, they will believe it. To some extent, it's true. We know that advertising sells products when you blast an audience with repeated messaging. We know that people who only consume one source of media tend to parrot that media’s point of view.
But as an anti-authority contrarian who rarely sees her perspective reflected in mainstream media, the Hypodermic Needle theory has never carried much weight with me, especially after I learned the intricacies of polling, quant/qual research methods, and communications theories in graduate school. (Bring on the Raymond Williams!)
I'm more of a Uses and Gratifications girl: people use media in a variety of ways to satisfy different needs. We do not all operate on the same wiring. Accommodating diversity of thought and behavioral preference is not only ethical but also necessary in 21st-century business.
*It's also helpful to understand that predictive analytics was popularized in mass culture by the midcentury science fiction writer Isaac Asimov. Asimov’s fans include many tech industry bigwigs who, having learned that Jules Verne predicted submarines, saw an economic opportunity and placed business bets on another sci-fi writer. I am being facetious and glib here, but while Asimov may have been a biochemistry professor, it’s important to remember that he wrote fiction. The Foundation series has as much to do with the material circumstances of the real world as The Devil Wears Prada and The Pelican Brief, two best-sellers also based on professional experience.
When 20th-century methodologies fall apart
Inconsistencies like cultural nuances, value differences, and uneven technological adoption make predictive analytics far less effective. Accurately sampling diverse populations remains extremely difficult. Most existing media measurement systems account only for demographic variations, not behavioral differences. The shift from demographic to behavioral profiling has been a challenge for advertising, data science, and mass media publishing alike.
For example, electoral polling was considered accurate in the 20th century when everyone had a phone line, watched the same three major network TV stations, and read the same local paper. Information production was codified in the news publishing industry’s relatively homogeneous adoption of journalism ethics. Niche media might have influenced a few people, but most information about political candidates came from the same places and could be accessed the same way. Polling scientists could make reasonable assumptions about people based only on demographic and class differences, and technological adoption didn't matter.
With the introduction of cable networks, then cell phones, and finally social media, people bypassed landlines entirely and had reasonable alternatives to watching network TV. While some shifts in media consumption correlated with demographic and class categories, the fragmentation made the dataset much less predictable.
There could be many reasons for this change. Pollsters couldn't access all populations via landline. The general population's willingness to participate in polling was marred by the feeling that posting online had the same impact as answering a survey call. And pollsters considered race, class, age, and gender to be the only indicators of diversity. The homogeneity of 20th-century institutions never accounted for the diversity of behavior introduced by multiple media sources and rapid, unprecedented technological adoption.
That’s why polling in the 2016 U.S. election was so far off from the actual results. People with landlines or listed phone numbers were over-indexed in the samples. Media outlets that knew polling used to be accurate were convinced people would behave the same way in 2016 as they had in 1996 or even 2008. Online audiences were represented, but their behaviors were misunderstood. Not only had the demographics shifted, but device behavior and media consumption had so radically changed that the old sampling methods, rooted in media stability, were no longer reliable.**
**That’s also why I’m a big fan of basing content and media buying strategies on online behavioral patterns and contextual advertising, instead of simple demographic targeting, which is usually stereotypical, if not flat-out racist.
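For the over-indexing point, a toy simulation (all numbers invented) shows how the math goes wrong: when one group answers the phone five times as often as everyone else, the sample tilts toward that group no matter how honestly you tally it.

```python
# A toy simulation (all proportions invented) of landline over-indexing:
# when one group is easier to reach, the sample skews toward it.
import random

random.seed(0)

# Hypothetical electorate: 30% landline households, 70% cell/online-only,
# with different candidate preferences in each group.
population = (
    [{"landline": True,  "supports_a": random.random() < 0.60} for _ in range(30_000)]
    + [{"landline": False, "supports_a": random.random() < 0.45} for _ in range(70_000)]
)

true_support = sum(p["supports_a"] for p in population) / len(population)

# The pollster reaches landline households 5x more often than everyone else.
sample = [p for p in population if random.random() < (0.05 if p["landline"] else 0.01)]
polled_support = sum(p["supports_a"] for p in sample) / len(sample)

print(f"true support:   {true_support:.1%}")    # around 49.5%
print(f"polled support: {polled_support:.1%}")  # skews toward landline voters
```

No one lied, and no arithmetic failed. The sampling frame simply stopped matching the population it claimed to describe.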
Is it data science or is it software marketing?
Despite data science's many misapplications across culture, the discipline’s positivist believers still shout from the rooftops of progress and profits. For the past decade, I've watched digital strategists declare the winner of a single marketing A/B test as if they've discovered the law of gravity, even when the data show that 99.5% of people rejected both the A and B variations.
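If that sounds like an exaggeration, here's a back-of-the-envelope sketch, with invented traffic numbers, of how a variation can "win" a statistically significant test while roughly 99.5% of visitors reject it:

```python
# A back-of-the-envelope sketch of how a "winning" A/B test coexists
# with near-total rejection of both variations. Numbers are hypothetical.
from math import sqrt
from statistics import NormalDist

visitors_a, conversions_a = 100_000, 400   # 0.40% converted on A
visitors_b, conversions_b = 100_000, 520   # 0.52% converted on B

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Standard two-proportion z-test
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"B 'wins' with p = {p_value:.5f}")                 # highly significant
print(f"...yet {1 - max(p_a, p_b):.1%} of visitors rejected the 'winner'")
```

The significance is real. The triumph is not: 99.5% of the audience still said no to both pages.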
Many prominent SEO experts announce results from “empirical” tests on Google organic search algorithms as if they are discovering a life-saving synthetic protein. In reality, the search engine is just a Google product that is adjusted, often, by Google, whenever Google wants.
In the world of AI testing and critique, software developers and internet journalists purposely attempt to break the product and call it “science.” While I understand the motivations — ChatGPT has many flaws — I hesitate to call any software testing scientific. If I take a hammer to my laptop to see if it survives a few taps, does that make me a physicist?
And if the software changes all the time because of “agile” and “minimum viable product” business methodologies or because of “machine learning,” is the scientific method even remotely relevant?
Keep it small, scientists: When statistical modeling works for content recommendation
It's not that the entire practice of data science is snake oil — although I'd argue that we need to question the label of "science" along with the aggressive university marketing budgets and certificate programs that have powered its rise. Statistical modeling still has its place. Machine learning can find patterns and trends that humans never would.
But when it comes to digital strategy today, “informed guessing to affect business results” or "editorial judgment based on various inputs" is more accurate than “data science.”
In small populations with similar cultural affinities (B2B software buyers, movie fans, etc.), data science can be remarkably accurate in determining which content or idea might resonate most. If you have a critical mass of users who share similar behaviors, you can determine which movie they might like best (how Netflix predicts your match with a title on a scale of 1-100) or what content they might need to select an SEO software vendor (why account-based marketing resonates with B2B buyers).
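Here's a minimal sketch of why that works, using a hypothetical trio of users with overlapping tastes: when behaviors genuinely cluster, a similarity-weighted average gives you a decent guess at an unseen rating.

```python
# A minimal sketch of prediction inside a small, similar population:
# user-based collaborative filtering over hypothetical movie ratings.
import numpy as np

# Rows = users (ana, ben, chloe); columns = movies; 0 = unrated.
# All ratings are invented for illustration.
ratings = np.array([
    [5, 4, 0],   # ana hasn't seen the third movie
    [5, 5, 2],
    [4, 4, 1],
])

def cosine(u, v):
    """Cosine similarity: 1.0 means identical taste profiles."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# How similar is ana to each neighbor? (shared behavior -> high score)
sims = np.array([cosine(ratings[0], ratings[i]) for i in range(1, 3)])

# Predict ana's rating for the unseen movie as a similarity-weighted
# average of her neighbors' ratings for it.
neighbor_ratings = ratings[1:, 2]
prediction = sims @ neighbor_ratings / sims.sum()
print(f"predicted rating: {prediction:.1f}")  # close to her neighbors' 1-2
```

The catch is in the setup: the math only behaves this well because ana, ben, and chloe already behave alike. Stretch the same trick across "everybody online" and the neighbors stop being neighborly.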
As I’ve written before, content personalization works when businesses invest in a wide variety of content designed to accommodate the diversity of human experience and consumer preferences. But results evaporate when we boil preferences down to a few limited types (say, OCEAN personality traits or Berkeley, Burlington, and Cambridge) or optimize toward mass adoption (Netflix scaling down from 5-star ratings to thumbs up/thumbs down to encourage more users to rate).
When predictive analytics that originate in online behaviors are extrapolated to much larger populations, and the people telling stories about the data science understand neither the mechanics of the theory nor the finer points of the data they're dealing with, the applications get, well, fuckity. Enshittified. Frustratingly inaccurate for those of us who enjoyed the nuanced user experiences that niche datasets accommodated.
The second half of this essay will be published in the newsletter on December 14 and on this website on December 15, 2023.