Evidence Soup
How to find, use, and explain evidence.

Tuesday, 22 May 2018

Debiasing your company. And, placebos for health apps?

Samuel Zeller photo on Unsplash

Debiasing is hard work, requiring transparency, honest communication - and the occasional upset stomach. But it gets easier and can become a habit, especially if people have a systematic way of checking their decisions for bias. In a recent podcast, Nobel laureate Richard Thaler explains several practical ways to debias decisions.

First, know your process. "You can imagine all kinds of good decisions taken in 2005 were evaluated five years later as stupid. They weren’t stupid. They were unlucky. So any company that can learn to distinguish between bad decisions and bad outcomes has a leg up."

Writing stuff down is a must-have for repeatable, high-quality decision processes. Under "Write Stuff Down," Thaler encourages teams to avoid hindsight bias by memorializing their assumptions and their evaluation of a decision - preventing anyone from later claiming "I never liked that idea." (I can't help but wonder whether this write-it-down approach would make the best, or possibly the worst, marital advice ever. But I digress.)

“Any company that can learn to distinguish between bad decisions and bad outcomes has a leg up.”

Choice architecture relates to debiasing. It "can apply in any company. How are we framing the options for people? How is that influencing the choices that they make?" But to architect a set of effective choices - for ourselves or others - we must first correct for our own cognitive biases, and truly understand how/why people choose.

Diversity of hiring + diversity of thought = Better decisions. “[S]trong leaders, who are self-confident and secure, who are comfortable in their skin and their place, will welcome alternative points of view. The insecure ones won’t, and it’s a recipe for disaster. You want to be in an organization where somebody will tell the boss before the boss is about to do something stupid.” [Podcast transcript: McKinsey Quarterly, May 2018]


Can we rigorously evaluate health app evidence? In Slate, Jessica Lipschitz and John Torous argue that we need a sugar-pill equivalent for digital health. “Without placebos, we can’t know the actual impact of a [health app] because we have not controlled for the known impact of user expectations on outcomes. This is well illustrated in a recent review of 18 randomized controlled trials evaluating the effectiveness of smartphone apps for depression.”

When an app was compared with a ‘waitlist’ control condition, the app seemed to reduce depressive symptoms. But when the apps were compared with active controls — such as journaling — their comparative effectiveness fell 61%. “This is consistent with what would be expected based on the placebo effect.” Via David Napoli's informative Nuzzel newsletter. [Slate Magazine: Why It's So Hard to Figure Out Whether Health Apps Work]
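A rough way to see why the choice of control matters so much: think of a participant's improvement as natural recovery, plus an expectation (placebo-like) effect from doing anything at all, plus whatever the app specifically adds. A waitlist comparison leaves the expectation effect bundled into the app's apparent benefit; an active control strips it out. The sketch below uses made-up numbers purely to illustrate that arithmetic; they are not figures from the cited review.

```python
# Illustrative numbers only -- not data from the Slate piece or the review.
natural_recovery    = 0.20  # symptom improvement with no intervention at all
expectation_effect  = 0.20  # placebo-like benefit of "doing something"
app_specific_effect = 0.10  # benefit attributable to the app itself

waitlist_group   = natural_recovery
journaling_group = natural_recovery + expectation_effect                       # active control
app_group        = natural_recovery + expectation_effect + app_specific_effect

effect_vs_waitlist = app_group - waitlist_group    # looks impressive
effect_vs_active   = app_group - journaling_group  # the app's own contribution

print(f"App vs. waitlist control: {effect_vs_waitlist:.2f}")
print(f"App vs. active control:   {effect_vs_active:.2f}")
print(f"Apparent effect shrinks by {1 - effect_vs_active / effect_vs_waitlist:.0%}")
```

The journaling arm soaks up the expectation effect, so only the app-specific slice of the benefit remains.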


Photo credits: Samuel Zeller and Thought Catalog on Unsplash

 

Monday, 14 May 2018

Building a repeatable, evidence-based decision process.

Decision-making matrix by Tomasz Tunguz
How we decide is no less important than the evidence we use to decide. People are recognizing this and creating innovative ways to blend what, why, and how into decision processes.

1. Quality decision process → Predictable outcomes
After the Golden Rule, perhaps the most important management lesson is learning to evaluate the quality of a decision process separately from the outcome. Tomasz Tunguz (@ttunguz) reminds us in a great post about Annie Duke, a professional poker player: “Don’t be so hard on yourself when things go badly and don’t be so proud of yourself when they go well.... The wisdom in Duke’s advice is to focus on the process, because eventually the right process will lead to great outcomes.”

Example of a misguided “I'm feeling lucky” response: Running a crummy oil company and thinking you're a genius even though the profits came from unexpected $80-a-barrel oil prices.

2. Reinvent the meeting → Better decisions
Step back and examine your meeting style. Are you giving all the evidence a voice, or relying on the same old presentation theater? Khalil Smith writes in strategy+business (@stratandbiz): “If I catch myself agreeing with everything a dominant, charismatic person is saying in a meeting, then I will privately ask a third person (not the presenter or the loudest person) to repeat the information, shortly after the meeting, to see if I still agree.” Other techniques include submitting ideas anonymously, considering multiple solutions and scenarios, and running a decision pre-mortem with a diverse group of thinkers. More in Why Our Brains Fall for False Expertise, and How to Stop It.

3. How to Teach and Apply Evidence-Based Management. The Center for Evidence-Based Management (CEBMa) Annual Meeting is scheduled for August 9 in Chicago. There's no fee to attend.

Tuesday, 13 March 2018

Biased instructor response → Students shut out

Photo by Benjamin Dada on Unsplash

Definitely not awesome. Stanford’s Center for Education Policy Analysis reports Bias in Online Classes: Evidence from a Field Experiment. “We find that instructors are 94% more likely to respond to forum posts by white male students. In contrast, we do not find general evidence of biases in student responses…. We discuss the implications of our findings for our understanding of social identity dynamics in classrooms and the design of equitable online learning environments.”

“Genius is evenly distributed by zip code. Opportunity and access are not.” -Mitch Kapor

One simple solution – sometimes deployed for decision debiasing – is to make interactions anonymous. However, applying nudge concepts, a “more sophisticated approach would be to structure online environments that guide instructors to engage with students in more equitable ways (e.g., dashboards that provide real-time feedback on the characteristics of their course engagement).”
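Here is a minimal sketch of what such a dashboard metric might look like: compute the instructor's response rate to forum posts, broken out by student group, and surface it while the course is running. The data structure and group labels are hypothetical, not from the Stanford study.

```python
from collections import defaultdict

# Hypothetical forum log: (student_group, did_instructor_reply)
posts = [
    ("white_male", True), ("white_male", True), ("white_male", False),
    ("white_female", False), ("white_female", True),
    ("student_of_color", False), ("student_of_color", False), ("student_of_color", True),
]

counts = defaultdict(lambda: [0, 0])  # group -> [instructor replies, total posts]
for group, replied in posts:
    counts[group][0] += int(replied)
    counts[group][1] += 1

for group, (replies, total) in sorted(counts.items()):
    print(f"{group:17s} instructor response rate: {replies}/{total} = {replies / total:.0%}")
```

Surfacing numbers like these in real time is the kind of nudge the authors have in mind, rather than an after-the-fact audit.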

Prescribe antidepressants → Treat major depression

Network meta-analysis (the Lancet)

An impressive network meta-analysis – comparing drug effects across numerous studies – shows “All antidepressants were more efficacious than placebo in adults with major depressive disorder. Smaller differences between active drugs were found when placebo-controlled trials were included in the analysis…. These results should serve evidence-based practice and inform patients, physicians, guideline developers, and policy makers on the relative merits of the different antidepressants.” Findings are in the Lancet.
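The core trick of a network meta-analysis is the indirect comparison: two drugs that were never tested head-to-head can still be compared through a common comparator, usually placebo. A minimal sketch of that arithmetic, with made-up effect sizes rather than the Lancet figures:

```python
import math

# Hypothetical standardized effects vs. placebo: (effect, standard error)
drug_a_vs_placebo = (-0.45, 0.08)
drug_b_vs_placebo = (-0.30, 0.10)

# Indirect estimate of A vs. B through the shared placebo arm
effect_a_vs_b = drug_a_vs_placebo[0] - drug_b_vs_placebo[0]
se_a_vs_b = math.sqrt(drug_a_vs_placebo[1] ** 2 + drug_b_vs_placebo[1] ** 2)

print(f"Indirect estimate, drug A vs. drug B: {effect_a_vs_b:.2f} (SE {se_a_vs_b:.2f})")
```

A full network meta-analysis pools many direct and indirect comparisons like this at once, which is why the estimated differences between active drugs can shift when placebo-controlled trials enter the network.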

Thursday, 08 March 2018

Redefining ‘good data science’ to include communication.

Data science revised skillset on VentureBeat by Emma Walker

Emma Walker explains on VentureBeat The one critical skill many data scientists are missing. She describes the challenge of working with product people, sales teams, and customers: Her experience made her “appreciate how vital communication is as a data scientist. I can learn about as many algorithms or cool new tools as I want, but if I can’t explain why I might want to use them to anyone, then it’s a complete waste of my time and theirs.”

After school, “you go from a situation where you are surrounded by peers who are also experts in your field, or who you can easily assume have a reasonable background and can keep up with you, to a situation where you might be the only expert in the room and expected to explain complex topics to those with little or no scientific background.... As a new data scientist, or even a more experienced one, how are you supposed to predict what those strange creatures in sales or marketing might want to know? Even more importantly, how do you interact with external clients, whose logic and thought processes may not match your own?”

How do you interact with external clients, whose logic and thought processes may not match your own?

Sounds like the typical “no-brainer”: Obvious in retrospect. Walker reminds us of the now-classic diagram by Drew Conway illustrating the skill groups you need to be a data scientist. However, something is “missing from this picture — a vital skill that comes in many forms and needs constant practice and adaption to the situation at hand: communication. This isn’t just a ‘soft’ or ‘secondary’ skill that’s nice to have. It’s a must-have for good data scientists.” And, I would add, good professionals of every stripe.

Tuesday, 06 March 2018

Biased evidence skews poverty policy.

Decision bias: food-desert map

In Biased Ways We Look at Poverty, Adam Ozimek reviews new evidence suggesting that food deserts aren’t the problem, behavior is. His Modeled Behavior (Forbes) piece asks why the food desert theory got so much play, claiming “I would argue it reflects liberal bias when it comes to understanding poverty.”

So it seems this poverty-diet debate is about linking cause with effect - always dangerous, bias-prone territory. And citizen data scientists, academics, and everyone in between are at risk of mapping objective data (food store availability vs. income) and then subjectively attributing a cause for poor eating habits.

The study shows very convincingly that the difference in healthy eating is about behavior and demand, not supply.

Ozimek looks at the study The Geography of Poverty and Nutrition: Food Deserts and Food Choices Across the United States, published by the National Bureau of Economic Research. The authors found that differences in healthy eating aren’t explained by prices, concluding that “after excluding fresh produce, healthy foods are actually about eight percent less expensive than unhealthy foods.” Also, people who moved from food deserts to locations with better options continued to make similar dietary choices.

Food for thought, indeed. Rather than following behavioral explanations, Ozimek believes liberal thinking supported the food desert concept “because supply-side differences are more complimentary to poor people, and liberals are biased towards theories of poverty that are complimentary to those in poverty.” Meanwhile, conservatives “are biased towards viewing the behavioral and cultural factors that cause poverty as something that we can’t do anything about.”

Thursday, 01 March 2018

Why don't executives trust analytics?

Boston Dynamics Spot Mini

Last year I spoke with the CEO of a smallish healthcare firm. He had not embraced sophisticated analytics or machine-made decision making; he had no comfort level for ‘what information he could believe’. He did, however, trust the CFO’s recommendations. Evidently, these sentiments are widely shared.

A new KPMG report reveals a substantial digital trust gap inside organizations: “Just 35% of IT decision-makers have a high level of trust in their organization’s analytics”.

Blended decisions by human and machine are forcing managers to ask Who is responsible when analytics go wrong? Of surveyed executives, 19% said the CIO, 13% said the Chief Data Officer, and 7% said C-level executive decision makers. “Our survey of senior executives is telling us that there is a tendency to absolve the core business for decisions made with machines,” said Brad Fisher, US Data & Analytics Leader with KPMG in the US. “This is understandable given technology’s legacy as a support service.... However, it’s our view that many IT professionals do not have the domain knowledge or the overall capacity required to ensure trust in D&A [data and analytics]. We believe the responsibility lies with the C-suite.... The governance of machines must become a core part of governance for the whole organization.”

Tuesday, 06 February 2018

Now cognitive bias is poisoning our algorithms.

Tversky and Kahneman: cover slide from the author's 2018 Papers We Love talk

Can we humans better recognize our cognitive biases before we turn the machines loose, fully automating them? Here’s a sample of recent caveats about decision-making fails: While improving some lives, we’re making others worse.

Yikes. From HBR, Hiring algorithms are not neutral. If you set up your resume-screening algorithm to duplicate a particular employee or team, you’re probably breaking the rules of ethics and the law, too. Our biases are well established, yet we continue to repeat our mistakes.
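As a concrete illustration of the mechanism HBR warns about, consider a screener that ranks applicants by how closely they resemble the current team. Whatever happens to be distinctive about the incumbents, including proxy attributes that have nothing to do with job performance, gets rewarded. The features and numbers below are hypothetical.

```python
# Hypothetical candidate features: (years_experience, attended_elite_school, has_ml_thesis)
current_team = [(5, 1, 1), (7, 1, 1), (6, 1, 0)]

def similarity_to_team(candidate):
    """Higher is 'better': negative mean squared distance to incumbent employees."""
    return -sum(
        sum((c - e) ** 2 for c, e in zip(candidate, employee))
        for employee in current_team
    ) / len(current_team)

applicants = {
    "resembles the incumbents":       (6, 1, 1),
    "equally strong, different path": (6, 0, 1),
}

for name, features in sorted(applicants.items(),
                             key=lambda kv: similarity_to_team(kv[1]),
                             reverse=True):
    print(f"{name:32s} score: {similarity_to_team(features):.2f}")
```

The second applicant loses points solely because of the elite-school flag, exactly the pattern that invites the ethical and legal trouble the HBR piece describes.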

Amos Tversky and Daniel Kahneman brilliantly challenged traditional economic theory while producing evidence of our decision bias. Recently I gave a Papers We Love talk on behavioral economics and bias in software design. T&K’s early research famously identified three key, potentially flawed heuristics (mental shortcuts) commonly employed for decision-making: Representativeness, availability, and anchoring/adjustment. The implications for today’s software development must not be overlooked.
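To make the software connection concrete, here is a minimal sketch of anchoring-and-adjustment in a product context: users start from whatever default a form or slider shows and adjust only partway toward their own estimate. The partial-adjustment model and all the numbers are simplifying assumptions for illustration, not T&K's experimental data.

```python
def reported_value(true_belief, ui_default, adjustment=0.6):
    """Insufficient adjustment away from the anchor (adjustment < 1)."""
    return ui_default + adjustment * (true_belief - ui_default)

true_beliefs = [40, 55, 60, 70, 85]   # what five users actually think

for default in (0, 50, 100):          # three candidate UI defaults
    responses = [reported_value(belief, default) for belief in true_beliefs]
    print(f"UI default {default:3d} -> mean reported value {sum(responses) / len(responses):5.1f}")
```

Same users, same beliefs, three different sets of responses; default values deserve the same scrutiny as any other piece of choice architecture.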

Algorithms might be making the poor even less equal. In Automating Inequality, Virginia Eubanks argues that the poor “are the testing ground for new technology that increases inequality.” Our “moralistic view of poverty... has been wrapped into today’s automated and predictive decision-making tools. These algorithms can make it harder for people to get services while forcing them to deal with an invasive process of personal data collection.” As examples, she profiles a Medicaid application process in Indiana, homeless services in Los Angeles, and child protective services in Pittsburgh.

Prison-sentencing algorithms are also feeling some heat. “Imagine you’re a judge, and you have a commercial piece of software that says we have big data, and it says this person is high risk... now imagine I tell you I asked 10 people online the same question, and this is what they said. You’d weigh those things differently.” [Wired article] Dartmouth researchers claim that a popular risk-assessment algorithm predicts recidivism about as well as a random online poll. Science Friday also covered similar issues with crime sentencing algorithms.

Wednesday, 09 August 2017

How evidence can guide, not replace, human decisions.

Bad Choices book cover

1. Underwriters + algorithms = Best of both worlds.
We hear so much about machine automation replacing humans. But several promising applications are designed to supplement complex human knowledge and guide decisions, not replace them: Think primary care physicians, policy makers, or underwriters. Leslie Scism writes in the Wall Street Journal that AIG “pairs its models with its underwriters. The approach reflects the company’s belief that human judgment is still needed in sizing up most of the midsize to large businesses that it insures.” See Insurance: Where Humans Still Rule Over Machines [paywall] or the podcast Insurance Rates Set by ... Machine Intelligence?

Who wants to be called a flat liner? Does this setup compel people to make changes to algorithmic findings - necessary or not - so their value/contributions are visible? Scism says “AIG even has a nickname for underwriters who keep the same price as the model every time: ‘flat liners.’” This observation is consistent with research we covered last week, showing that people are more comfortable with algorithms they can tweak to reflect their own methods.

AIG “analysts and executives say algorithms work well for standardized policies, such as for homes, cars and small businesses. Data scientists can feed millions of claims into computers to find patterns, and the risks are similar enough that a premium rate spit out by the model can be trusted.” On the human side, analytics teams work with AIG decision makers to foster more methodical, evidence-based decision making, as described in the excellent Harvard Business Review piece How AIG Moved Toward Evidence-Based Decision Making.


2. Another gem from Ali Almossawi.
An Illustrated Book of Bad Arguments was a grass-roots project that blossomed into a stellar book about logical fallacy and barriers to successful, evidence-based decisions. Now Ali Almossawi brings us Bad Choices: How Algorithms Can Help You Think Smarter and Live Happier.

It’s a superb example of explaining complex concepts in simple language. For instance, Chapter 7 on ‘Update that Status’ discusses how crafting a succinct Tweet draws on ideas from data compression. Granted, not everyone wants to understand algorithms - but Bad Choices illustrates useful ways to think methodically, and sort through evidence to solve problems more creatively. From the publisher: “With Bad Choices, Ali Almossawi presents twelve scenes from everyday life that help demonstrate and demystify the fundamental algorithms that drive computer science, bringing these seemingly elusive concepts into the understandable realms of the everyday.”
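My own toy illustration of that chapter's point (not code from the book): a codebook shared by sender and receiver lets common phrases be replaced with short tokens, which is the intuition behind dictionary-based compression and behind texting abbreviations alike.

```python
# Shared codebook: both sides must know it, just as a compressor and
# decompressor share a dictionary.
codebook = {
    "by the way": "btw",
    "in my opinion": "imo",
    "talk to you later": "ttyl",
}

def compress(message: str) -> str:
    for phrase, token in codebook.items():
        message = message.replace(phrase, token)
    return message

status = "in my opinion the new release is solid, by the way talk to you later"
short = compress(status)
print(short)
print(f"{len(status)} characters -> {len(short)} characters")
```

The savings come entirely from context both sides already share, which is the same bargain a compressor's dictionary makes.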


3. Value guidelines adjusted for novel treatment of rare disease.
Like it or not, oftentimes the assigned “value” of a health treatment depends on how much it costs, compared to how much benefit it provides. Healthcare, time, and money are scarce resources, and payers must balance effectiveness, ethics, and equity.

Guidelines for assessing value are useful when comparing alternative treatments for common diseases. But they fail when considering an emerging treatment or a small patient population suffering from a rare condition. ICER, the Institute for Clinical and Economic Review, has developed a value assessment framework that’s being widely adopted. However, acknowledging the need for more flexibility, ICER has proposed a Value Assessment Framework for Treatments That Represent a Potential Major Advance for Serious Ultra-Rare Conditions.

In a request for comments, ICER recognizes the challenges of generating evidence for rare treatments, including the difficulty of conducting randomized controlled trials, and the need to validate surrogate outcome measures. “They intend to calculate a value-based price benchmark for these treatments using the standard range from $100,000 to $150,000 per QALY [quality adjusted life year], but will [acknowledge] that decision-makers... often give special weighting to other benefits and to contextual considerations that lead to coverage and funding decisions at higher prices, and thus higher cost-effectiveness ratios, than applied to decisions about other treatments.”
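As a back-of-envelope illustration of how a cost-per-QALY threshold becomes a price benchmark: find the price at which the treatment's incremental cost-effectiveness ratio equals the threshold. The inputs below are hypothetical, and ICER's actual framework also involves discounting, modeling horizons, and the contextual adjustments described above.

```python
incremental_qalys       = 2.5      # QALYs gained vs. standard care (hypothetical)
other_incremental_costs = 50_000   # added non-drug costs vs. standard care, in dollars (hypothetical)

for threshold in (100_000, 150_000):   # ICER's standard range, dollars per QALY
    # Price at which (price + other incremental costs) / QALYs gained equals the threshold
    value_based_price = threshold * incremental_qalys - other_incremental_costs
    print(f"${threshold:,}/QALY threshold -> value-based price benchmark of ${value_based_price:,.0f}")
```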

Monday, 31 July 2017

Resistance to algorithms, evidence for home visits, and problems with wearables.

Kitty with laptop

I'm back, after time away from the keyboard. Yikes! Evidence is facing an uphill battle. Decision makers still resist handing control to others, even when new methods or machines make better predictions. And government agencies continue to, ahem, struggle with making evidence-based policy.  — Tracy Altman


1. Evidence-based home visit program loses funding.
The evidence base has developed over 30+ years: Advocates for home visit programs - where professionals visit at-risk families - cite immediate and long-term benefits for parents and for children. Things like positive health-related behavior, fewer arrests, community ties, lower substance abuse [Long-term Effects of Nurse Home Visitation on Children's Criminal and Antisocial Behavior: 15-Year Follow-up of a Randomized Controlled Trial (JAMA, 1998)]. Or Nobel Laureate-led findings that "Every dollar spent on high-quality, birth-to-five programs for disadvantaged children delivers a 13% per annum return on investment" [Research Summary: The Lifecycle Benefits of an Influential Early Childhood Program (2016)].

The Nurse-Family Partnership (@NFP_nursefamily), a well-known provider of home visit programs, is getting the word out in the New York Times and on NPR.

AEI funnel chart, 27 July 2017

Yet this bipartisan, evidence-based policy is now defunded. @Jyebreck explains that advocates are “staring down a Sept. 30 deadline.... The Maternal, Infant and Early Childhood Home Visiting program, or MIECHV, supports paying for trained counselors or medical professionals” who establish long-term relationships with at-risk families.

It’s worth noting that the evidence on childhood programs is often conflated. AEI’s Katharine Stevens and Elizabeth English break it down in their excellent, deep-dive report Does Pre-K Work? They illustrate the dangers of drawing sweeping conclusions about research findings, especially when mixing studies about infants with studies of three- or four-year olds. And home visit advocates emphasize that disadvantage begins in utero and infancy, making a standard pre-K program inherently inadequate. This issue is complex, and Congress’ defunding decision will only hurt efforts to gather evidence about how best to level the playing field for children.

From AEI's Does Pre-K Work? report

2. Why do people reject algorithms?
Researchers want to understand our ‘irrational’ responses to algorithmic findings. Why do we resist change, despite evidence that a machine can reliably beat human judgment? Berkeley J. Dietvorst (great name, wasn’t he in Hunger Games?) comments in the MIT Sloan Management Review that “What I find so interesting is that it’s not limited to comparing human and algorithmic judgment; it’s my current method versus a new method, irrelevant of whether that new method is human or technology.”

Job-security concerns might help explain this reluctance. And Dietvorst has studied another cause: We lose trust in an algorithm when we see its imperfections. This hesitation extends to cases where an ‘imperfect’ algorithm remains demonstrably capable of outpredicting us. On the bright side, he found that “people were substantially more willing to use algorithms when they could tweak them, even if just a tiny amount”. Dietvorst is inspired by the work of Robyn Dawes, a pioneering behavioral decision scientist who investigated the Man vs. Machine dilemma. Dawes famously developed a simple model for predicting how students will rank against one another, which significantly outperformed admissions officers. Yet both then and now, humans don’t like to let go of the wheel.
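Dawes' point is easy to demonstrate: an "improper" linear model that standardizes a few relevant predictors and weights them equally often competes with, or beats, expert judgment. A minimal sketch of that recipe follows; the applicants and variables are hypothetical, not Dawes' data.

```python
from statistics import mean, stdev

applicants = {
    "A": {"gpa": 3.9, "test_score": 680, "essay_rating": 3.5},
    "B": {"gpa": 3.4, "test_score": 720, "essay_rating": 4.5},
    "C": {"gpa": 3.7, "test_score": 650, "essay_rating": 4.0},
}

features = ["gpa", "test_score", "essay_rating"]
stats = {f: (mean(a[f] for a in applicants.values()),
             stdev(a[f] for a in applicants.values())) for f in features}

def equal_weight_score(applicant):
    """Sum of standardized predictors -- no fitted weights, no expert overrides."""
    return sum((applicant[f] - stats[f][0]) / stats[f][1] for f in features)

ranked = sorted(applicants, key=lambda name: equal_weight_score(applicants[name]), reverse=True)
for name in ranked:
    print(f"Applicant {name}: {equal_weight_score(applicants[name]):+.2f}")
```

No fitted weights and no intuition, yet models in this spirit held their own against admissions officers in Dawes' research.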

Wearables Graveyard by Aaron Parecki

3. Massive data still does not equal evidence.
For those who doubted the viability of consumer health wearables and the notion of the quantified self, there’s plenty of validation: Jawbone liquidated, Intel dropped out, and Fitbit struggles. People need a compelling reason to wear one, such as getting fitness coaching or supporting the diagnosis and treatment of a condition.

Rather than a data stream, we need hard evidence about something actionable: Evidence is “the available body of facts or information indicating whether a belief or proposition is true or valid (Google: define evidence).” To be sure, some consumers enjoy wearing a device that tracks sleep patterns or spots out-of-normal-range values - but that market is proving to be limited.

But Rock Health points to positive developments, too. Some wearables demonstrate specific value: Clinical use cases are emerging, including assistance for the blind.

Photo credit: Kitty on Laptop by Ryan Forsythe, CC BY-SA 2.0 via Wikimedia Commons.
Photo credit: Wearables Graveyard by Aaron Parecki on Flickr.