Evidence Soup
How to find, use, and explain evidence.

91 posts categorized "science & research methods"

Thursday, 17 October 2013

Got findings? Show us the value. And be specific about next steps, please.

Lately I've become annoyed with research, business reports, etc. that report findings without showing why they might matter, or what should be done next. Things like this: "The participants biological fathers’ chest hair had no significant effect on their preference for men with chest hair." [From Archives of Sexual Behavior, via Annals of Improbable Research.]

Does it pass the "so what" test? Not many of us write about chest hair. But we all need to keep our eyes on the prize when drawing conclusions about evidence. It's refreshing to see specific actions, supported by rationale, being recommended alongside research findings. As Exhibit A, I offer the PLOS Medicine article Use of Expert Panels to Define the Reference Standard in Diagnostic Research: A Systematic Review of Published Methods and Reporting (Bertens et al). Besides explaining how panel diagnosis has (or hasn't) worked well in the past, the authors recommend specific steps to take - and provide a checklist and flowchart. I'm not suggesting everyone could or should produce a checklist, flowchart, or cost-benefit analysis in every report, but more concrete Next Steps would be powerful.

PLOS Medicine: Panel Diagnosis research by Bertens et al

So many associations, so little time. We're living in a world where people need to move quickly. We need to be specific when we identify our "areas for future research". What problem can this help solve? Where is the potential value that could be confirmed by additional investigation? And why should we believe that?

Otherwise it's like simply saying "fund us some more, and we'll tell you more". We need to know exactly what should be done next, and why. I know basic research isn't supposed to work that way, but since basic research seems to be on life support, something needs to change. It's great to circulate an insight for discovery by others. But without offering a suggestion of how it can make the world a better place, it's exhausting for the rest of us.

Wednesday, 30 January 2013

A must-read: "Does the Language Fit the Evidence? Association Versus Causation."

Science is easy; explaining it is hard. Back in 2009, Evidence Soup recommended the excellent Health News Review, whose mission is to "hold health and medical journalism accountable" (more about them at the end of this post). They've published Tips for Understanding Studies (available for purchase here). One of their free online writeups is a must-read for anyone working with evidence. Yes, it's basic -- but depending on your level of experience, it will be a valuable refresher, an intro to research methods, or a guide to science writing.

Does The Language Fit The Evidence? – Association Versus Causation was put together by Mark Zweig, MD, and Emily DeVoto, PhD, "two people who have thought a lot about how reporters cover medical research". (I've been acquainted with Emily DeVoto for several years; I like that her catchphrase is "The plural of anecdote is not data.")

Passive language vs. active voice. The authors describe how an 'association' can be inadvertently misconstrued as a cause/effect relationship:

"A subtle trap occurs in the transition from the cautious, nondirectional, noncausal, passive language that scientists use in reporting the results of observational studies to the active language favored in mass media.... For example, a description of an association (e.g., associated with reduced risk) can become, via a change to the active voice (reduces risk), an unwarranted description of cause and effect. There is a world of difference in meaning between saying 'A was associated with increased B' and saying 'A increased B.'" [emphasis is mine]

These are subtle things with tremendous importance. When I was in school, I often heard "Correlation doesn't mean causation." Evidently, the difference is still a big hurdle for experts and non-experts alike. Zweig and DeVoto illustrate how things can go awry in this helpful example:

Study design: Prospective cohort study of dietary fat and age-related maculopathy (observational).
Researchers’ version of results: A 40% reduction of incident early age-related maculopathy was associated with fish consumption at least once a week.
Journalist’s version of results: Eating fish may help preserve eyesight in older people.
Problem: Preserve and help are both active and causal; may help sounds like a caveat designed to convey uncertainty, but causality is still implied.
Suggested language: “People who ate fish at least once a week were observed to have fewer cases of a certain type of eye problem. However, a true experimental randomized trial would be required in order to attribute this to their fish consumption, rather than to some other factor in their lives. This was an observational study – not a trial.”


Abstracts stick to the facts. The authors note that the "language in a scientific publication is carefully chosen for the conclusion in the abstract or in the text, but not used so strictly in the discussion section. Thus, borrowing language from scientific papers warrants caution."

You can follow @HealthNewsRevu on Twitter. The project is led by Gary Schwitzer and funded by the Foundation for Informed Medical Decision Making.

Wednesday, 07 November 2012

What counts as good evidence? Alliance for Useful Evidence offers food for thought.

"What counts as good evidence?" is a great conversation starter. The UK-based Alliance for Useful Evidence / Nesta are hosting a seminar Friday morning to "explore what is realistic in terms of standards of evidence for social policy, programmes and practice." Details: What is Good Evidence? Standards, Kitemarks, and Forms of Evidence, 9 November 2012, 9:30-11:30 (GMT), London. The event is chaired by Geoff Mulgan, CEO of Nesta; speakers include Dr. Gillian Leng, Deputy Chief Executive for Health and Social Care, NICE; and Dr. Louise Morpeth, Co-Director Dartington Social Research Unit.

Prompting that discussion is a 'provocation paper', What Counts as Good Evidence?, by Sandra Nutley, Alison Powell, and Huw Davies. They're with the University of St Andrews Research Unit for Research Utilisation (RURU). Let me know if you want me to send you a copy (I'm tracy AT evidencesoup DOT com).

The evidence journey. This paper doesn't break lots of new ground, but it's a useful recap of the state of evidence-seeking from a policy / program standpoint. While the authors do touch on bottom-up evidence schemes, the focus here isn't on crowdsourced evidence (such as recent health tech efforts). I love how they describe the effort to establish a basis for public policy as the evidence journey. Some highlights:

Hierarchies are too simple. We know the simple Level I/II labeling schemes, identifying how evidence is collected, are useful but insufficient. The authors explain that "study design has long been used as a key marker for evidence quality, but such ‘hierarchies of evidence’ raise many issues and have remained contested. Extending the hierarchies so that they also consider the quality of study conduct or the use of underpinning theory have... exposed new fault-lines of debate.... [S]everal agencies and authors have developed more complex matrix approaches for identifying evidence quality in ways that are more closely linked to the wider range of policy or practice questions being addressed."

Research evidence is good stuff. But the authors remind us "there are other ways of knowing things.  One schema (Brechin & Siddell, 2000) highlights three different ways of knowing": empirical, theoretical, and experiential.

Do standards help? The authors provide a very nice list of evidence standards & rating schemes (GRADE, Top Tier Evidence, etc.) - that is reason enough to get your hands on a copy of the paper. And they note the scarcity of evidence on effectiveness of these rating schemes.

Incidentally, Davies and Nutley contributed to the 2000 book What Works? Evidence-Based Policy and Practice that I've always admired (Davies, Nutley and Smith, Policy Press).

Thursday, 01 December 2011

U.S. agencies aren't supposed to allow political interference with scientific evidence. Good luck with that.

Evidence Soup is back in business. These past 3 months, I've been distracted by a number of things, including a move from Denver, Colorado to the San Francisco Bay Area. So, where were we?

Will "4.37 Degrees of Separation" play at the multiplex? Awhile back I wrote about recent research to test the Six Degrees of Separation theory. New evidence suggests that people are separated by an average of 4.74 degrees (only 4.37 in the U.S.). Doesn't really roll off the tongue, and I wouldn't expect Will Smith to star in a sequel. But this latest research applies only to people on Facebook; a New York Times piece reminds us "the cohort was a self-selected group, in this case people with online access who use a particular Web site".

Desperately seeking scientific integrity. Early on, President Obama launched an effort to ensure that the public can "trust the science and scientific process informing public policy decisions". His March 9, 2009 memorandum made agencies responsible for "the highest level of integrity in all aspects of the executive branch's involvement". Great idea to de-politicize science, though it's much easier said than done -- as evidenced by the nearly-two-year delay in providing "guidelines" for agencies issuing this policy. Really. Those guidelines were published December 17, 2010 [pdf here].

Oh, and meanwhile the Obama Administration was widely criticized for using sloppy science to support its moratorium on offshore drilling after the BP disaster on the Gulf Coast. It's never simple: In real life it requires weighing risks, factoring in economic impacts, and making political tradeoffs (consider this analysis of the economic impact of that ban - [pdf here]). A scientific integrity rule won't give you all the answers you need when making such complex decisions: It's surely not as simple as determining whether scientific findings were manipulated for political purposes.

As explained in a recent National Public Radio story, now we're seeing the first challenge to federal evidence-gathering under this new regime: It's directed at the Bureau of Land Management (a branch of the Department of the Interior). More about that in a moment.

Why is this so hard? It's been rough going for agencies issuing scientific integrity policies. The basics are straightforward enough: Preventing people from twisting or quashing scientific evidence. But here are some reasons why the process is so problematic.

  1. Scope. There's no bright line identifying what "science" should be included in an assessment, and therefore subjected to integrity requirements. This complicates things enough when you're working with "objective" evidence. It's even trickier when you bring in fuzzier stuff, like the dismal science (economics), or risk assessment. How can we say for sure what should be considered? Our assumptions about what's important determine what evidence we recognize - whether consciously or subconsciously. Our choices are influenced by our values - but we may not be fully aware of our values, or we may not want to articulate them in a transparent way.
  2. Dissemination. Each agency's integrity policy is supposed to provide for open communication, and guide how evidence is presented to the public (see the 2010 guidelines mentioned earlier). It wouldn't serve anyone to have a free-for-all; consistent, controlled dissemination can improve usefulness and understanding. But not everyone agrees on the rights and responsibilities of scientists who want to discuss their findings with the public.
  3. Whistle blowing. Among the substantial hurdles is the handling of whistle blowers. (I suppose if such a policy is going to have teeth, the people who want to blow whistles need to feel they can do so without losing their heads.)
  4. Transparency. Scientific groups - such as the Union of Concerned Scientists - say they still want to see external accountability under these policies. So far, investigations of misconduct are internal.

12291 all over again? Thirty years ago, President Reagan signed Executive Order 12291, requiring cost-benefit analysis for 'major' federal regulations (those expected to impact the U.S. economy by $100 million or more). Clinton issued a similar order in 1993. In theory, this should have de-politicized some agency decision-making processes. The results (or lack thereof) were the subject of my doctoral dissertation.

As with the new mandate for scientific integrity, a requirement to weigh regulatory costs against benefits leaves lots of room for interpretation, and requires value judgments. When EPA issues a rule under the Clean Air Act, it's difficult enough to estimate how many hospital visits or early deaths are caused by a particular type of airborne particulate matter. But figuring out the social costs is harder still: Requiring businesses & governments to reduce those emissions can lead to job cuts and economic loss, which themselves cause poverty and negative health impacts.

Challenging BLM's process. Citing the new scientific integrity policy, a group called Public Employees for Environmental Responsibility (PEER) has filed a complaint against the BLM, saying "The U.S. Bureau of Land Management is carrying out an ambitious plan to map ecological trends throughout the Western U.S. but has directed scientists to exclude livestock grazing as a possible factor in changing landscapes....  [O]ne of the biggest scientific studies ever undertaken by BLM was fatally skewed from its inception by political pressure.... As a result, the assessments do not consider massive grazing impacts even though trivial disturbance factors such as rock hounding are included, [and they] limit consideration of grazing-related information only when combined in an undifferentiated lump with other native and introduced ungulates (such as deer, elk, wild horses and feral donkeys)." I didn't know we had a feral ungulate problem. But I digress.

This is a good example of how choices about collecting evidence can strongly influence the results. NPR explains that the Dept. of Interior has a scientific integrity officer who is responsible for investigating allegations of political interference. I wish him Godspeed.


Tuesday, 16 August 2011

Is 'six degrees of separation' fact or fiction? Social scientists collect evidence to find out.

We've always heard about Six Degrees of Separation. Now let's see if evidence backs it up. As explained in the Mercury News, "The world's population has almost doubled since social psychologist Stanley Milgram's famous but flawed 'Small World' experiment gave people a new way to visualize their interconnectedness with the rest of humanity. Something else has also changed - the advent of online social networks, particularly Facebook's 750 million members, and that's what researchers plan to use."

You can join in. Social scientists at Yahoo! and Facebook have launched the Small World Experiment, "designed to test the hypothesis that anyone in the world can get a message to anyone else in just 'six degrees of separation' by passing it from friend to friend. Sociologists have tried to prove (or disprove) this claim for decades, but it is still unresolved.

"Now, using Facebook we finally have the technology to put the hypothesis to a proper scientific test. By participating in this experiment, you'll not only get to see how you're connected to people you might never otherwise encounter, you will also be helping to advance the science of social networks."
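If you're wondering what "degrees of separation" looks like as a calculation, here's a minimal Python sketch. Everything in it is made up for illustration: the six names, the `friends` map, and the helper functions are my own assumptions, not the researchers' method or data. It treats one friend-to-friend hop as one degree and averages the shortest chain over every pair of people.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy friendship network (each friendship listed both ways).
friends = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Dan"],
    "Cat": ["Ann", "Dan"],
    "Dan": ["Bob", "Cat", "Eve"],
    "Eve": ["Dan", "Fay"],
    "Fay": ["Eve"],
}


def hops_from(start, network):
    """Breadth-first search: fewest friend-to-friend hops from start to everyone reachable."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in network[person]:
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def average_separation(network):
    """Mean shortest-chain length over all pairs (assumes everyone is connected)."""
    all_hops = {person: hops_from(person, network) for person in network}
    pairs = list(combinations(network, 2))
    return sum(all_hops[a][b] for a, b in pairs) / len(pairs)


avg = average_separation(friends)
print(f"Average degrees of separation: {avg:.2f}")
```

A real study also has to decide how to handle people with no chain between them at all; this sketch simply assumes there are none.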

Photo credit: Film-Buff Movie Reviews, where they play Six Degrees of Kevin Bacon. I chose this picture of young Kevin because this weekend I saw a trailer for the Footloose remake. Some things should probably stay un-remade. (I was there to see Crazy, Stupid, Love, which I highly recommend. Kevin's in that one, too.)

Tuesday, 12 July 2011

Granular smart grid evidence creates 'aha' moment: Data every 15 seconds vs. 15 minutes.

Peak load not at 5:00pm after all? Typical communications from the local utility ask us to minimize appliance use around 5:00pm, because that's when peak load occurs. That's a busy time, no doubt. But our beliefs may be based on incomplete evidence.

Last week's Science Friday was recorded in San Antonio, Texas. The city's mayor, Julian Castro, and several others spoke about efforts to modernize the energy grid there and across the state. Turns out, their peak load happens closer to 10:30pm. This discovery was made after they began collecting very granular meter data during a pilot project.

Aha moment in Texas. Brewster McCracken, executive director of the Pecan Street Project in Austin, explained that "We kind of almost by accident have ended up with the world's deepest database on how people use energy and now gas and water. It's in 15-second increments.... The most granular data before that was 15 minutes."

"... that's obviously going to be pretty impractical and actually not necessary for utilities, but it is very important for product development to understand, you know, if you're going to try to create a product that is of value to a customer, it's really important to understand what the customer wants. And so one of the ways to do that is to find out their data. And we're finding out, for instance, in the summertime, surprisingly, that the peak time of electricity usage in terms of draw on the grid is 10:30 at night. It's not 5:00 in the afternoon, which was a huge surprise."

McCracken continued: "... here's why, actually. And it doesn't show up in 15-minute data. It only shows up when you take it down to a finer level. We do this in my family, a lot of folks do. You turn down your AC when you go to bed at night to make it a little bit cooler. And the AC has such a huge influence on your home energy usage that it's about - the draw, peak draw can be up to 20 percent higher at like 10:30 at night, 10:00 at night, as it is at 5:00 in the afternoon."
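Why would a peak hide in 15-minute data? Here's a toy Python sketch of the averaging effect. The numbers are invented (a flat 3 kW draw with one 90-second surge to 6 kW) and the `block_means` helper is my own; none of it comes from the Pecan Street data. It only shows how a short spike gets diluted when readings are rolled up into 15-minute averages.

```python
SAMPLES_PER_15_MIN = 60  # 15 minutes divided into 15-second readings

# One hypothetical hour of draw in kW, sampled every 15 seconds.
readings = [3.0] * 240

# A made-up 90-second surge (6 samples) starting 20 minutes into the hour.
for i in range(80, 86):
    readings[i] = 6.0


def block_means(values, block_size):
    """Average consecutive blocks of readings, as a coarser meter would report."""
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, len(values), block_size)
    ]


coarse = block_means(readings, SAMPLES_PER_15_MIN)

print(f"Peak in 15-second data: {max(readings):.1f} kW")
print(f"Peak in 15-minute data: {max(coarse):.1f} kW")
```

In this made-up hour the fine-grained peak is 6.0 kW, but the highest 15-minute average is only 3.3 kW, so the coarse series looks nearly flat.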

Anecdotal evidence from Colorado. It's 10:16pm as I finish this post. Next thing I'm doing? Cranking up the AC and turning in for the night.

Thursday, 30 June 2011

The 'Baby Einstein' evidence debate continues. University sponsoring flawed study pays $175K in legal fees.

There's still a fight over the evidence on Baby Einstein videos, and it's being fought in the press. Today the Denver Post ran the story 'Baby Einstein' DVD creators find redemption in documents suggesting negative study was flawed. (Baby Einstein is a local company.)

I first wrote about the kerfuffle in 2007. It all began when the University of Washington announced a study finding that for each hour-per-day spent watching baby DVDs/videos, infants understood on average six to eight fewer words than those who didn't watch, and recommended that parents limit their use. "This analysis reveals a large negative association between viewing of baby DVDs/videos and vocabulary acquisition," they claimed. (There's a recap of events on Wikipedia. The mainstream press piled on in a big way.)

Much ado about nothing? Initially, I thought this seemed to be a case of a researcher making alarmist statements to the press, not because the evidence was significant, but to promote themselves. And that's still how it seems. I asked back then "Even if this is true, where is the evidence that 6-8 fewer infant words will matter in the long run?" Using this limited research to claim that Baby Einstein (and other) videos are harmful seemed over the top. Another reason I doubted the importance of the study: "Only 17 percent of 384 babies in the survey were put in front of videos for an hour or more each day. The average baby watched only about 9 minutes a day." One has to wonder what factors influenced the children's development during the, um, other 23 hours and 51 minutes of their day.

Show me all the evidence. The creators of Baby Einstein have been fighting back ever since. Bill Clark and Julie Aigner-Clark have obtained internal documents related to the study, and say they've succeeded in having $175,000 of their legal bills paid by the University of Washington. They also say the documents "confirm what they always suspected: The study was deeply flawed and unfairly characterized Baby Einstein products."

According to today's story, the Clarks found "correspondence from one researcher concerned about how certain results were analyzed. While children 8 months to 16 months who watched baby videos fell behind in vocabulary, the study also found that in children 17 months to 24 months, vocabulary increased and the negative effects evaporated." (Apparently that vocabulary rebound was part of the published report but was downplayed in news releases.)

No follow-up. An internal email, between a reviewer and one of the researchers, asked "What's the notion about how (we're) reconciling the fact that there was an effect on the young kids but it washed out by the time they were 17-24, and we now will be wanting to follow the young kids when they're older?" Apparently there was supposed to be follow-up research with the same parents, but that was scrapped due to cost.

What a waste of resources. Disney eventually was pressured to offer refunds to Baby Einstein customers. The U.S. Federal Trade Commission (FTC) even wasted time on this. Eventually, as explained in the Washington Post, a separate group of researchers found that although "an hour a day of television viewing won't make your kid a genius, it won't harm his development either.... At first glance, the study's authors found decreased language and visual motor skills in kids who watched more television. But then, the researchers adjusted the results to account for the mother's age, income, education, marital status and vocabulary. And you know what? They found that mom's education level and vocabulary rather than the television greatly impacted baby's."


Tuesday, 29 March 2011

Bad evidence for a good cause.

Remember awhile back when craigslist promised to cut back on its 'adult' classifieds, in response to claims that ads for child prostitution (due to sex trafficking) were rampant on the site? Hmmm. It seems the public outcry -- however well-intended -- may have been based on bad evidence.

Junk science? This is far removed from my area of expertise, but folks who have looked closely at the situation say the so-called studies claiming exponential growth of child prostitution are based on junk science. Shown here is Steve Doig, the Knight Chair in Journalism at Arizona State University, who says one of the influential studies is based on a logical fallacy (to put it mildly). Some of the accusations about the research methodologies are mind-boggling; I suppose people are quick to accept findings when the stakes are so high.

Links to critiques are provided below. But first (pardon my ignorance), I'm not even sure what's been removed from craigslist since this whole kerfuffle erupted. I looked at their personals section today (for Denver), and saw recent listings for women seeking men, etc.

Evidently, I am clueless. But I don't see how removing some ads from craigslist will help the situation. If law enforcement officials want to hunt down sex traffickers, aren't they better off having the classifieds posted publicly, rather than having this stuff happen over private networks/text messages or whatever? It seems to me that transparency works in favor of law and order.

Here's some further reading about the evidence (or lack thereof) on sex trafficking in the U.S. and elsewhere:

Washington Post: Human Trafficking Evokes Outrage, Little Evidence: U.S. Estimates Thousands of Victims, But Efforts to Find Them Fall Short. 

The most recent cover story in Westword (the Denver version of Village Voice): Women's Funding Network sex-trafficking study is junk science.

Village Voice has put together a list of references, including the London Guardian article Inquiry fails to find single trafficker who forced anybody into prostitution.



Friday, 11 March 2011

A must-read: Yudkowsky's gentle, brilliant explanation of Bayesian reasoning and evidence. And why we overestimate breast cancer occurrence 85% of the time.

Happy Fun-with-Evidence Friday. No videos today, just an important lesson told in an entertaining way. Eliezer Yudkowsky is a research fellow at the Singularity Institute for Artificial Intelligence. Boy howdy, can he explain stuff. He's written some great explanations of Bayes' theorem for non-practitioners. [Thanks to @SciData (Mike Will) for linking to this.]

How Bayesian reasoning relies on evidence. Bayes depends on something called 'priors': These are prior probabilities (think of them as 'original' probabilities before additional information becomes available). We use evidence to establish the value of these priors (e.g., the proportion of people with a particular disease or condition). Bayes' theorem is then used to determine revised, or posterior, probabilities given some additional information, such as a test result. (Where do people get priors? "There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.")

Yudkowsky opens by saying "Your friends and colleagues are talking about something called 'Bayes' Theorem' or 'Bayes' Rule', or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and... It's this equation. That's all. Just one equation. The page you found gives a definition of it, but it doesn't say what it is, or why it's useful, or why your friends would be interested in it. It looks like this random statistics thing." Then he walks through a simple example about calculating breast cancer risk for a woman with a positive mammography result. Very hands-on, including little calculators embedded in the page.


Risk, misunderstood 85% of the time? The scary part is this: "Next, suppose I told you that most doctors get the same wrong answer on this problem - usually, only around 15% of doctors get it right. ("Really?  15%?  Is that a real number, or an urban legend based on an Internet poll?" It's a real number. See Casscells, Schoenberger, and Grayboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995; and many other studies. It's a surprising result which is easy to replicate, so it's been extensively replicated.)"

Evidence slides probability up or down. I especially like Yudkowsky's description of how evidence 'slides' probability in one direction or another. For instance, in the breast cancer example, if a woman receives a positive mammography result, the revised probability of cancer slides from 1% to 7.8%, while a negative result slides the revised probability from 1% to 0.22%.
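To see where those numbers come from, here's a minimal Python sketch of the calculation. The 1% prior is the one quoted above. The other two inputs, an 80% chance of a positive mammography for a woman with cancer and a 9.6% chance for a woman without it, are the figures from Yudkowsky's example rather than anything stated in this post, and the `posterior` function name is my own.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: probability of the hypothesis after seeing the evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


PRIOR = 0.01           # 1% of women in the group have breast cancer
TRUE_POSITIVE = 0.80   # assumed from Yudkowsky's example: positive test given cancer
FALSE_POSITIVE = 0.096 # assumed from Yudkowsky's example: positive test given no cancer

# A positive result is the evidence.
after_positive = posterior(PRIOR, TRUE_POSITIVE, FALSE_POSITIVE)

# A negative result is the evidence, so use the complementary rates.
after_negative = posterior(PRIOR, 1 - TRUE_POSITIVE, 1 - FALSE_POSITIVE)

print(f"After a positive result: {after_positive:.1%}")
print(f"After a negative result: {after_negative:.2%}")
```

Run it and the prior of 1% slides up to about 7.8% after a positive result and down to about 0.22% after a negative one, matching the figures above.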

About priors. Yudkowsky reminds us that "priors are true or false just like the final answer - they reflect reality and can be judged by comparing them against reality. For example, if you think that 920 out of 10,000 women in a sample have breast cancer, and the actual number is 100 out of 10,000, then your priors are wrong. For our particular problem, the priors might have been established by three studies - a study on the case histories of women with breast cancer to see how many of them tested positive on a mammography, a study on women without breast cancer to see how many of them test positive on a mammography, and an epidemiological study on the prevalence of breast cancer in some specific demographic."

The Bayesian discussion references the classic Judgment under uncertainty: Heuristics and biases, edited by D. Kahneman, P. Slovic and A. Tversky. "If it seems to you like human thinking often isn't Bayesian... you're not wrong. This terrifying volume catalogues some of the blatant searing hideous gaping errors that pop up in human cognition."

You must read this. Yudkowsky's Bayesian discussion continues with a more in-depth example, eventually leading to a technical explanation of technical explanation. I recommend that one, too.

Wednesday, 15 December 2010

You like insightful analysis of interesting evidence, yes? Then start Barking up the wrong tree.

I highly recommend the blog Barking up the wrong tree. Great stuff on psychology, decision-making, and such. Brought to you by Eric Barker - @bakadesuyo on Twitter. (Thanks to Douglas Heingartner for the heads up on this: I recently wrote about his Metastudies.Org.)

Just the interesting stuff. For each research topic he covers, Barker puts together a snappy summary that gets right to the point. Some examples of the things he writes about:

Do smart people perform worse than dumb people on some cognitive tasks? "[T]here are some cases where additional working memory has no benefit - or can even be a disadvantage. A great example of this can be found in Decaro et al's recent article in Cognition, who show that subjects with higher working memory capacity are actually worse at learning complex rules in a categorization task..."

Can you tell if a politician is liberal or conservative just by looking at them? "[P]erceivers were more accurate when they rated politicians whose attitudes were opposite to their own position.... Finally, politicians who were rated accurately had higher chances of being reelected to the following parliamentary session."

Don't delay, start reading Barking up the wrong tree today. Barker provides citations to his sources. Also, I like how he includes links to related previous posts, and provides book recommendations.