### A must-read: Yudkowsky's gentle, brilliant explanation of Bayesian reasoning and evidence. And why we overestimate breast cancer occurrence 85% of the time.

**Happy Fun-with-Evidence Friday.** No videos today, just an important lesson told in an entertaining way. Eliezer Yudkowsky is a research fellow at the Singularity Institute for Artificial Intelligence. Boy howdy, can he explain stuff. He's written some great explanations of Bayes' theorem for non-practitioners. [Thanks to @SciData (Mike Will) for linking to this.]

**How Bayesian reasoning relies on evidence.** Bayes depends on something called 'priors': these are prior probabilities (think of them as 'original' probabilities before additional information becomes available). We use evidence to establish the value of these priors (e.g., the proportion of people with a particular disease or condition). Bayes' theorem is then used to determine revised, or posterior, probabilities given some additional information, such as a test result. (Where do people get priors? "There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.")
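
In symbols, the update described above is Bayes' theorem. The notation is an illustrative shorthand rather than Yudkowsky's own: $H$ stands for the hypothesis (for example, having the condition) and $E$ for the additional evidence (for example, a test result).

$$
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
$$

Here $P(H)$ is the prior and $P(H \mid E)$ is the revised, or posterior, probability.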

Yudkowsky opens by saying, "Your friends and colleagues are talking about something called 'Bayes' Theorem' or 'Bayes' Rule', or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and... It's this equation. That's all. Just one equation. The page you found gives a definition of it, but it doesn't say what it is, or why it's useful, or why your friends would be interested in it. It looks like this random statistics thing." Then he walks through a simple example about calculating breast cancer risk for a woman with a positive mammography result. It's very hands-on, with little calculators along the way.

**Risk, misunderstood 85% of the time?** The scary part is this: "Next, suppose I told you that most doctors get the same wrong answer on this problem - usually, only around 15% of doctors get it right. ("Really? 15%? Is that a real number, or an urban legend based on an Internet poll?" It's a real number. See Casscells, Schoenberger, and Grayboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995; and many other studies. It's a surprising result which is easy to replicate, so it's been extensively replicated.)"

**Evidence slides probability up or down.** I especially like Yudkowsky's description of how evidence 'slides' probability in one direction or another. For instance, in the breast cancer example, if a woman receives a positive mammography result, the revised probability of cancer slides from 1% to 7.8%, while a negative result slides the revised probability from 1% to 0.22%.
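
For readers who want to check the slide themselves, here is a minimal Python sketch. The 1% prior comes from the example above; the 80% detection rate and 9.6% false-positive rate are assumptions recalled from Yudkowsky's essay and are not stated in this post. The function name `posterior` is purely illustrative.

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_healthy, test_positive=True):
    """Return P(cancer | test result) using Bayes' theorem."""
    if test_positive:
        like_cancer = p_pos_given_cancer
        like_healthy = p_pos_given_healthy
    else:
        # Probability of a negative result is the complement of a positive one.
        like_cancer = 1.0 - p_pos_given_cancer
        like_healthy = 1.0 - p_pos_given_healthy
    numerator = like_cancer * prior
    evidence = numerator + like_healthy * (1.0 - prior)
    return numerator / evidence


PRIOR = 0.01                 # 1% prior, as quoted in the post
SENSITIVITY = 0.80           # assumption: figure recalled from Yudkowsky's essay
FALSE_POSITIVE_RATE = 0.096  # assumption: figure recalled from Yudkowsky's essay

after_positive = posterior(PRIOR, SENSITIVITY, FALSE_POSITIVE_RATE, test_positive=True)
after_negative = posterior(PRIOR, SENSITIVITY, FALSE_POSITIVE_RATE, test_positive=False)

print(f"after a positive result: {after_positive:.1%}")   # 7.8%
print(f"after a negative result: {after_negative:.2%}")   # 0.22%
```

Under those assumed rates, the output matches the 7.8% and 0.22% figures quoted above.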

**About priors.** Yudkowsky reminds us that "priors are true or false just like the final answer - they reflect reality and can be judged by comparing them against reality. For example, if you think that 920 out of 10,000 women in a sample have breast cancer, and the actual number is 100 out of 10,000, then your priors are wrong. For our particular problem, the priors might have been established by three studies - a study on the case histories of women with breast cancer to see how many of them tested positive on a mammography, a study on women without breast cancer to see how many of them test positive on a mammography, and an epidemiological study on the prevalence of breast cancer in some specific demographic."
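
To see why a wrong prior matters, the rough Python sketch below counts people out of 10,000 rather than multiplying probabilities. The 100 and 920 figures come from the quote above; the 80% and 9.6% test rates are again assumptions recalled from Yudkowsky's essay, and `posterior_from_counts` is a hypothetical helper name.

```python
def posterior_from_counts(cancer_per_10k, sensitivity=0.80, false_positive_rate=0.096):
    """Count true and false positives in a sample of 10,000 women.

    The default test rates are assumptions, not figures stated in the post.
    Returns (true_positives, false_positives, P(cancer | positive)).
    """
    total = 10_000
    healthy = total - cancer_per_10k
    true_positives = cancer_per_10k * sensitivity
    false_positives = healthy * false_positive_rate
    share = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share


# Prior that matches reality in the quote: 100 out of 10,000.
tp_right, fp_right, post_right = posterior_from_counts(100)

# Mistaken prior from the quote: 920 out of 10,000.
tp_wrong, fp_wrong, post_wrong = posterior_from_counts(920)

print(f"prior 100/10,000 -> posterior {post_right:.1%}")
print(f"prior 920/10,000 -> posterior {post_wrong:.1%}")
```

With these assumed test rates, the mistaken prior pushes the answer after a positive test from roughly 8% to roughly 46%, so an error in the prior carries straight through to the final answer.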

The Bayesian discussion references the classic *Judgment under Uncertainty: Heuristics and Biases*, edited by D. Kahneman, P. Slovic and A. Tversky. "If it seems to you like human thinking often isn't Bayesian... you're not wrong. This terrifying volume catalogues some of the blatant searing hideous gaping errors that pop up in human cognition."

**You must read this.** Yudkowsky's Bayesian discussion continues with a more in-depth example, eventually leading to a technical explanation of technical explanation. I recommend that one, too.

"Errors using inadequate data are much less than those using no data at all." - Charles Babbage

Posted by: Mike Will | Tuesday, 15 March 2011 at 11:43 AM

The method of maximum likelihood operates with only two groups of data: 1) the data contained in your hypothesis and 2) the data of your experiment (I say this specially for you, considering the specific features of your work with ion channels). It is clear that you are forced to trust the data of your experiment in these calculations. The experiment is the only criterion of the correctness of your hypothesis. But what if your experiment is incorrect? For example, it contains a systematic error. Or, much worse, what if the colleague who carried out the experiment made a mistake? Then, after applying the method of maximum likelihood, you will get an incorrect result. And if some additional data allows you to find this out, it means that you wasted your time making all your calculations by the method of maximum likelihood. You can't correct your result. All your work collapses.

Unlike the method of maximum likelihood, Bayesian statistics operates with three groups of data: 1) the data contained in your hypothesis; 2) the data of your experiment; and 3) additional data. Of course, you can say that when working with the method of maximum likelihood, you take the additional data into account too. Yes, that is so. But, as I mentioned above, you can't include it in the calculations! Bayesian statistics, by contrast, allows you to include the additional data directly in the calculation, to increase the quality of the results and, besides, to calculate new values of the posterior/prior probabilities. Moreover, you may even be able to check the correctness of the experimental data you obtained in the previous stage (again, I say this considering your own work with ion channels).

Thus, Bayesian statistics is a more flexible instrument than the method of maximum likelihood and other methods of conventional statistics. The Bayesian method allows you to include reasoning in the calculations, i.e. in a sense it becomes an instrument for the reasoned analysis of reality. So it is not surprising, for example, that Eliezer Yudkowsky, a specialist in the field of artificial intelligence, as well as his colleagues, is so interested in Bayesian statistics.

As for your sceptical attitude to this method, David, it arises rather because you have never used Bayesian statistics yourself. Your criticism is explained by your conservatism...

Posted by: Svetlana Pertsovich | Monday, 14 March 2011 at 02:14 PM

No, you are not right, David. You are trying to absolutize a single approach. That is wrong. There is a season for every method. All in good time. Every method has both merits and demerits.

Certainly, the method of maximum likelihood can be useful at the starting stage of the mathematical processing of experimental data, if you have no information about prior probabilities. The method is simple enough and doesn't require complicated calculations. And it is really convenient to use the method of maximum likelihood to obtain results to a first approximation. Moreover, the method of maximum likelihood will in fact give you output information about probabilities.

But after this stage, you can employ Bayesian analysis for further mathematical processing of your data (including additional new data!), using the information about probabilities that you obtained from the method of maximum likelihood. This information will play the role of input information (in fact, information about prior probabilities) in your Bayesian calculations.

Of course, you could object that you prefer to use the method of maximum likelihood at this new stage of mathematical processing too. And you would be wrong, because the method of maximum likelihood will give you nothing new at the new stage, while the Bayesian method can give much new and interesting information. So you are not right in this case when you say "Bayes is not helpful". Bayes will be greatly helpful at the stage mentioned, because in general Bayesian statistics is more powerful than the method of maximum likelihood, just as the method of maximum likelihood is itself more powerful than, for example, non-parametric statistics. The methods rank in power as follows:

Bayesian statistics > maximum likelihood > non-parametric statistics

You can't deny this fact, if you are a real statistician.

Posted by: Svetlana Pertsovich | Monday, 14 March 2011 at 02:13 PM

@Pertsovich

If the prior probabilities have no effect (as one would hope in cases where they aren't known) then there is no reason to bother with them at all.

In such cases surely it is better simply to use the right-hand side, i.e. the likelihood (in the technical sense, the probability of the data given your hypothesis) and forget Bayes. This is what R.A. Fisher advocated, and maximum likelihood estimators are what we use for inferences from single ion channel data.

Posted by: David Colquhoun | Sunday, 13 March 2011 at 08:14 AM

@ David Colquhoun

No, Bayesian analysis doesn't become more problematic for the examples you mention. Bayesian statistics has special procedures for the cases you speak about. For instance, if you have no real numerical knowledge of the prior probabilities, then you must consider the prior probabilities as equal to each other. Generally, the choice of the prior probabilities doesn't influence the result in Bayesian analysis. The result depends on the informativeness of the posterior probabilities.

Besides, Bayesianism has no limits on its application. It can be used in the same cases that are interpreted in the usual frequentist way, with standard conditional probabilities.

However, these are well-known properties of Bayesian analysis. Any statistician knows them.

But maybe you meant something else?

Posted by: Svetlana Pertsovich | Saturday, 12 March 2011 at 05:01 PM

Yudkowsky's account is indeed lovely, but I don't think it really touches on the contentious bit of Bayesian statistics at all. It can all be done with standard conditional probabilities, interpreted in the usual frequentist way. There is barely any need to introduce Bayes at all.

Bayes becomes problematic when you have no real numerical knowledge of the prior probabilities and when you are forced to drop the interpretation of probabilities as long-run frequencies. These problems didn't arise in the examples chosen by Yudkowsky.

Posted by: David Colquhoun | Saturday, 12 March 2011 at 01:41 PM

Yudkowsky is, as my friend Dr. Michael Tobis says, "wicked smart."

Posted by: Rob Ryan | Friday, 11 March 2011 at 11:44 PM