False positives are an important part of any detection decision, and they’re easy to overlook. When we build a test, we usually worry about it failing to detect something it should catch–a false negative. A false positive is the opposite error: the test mistakenly flags something as a hit when it isn’t.
For example, suppose you have to decide whether an incoming piece of email is spam. You want to make sure no spam gets through. It’s very annoying to receive, and today a workplace email system can see hundreds of thousands of spam messages a day. Of course, the most efficient way to catch all the spam is to zap 100% of incoming email–then you’ll never get any spam.
But you’ll also never get any valid email. So the system has to decide: is this message spam or not? Ideally it should catch all the spam, but also not falsely label a valid email as spam. Otherwise you’ll miss valid messages, like that super job offer.
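The two ways the filter can go wrong have standard names, and it helps to count them separately. A minimal sketch, with a made-up handful of emails (the lists here are hypothetical, not from any real filter):

```python
# Hypothetical ground truth and filter decisions for five emails.
actually_spam = [True, True, False, False, False]
blocked       = [True, False, True, False, False]

# False negative: real spam the filter let through.
false_negatives = sum(a and not b for a, b in zip(actually_spam, blocked))
# False positive: valid email the filter wrongly blocked.
false_positives = sum(b and not a for a, b in zip(actually_spam, blocked))

print(false_negatives, false_positives)  # 1 1
```

Blocking 100% of email drives the false negatives to zero, but every valid message becomes a false positive–which is the trade-off the system has to balance.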
The system where I work sends an email several times a day listing all the blocked spam. The idea is that you can go through that list and check whether any valid emails have been blocked. I think I looked once, about six months ago, and haven’t done it since. So for all I know I’m missing tons of valid email–the false positives.
Let’s say you get a test for cancer and it comes back positive. This is obviously very worrying. Some cancers have high mortality rates; some come with a very short life expectancy. But cancer doesn’t have to be a death sentence. As I understand it there are several reasons for this, some of them medical–and since I’m not a doctor I won’t go into those, because I don’t want to type something incorrect. However, I would read up on them; I personally found Stephen Jay Gould’s description of his own battle with cancer very illuminating.*
One statistical comfort you might draw, however, is that your result might be a false positive. In fact, a positive result is often more likely to be false than true, especially when the underlying incidence of the disease is low.
Consider a disease or cancer that occurs in about 1 person in 100. Out of 100 people, then, one person has cancer. Now take a test that detects every real case, so it detects that one person’s cancer. But the test also wrongly indicates that 9 other people have cancer. If your result comes up positive, what are the chances that you actually have cancer?
Well, you can see that it’s only 10%. One person has cancer and came up positive, but 9 others also came up positive. That’s 10 positives, of which 9 are false; only 1 is real. Even if the test were improved considerably, so that it picked up only 1 false positive, your chances would still be 50-50. The best advice there is to get a second opinion.
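The arithmetic above can be sketched in a few lines. Since the test here catches every real case, the chance that your positive is real is just the real cases divided by all positives (the function name is mine, purely for illustration):

```python
def chance_really_sick(true_cases, false_positives):
    """Probability a positive result is real, assuming the test
    detects every real case: true positives / all positives."""
    total_positives = true_cases + false_positives
    return true_cases / total_positives

# 1 real case, 9 false positives: only a 10% chance it's real.
print(chance_really_sick(1, 9))  # 0.1
# A much better test with just 1 false positive: still only 50-50.
print(chance_really_sick(1, 1))  # 0.5
```

Notice that nothing here depends on the test itself being sloppy; even a very accurate test gives mostly false positives when the disease is rare enough.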
*The Gould essay is in Full House and is called “A Personal Essay.” It too is a statistical story, this time about his deepening understanding of the diagnosis “the median mortality is 8 months” and why that wasn’t a death sentence. An earlier version, “The Median Isn’t the Message,” is featured on Steve Dunn’s cancer website.
A final note: we often pay close attention to group averages and overall patterns, but then attempt to infer the characteristics of an individual from that average. Call it stereotyping, the ecological fallacy, racism, profiling, or “the median is not the message”–this is a powerful problem of our times.