So, are lie detectors, and particularly voice stress recognition systems, any good? Standard polygraphs (as administered by trained and experienced personnel) detect physiological signs associated with lying, although these can be absent in the truly psychopathic, you can learn to fool them, and anxiety can produce them in the innocent. Most studies of standard polygraphs are carried out on offenders, and they tend to find fairly high detection rates. Reported figures are typically of the order of a sensitivity (proportion of liars correctly detected) of 76% and a specificity (proportion of truth-tellers correctly cleared) of 63% ('average' values), with 87% and 88% representing the upper range of estimates ('maximal' values), which doesn't sound too bad. But the utility of the polygraph (or of any test, in fact) depends very much on how likely it is that the suspect is guilty in the first place (the prevalence of liars in the tested population). If few people are guilty then, even though only a small proportion of truth-tellers are falsely declared liars, the truth-tellers so vastly outnumber the liars that most people reported as liars will actually be truth-tellers. Conversely, if most of the people you are testing are guilty (and thus liars) then, even though a lot of guilty people will be detected, a lot of those declared innocent will actually be lying.
To make that rather convoluted explanation a bit more concrete, I refer you to a rather famous paper by Brett et al (1986, Lancet), which used the figures above (the 'average' and 'maximal' values). They showed that when the prevalence of offenders in the tested population is assumed to be 5% (i.e. not many, as with benefit cheats) the positive predictive value is 10%: only 1 in 10 of those testing positive is actually lying, with the rest falsely accused (that is with the 'average' values; using the 'maximal' values they find 25% true positives).
For a pre-test probability of 50% (e.g. criminal investigations, hopefully, maybe) the positive predictive value is 67% (88% with the 'maximal' values), a gain in certainty of only 17 percentage points over the pre-test probability, with 33% of positive results still false positives. If most of the people tested are liars (90%) then the negative predictive value is only 23%, with 77% of negative test results coming from lying subjects. It is often said that if you are innocent you probably don't want to risk being falsely labelled a liar, and thus a suspect, while it may be worth taking the chance if you're guilty anyway, as a lucky negative could throw them off the scent!
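For anyone who wants to check these numbers, the Bayesian arithmetic is simple enough to script. Here is a minimal sketch in Python (the function names are mine, not from any of the papers):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(actually lying | test says liar)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(actually truthful | test says truthful)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Brett et al's 'average' polygraph figures: 76% sensitivity, 63% specificity
print(round(ppv(0.76, 0.63, 0.05), 2))  # 0.1: at 5% prevalence, ~1 in 10 positives is a liar
print(round(ppv(0.76, 0.63, 0.50), 2))  # 0.67: at a 50% pre-test probability
print(round(npv(0.76, 0.63, 0.90), 2))  # 0.23: when 90% of those tested are liars
```

Plugging in the 'maximal' values (87% and 88%) reproduces the other figures quoted above in the same way.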
So we know that polygraphs aren't going to be that great at detecting liars in the population, even though they do work to some extent. There was a study published last year (Gamer et al 2006, Int J Psychophysiol) comparing polygraph measures (heart rate etc.)* with voice stress recognition. It used the Guilty Knowledge Test (GKT):
"If, for example, a robbery of a fuel station is examined, a typical GKT-question could be: “Which car was used for the robbery of the fuel station last night?” If in fact a red BMW was used, proper items for this question could be “(a) a green Ford?”, “(b) a blue Mercedes?”, “(c) a red BMW?”, “(d) a yellow Chrysler?”, “(e) a black Pontiac?”. According to the assumptions of the GKT, only the culprit should be able to differentiate relevant and irrelevant items correctly and thus show more pronounced physiological responses to the relevant item."They used the TrusterPro program (made by Israeli company Nemesysco, and I believe the core of the Capita program used in the UK) - and found a sensitivity of 30% and specificity of 83%, which was not significantly above chance. More detailed analysis of specific raw factors did not reveal any further discriminative ability. If you analyse the figures in the same way as Brett et al did with the polygraph you find similar results, with (assuming 5% prevalance of liars) only 8% positive predictive value - i.e you aren't doing much better by using this voice recognition system than just randomly selecting people and deciding they're benefit cheats, and over 90% of those you designate as liars aren't, while 70% of cheats will still get away with it - and this of course assumes optimal scientific study levels of operator training and question format (it is, I would imagine, unlikely that they will use the GKT structure).
The low sensitivity is a real problem because the DWP themselves say:
"If the pilot is successful we will consider the case for changes to verification procedures for cases adjudged to be low risk, potentially reducing the need to issue and process forms and undertake unnecessary and expensive visits."If they are only using this software for people they already suspect are dodgy it is truly useless. Say we have a 50% chance this person is a benefit cheat (based on their funny looking application) checking them with this software gives a negative predictive value (number of true negatives) barely better than chance at 54%.
*This study used a logistic regression analysis to calculate what is essentially a theoretical upper bound on the information that polygraph-type measures could provide in this study; sensitivity was 93% and specificity 97%. It is an upper bound because the regression inevitably overfits the data from this study, and we don't know whether it would generalise to another sample.
The Deception Blog has more on this sort of thing.
There is a discussion on this topic at the Badscience forums.