Even though we can use our mental skills to try to find correct answers, it seems that argumentation, for power and for bonding, is indeed a very important aspect to consider. But this does not mean we aim to win every discussion. The very purpose of winning a discussion is to have others agree with us. Therefore, argumentation only makes sense in a context where people try to arrive at some kind of consensus.
The Asch experiment points exactly at that. When the group exerted its pressure on people, they were not observed to argue, but to accept the view of the others. The pressure to conform was strong enough that even a trivial task could end in an erroneous answer. It is worth remembering, at this point, the effect known as irrational consistency. In this case, when people hold a specific belief, they tend to also believe in a complete set of logically independent assumptions, in such a way that all of them support their main belief. This makes no sense if one's objective is only to hold beliefs as close to the truth as possible (the best action, policy, or choice is usually the best one for a set of reasons, but also despite a number of problems it might be associated with). It makes perfect sense, however, if the objective is to defend a point of view and to conform with a group of people with similar ideas.
It seems we are not really so much interested in truth as we are in confirming our beliefs. On a daily basis, we can see people using strategies built to avoid cognitive dissonance, to avoid finding ideas that they disagree with. Recently, my wife shared a post on social media in which a number of Brazilian religious leaders were mocked. These individuals openly request money from their followers (usually poor people who could really have better uses for the money) in exchange for the goodwill of their god. They claim that they can cure diseases easily. Since, at the time she posted it, an outbreak of Ebola was (and still is, as I write this) threatening several countries in Africa, it would be only logical for someone who cared about the welfare of people and who genuinely possessed healing powers to use those powers on serious diseases for which we still have no cure. The post, therefore, suggested that three well-known religious leaders should head to the Ebola region to treat the disease.
The tone was one of irony but, if the beliefs those individuals profess were true and they actually held them, the suggestion was indeed no more than the logical consequence of those beliefs. The fact that there is something fishy in this situation, however, is perceived by everyone. A friend of my wife, who follows the same denomination, complained about the generalization in the post, where all leaders were basically described as liars. I couldn't help joining the discussion. What is interesting is that, as soon as I entered it with a joking remark, before I could actually reason about the problem, she decided to stop the discussion to prevent possible damage to our relationship (we have never met in person) and deleted her entries, including the one where she said she was leaving the discussion. I was only able to read that entry because the comment was sent to my email account. When I actually logged in to the site, planning to point out that the proposal was the logical conclusion of her own beliefs, I could no longer see anything from her.
This refusal to debate things that make us uncomfortable is very common. As a matter of fact, people are commonly advised to avoid discussions about politics, religion, or sports. Yet, obviously, if you were interested in finding out the truth instead of winning arguments, talking about those issues and, far more importantly, finding out about the facts and the competent analyses in those subjects, however rare they might be, should be on your list of priorities. But we take disagreements as personal attacks (if I had to guess, I'd say that taking disagreement as personal offense is an even more serious problem among supposedly well-educated people, such as lecturers in academic positions), and we use faulty reasoning whenever the conclusion feels like it would support us and we can get away with it.
It seems we indeed need something much better than our natural reasoning talents if we ever aim to find the best, correct answers.
Thursday, December 18, 2014
Wednesday, November 19, 2014
Group Thinking - Argumentative Theory of Reasoning (translated from "Teoria Argumentativa do Raciocínio", the Portuguese version of the post on Argumentative Theory of Reasoning)
Note: Having survived the amount of contentless arguments that
characterized the election in Brazil, I thought it best to
translate at least this post and the next ones. There is always a
small hope that someone will read them, relate to them, and learn
something. What follows is the translation of the last post, which
is the first part of what I will write on the topic:
------------------
We are capable of creating incredible things, whether in the sciences, in technology, or as art. We have gone further than any other living being on our planet, and the scale of our achievements has no rival. However, as we have seen, our ability to reason, although well adapted to the environment where our ancestors lived, has serious flaws and, although we can accomplish much more when many of us join together, communities are also capable of creating new kinds of problems when we try to discover the truth.
This raises the question: why are we exactly the way we are? Logical rules are not really complicated and could exist inside our brains at a very low cost. On the other hand, Classical Logic (which I will discuss in future entries) assumes that some statements are true and, in the real world, we simply cannot assume from the start that certain premises are the truth. Thus, it is possible that Logic was not easily applicable in the circumstances of our ancestors. Even so, the problem remains of which forces would have guided the evolution of our intellect to its current point.
Hugo Mercier and Dan Sperber recently proposed an idea that seems to capture at least one essential aspect of the answer. They observed that our mental and verbal abilities, what we say to one another, did not evolve for the pursuit of truth. They suggested that, since we are social beings, our reasoning evolved in an environment in which, if others believed you, you would have more power and a better chance of reproducing. This would have meant an evolutionary pressure toward being able to argue and convince others, regardless of whether our point of view was correct. Their Argumentative Theory of Reasoning (Teoria Argumentativa do Raciocínio) states that our reasoning exists for the purpose of making us competent at debating and convincing, which is often quite different from arriving at the correct answer.
In fact, Mercier observed that the idea that we reason in order to make our arguments convincing (arguments that may or may not be true) seems to apply to children and to other cultures as well. This does not mean that we are unable to use our intellects in the search for correct solutions to the problems we encounter. When we compared, in an earlier entry, our ability to solve the formally identical logical problems of the cards and of alcohol consumption, we saw that, in familiar situations, we were able to reason competently. Finding better answers may also have contributed to shaping our reasoning. But the evidence that a good part of it evolved only to allow us to convince others is quite convincing.
Group Thinking - Argumentative Theory of Reasoning
We can create amazing things, be it in Science, Technology, or Art. We have gone further than any other living being on our planet, the scale of our accomplishments is unparalleled, and we have made changes, for better and for worse, across the whole surface of our planet. And yet, as we have seen, our reasoning, while well adapted to the environment of our ancestors, has serious flaws and, while we can sometimes achieve much more when several of us are involved, communities can also create a whole new level of problems for the pursuit of truth.
So, why are we exactly the way we are? Logical rules are not really complicated and could work inside our brains at almost no cost. On the other hand, Classical Logic (we will talk about different logics in future posts) assumes some statements to be true and, in the real world, we cannot simply choose some premises to be true. In that sense, it might not have been easily applicable in many circumstances. Still, the question remains: what were the forces that drove the evolution of our intellect to the point where we are today?
Hugo Mercier and Dan Sperber recently proposed an idea that seems to capture at least one essential aspect of the answer to that question. What they observed is that our mental and verbal skills, what we say to each other, might not have evolved for the pursuit of truth. What they have proposed is that, as social beings, our reasoning evolved in an environment where, if you were believed, you would have more power and a better chance of surviving. That meant a pressure to be able to argue well and convince people, regardless of the correctness of the underlying reasoning. Their Argumentative Theory of Reasoning states that our reasoning exists for the purpose of making us competent at debating and convincing others. And that is often not the same as arriving at the right answer.
As a matter of fact, Mercier observed that the idea that we reason in order to make convincing arguments (which might or might not turn out to be true) also seems to apply to children and to other cultures. None of this means that we cannot use our intellects to pursue correct answers to the problems we face. When comparing our abilities in the formally identical logical problems of the cards and of alcohol consumption, we have seen that, in situations we are used to, we actually reason in competent ways. Finding better answers must also have contributed to shaping our reasoning. But the evidence that a good part of it evolved simply to allow us to win arguments is quite compelling.
Tuesday, October 28, 2014
Group Thinking VI
Social influence, in the previous examples, was limited to the case where all participants in the group were initially treated as equals (even when the suggestion was to listen only to the most confident people, that was a characteristic determined a posteriori). But that is not always the case. It often happens that, when we have to make a decision in a social context, one or more individuals hold a special position, for example, as bosses or as authorities on the subject.
In a very famous (and also infamous) experiment, Stanley Milgram decided to investigate how it was possible that Nazism could dominate Germany when most Germans were actually not murderous psychopaths. The setting of the experiment was simple. One scientist was in the room controlling the situation while two people, both presented as test subjects, were assigned to two different roles. One of them was tied to a chair connected to a machine that could be turned on to administer electric shocks to the sitting person. The task of the second individual was to press the button that caused the shock, when instructed.
The first subject was asked questions and, when those questions were answered correctly, no shock was administered. However, each error was to be punished with a shock, starting at the small voltage of 15V. Each error made the shock 15V stronger than the previous one, up to a final shock of 450V.
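The arithmetic of that shock scale is easy to check. The short Python sketch below just enumerates the voltage levels described above (15V steps from 15V to 450V); the variable names are mine, chosen for illustration only.

```python
# Voltage levels as described in the text:
# start at 15V, add 15V per error, stop at 450V.
FIRST_SHOCK_V = 15
STEP_V = 15
FINAL_SHOCK_V = 450

levels = list(range(FIRST_SHOCK_V, FINAL_SHOCK_V + 1, STEP_V))

print(len(levels))             # number of distinct shock levels
print(levels[:3], levels[-1])  # first few levels and the last one
```

The scale has 30 levels, so someone who obeyed to the end had gone through thirty escalating steps.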
What the second subject, who inflicted the shocks, didn't know was that no real shock was being applied and that the person tied to the chair was an actor instructed to act as if the shock were real. The actor would get some answers right and some wrong, showing just some discomfort at first. Eventually, the actor would beg for the experiment to stop, showing very clear signs of distress and pain. And the scientist would instruct the second subject to keep on with the shocks, despite those pleas.
Milgram reported that, despite many people showing signs of extreme stress while hearing the cries of pain from the actor, 65% of them kept obeying the scientist up to the maximum voltage. The experiment had several problems and can be criticized in many ways, including the serious ethical problem of the horrible psychological pain it caused to the people who kept pressing the button even as they themselves begged the scientist to stop. Comparisons with Nazism are indeed not exact, for a series of reasons, and the 65% figure is actually the percentage from the version of the experiment in which the most people obeyed the scientist. Other problems were also reported concerning how well the experiment script was followed (see, for example, Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments).
Despite all those problems, and remembering that the 65% figure is almost certainly an inflated one, the experiment shows how an authority figure that we trust (a scientist conducting an experiment where we were assured nobody would actually be harmed) can make us take even actions we are viscerally opposed to. In this case, no change of opinion was actually observed, but the actions were not what we would expect from normal, thinking human beings.
Thursday, October 9, 2014
Group Reasoning V
The possibility of better judgment when using groups is a tool we might want to employ, while avoiding the circumstances where phenomena like groupthink happen. That means we need solid evidence on when we should expect problems and when the wisdom of crowds can be expected to work in our favor.
Indeed, one first question that comes to mind is how strong the influence between the members of a group needs to be before we should start to worry. Lorenz et al. designed an experiment to address that question. They asked people to answer factual questions and, after the subjects had expressed their opinions with no group influence, provided them with information about the answers of other people. Each subject then had the opportunity to change their original answer. What they observed was that, while the group was initially "wise" (in the sense of the wisdom of crowds), social influence tended to diminish the diversity of the observed answers to such a degree that the correct value might no longer be included in the range of answers. And, despite that, the confidence of the individuals in the social answer increased! This shows that even weak social influence can undermine the wisdom of crowds effect. The authors suggest that opinions should be obtained with no element of social influence in order to capture the advantages of group reasoning.
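To make the narrowing effect concrete, here is a toy sketch in Python. It is not the model or the data used by Lorenz et al.; it simply assumes that, in each round, every person moves their estimate part of the way toward the group average. The estimates, the true value, and the `pull` parameter are all invented for illustration.

```python
def influence_round(estimates, pull):
    """Move every estimate a fraction `pull` of the way toward the group mean."""
    mean = sum(estimates) / len(estimates)
    return [x + pull * (mean - x) for x in estimates]


def brackets_truth(estimates, truth):
    """True if the true value lies within the range of the estimates."""
    return min(estimates) <= truth <= max(estimates)


# Hypothetical numbers: five initial guesses and a true value of 28.
estimates = [10.0, 20.0, 30.0, 40.0, 100.0]
truth = 28.0

for round_number in range(3):
    spread = max(estimates) - min(estimates)
    # Round, current estimates, spread, and whether the range still holds the truth.
    print(round_number, estimates, spread, brackets_truth(estimates, truth))
    estimates = influence_round(estimates, pull=0.5)
```

In this toy setup the group average never changes, but the spread of the answers halves with each round. After two rounds of influence the range of answers no longer contains the true value, even though everyone now agrees more closely.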
Of course, while desirable, it is not always possible to eliminate the social influence inside the group in a meaningful way. De Polavieja and collaborators, while studying this problem, have suggested that the beneficial effects of group reasoning can still be obtained, even in the presence of social effects, if we use only the opinions of the very confident people who did not change their initial opinions despite the social pressure. Note that, while groupthink can be a very powerful influence, it might not be enough to convince everyone, and independent-minded individuals might be able to retain the initial range of views and, with that, the wisdom that was present in the crowd before the interaction.
Such a proposal, however, might be labeled as anti-democratic, depending on the context where it is applied (unfortunately, the word democracy is nowadays used even to defend the position that just a small fraction of the interested parties should be listened to, so arguments that use it should be read with extreme caution). The general advice to make the social influence between deciders as small as possible, however, stands. This does not mean that the different alternatives should not be presented to voters; quite the opposite. What the literature shows is that it is the interaction between the voters that should be minimal, not the interaction between the people presenting and debating the alternatives. In large societies, unlike in the laboratory experiments, most voters do not interact with each other and, therefore, it is not clear that groupthink will happen.
The situation is very different in committees or smaller gatherings. In these cases, the internal pressures inside the group might indeed destroy our ability to think and replace it with our desire to conform. This is not true only of the final opinions. In her work on the performance of groups and individuals, Gayle Hill also studied how interaction might affect brainstorming sessions. What she observed was that, when people were asked to prepare new ideas in advance and bring them ready to the session, the combined independent work was consistently more creative than when the ideas were thought up during a meeting.
The composition of a group is also a key factor in the quality of its reasoning. Ilan Yaniv studied how well a group was able to avoid framing effects. Framing effects happen when people change their decisions simply because the question they had to answer was presented to them (framed) in different words. In this study, Yaniv observed that increasing heterogeneity in the group, simply by assigning individuals to different frames, had a very strong impact on getting rid of the biases, while homogeneous groups performed much worse than individuals.
In a review of the literature in the area, Elizabeth Mannix and Margaret A. Neale discussed the benefits and problems of increasing diversity in a group, as observed under many different circumstances. They concluded that there are types of heterogeneity, such as differences in race/ethnicity, gender, or age, that can cause a group problems in the areas of interpersonal attraction and liking. But, from an information processing point of view, diversity should be able to improve the group results, despite the management problems it might cause. In particular, underlying, less obvious differences, such as different backgrounds, education, or personalities, were indeed associated with improvements in performance.
The conclusion of all these experiments seems to be that there is, indeed, a lot of knowledge and intelligence in a group, but crowds are very stupid. They make mistakes individuals would rarely make.
Wednesday, October 1, 2014
Human Stupidity: Historical: Group Reasoning IV
The consequences of social influence that emerge from these observations
are disturbing ones. I am sure that some readers might have trouble
accepting them, even after all the evidence presented so far on our reasoning
shortcomings. But the extent to which we can be influenced by a group
of people, even when that group is wrong, is something that is very well
documented. In a famous experiment, Solomon Asch posed a very trivial question to his subjects, based on the figure
below.
The people involved in the experiment simply had to state which of the three lines on the right card (A, B, or C) had the same length as the line on the left card. When asked the question in the control situation, with no influence from anyone else, those who were being tested picked the correct option (line C) 99% of the time. The purpose of the experiment was to see how people would react when the information from others disagreed with their perception. In order to test that, some of the subjects were tested in a situation where they first listened to the individual opinions of other people, who were actually actors. Those actors were instructed to provide the correct answer in some of the trials, but the wrong one (line A) in most of them. In each trial, all actors provided the same answer.
What Asch observed was that, when the actors provided the wrong answer before the individual being tested answered, about 75% of the people tested went along with the wrong choice at least once. The effect required a majority of at least 3 people to be observed. However, it did not become stronger as more actors were added, all in agreement with the wrong choice.
More recently, evidence about what might be happening inside our brains was obtained by observing people's reactions during functional magnetic resonance imaging (fMRI) of their brains. Eisenberger et al. observed that, when participants experienced rejection, they showed brain activity similar to that observed when people experience real physical pain. Of course, this does not tell us whether people actually changed their perception of the world or whether they just agreed with the majority while still somehow noticing that the majority opinion was wrong.
While investigating that, Berns et al. observed that both perceptual and emotional processes were involved in our brains in circumstances similar to those of the Asch experiment. Adding to that, Klucharev et al. found clear evidence that our conformity to group norms or opinions happens through learning mechanisms. This suggests that the influence of the group might actually change the way people perceive the world.
Social influence is pervasive and we are rarely aware of it. Even through social media, it was possible to detect that emotion can be contagious, without any non-verbal cues, simply by reading about the emotions of a friend. While this specific work was criticized for using Facebook data without explicit user consent (the authors assumed implicit consent from acceptance of the terms of service, and PNAS added a comment at the beginning of the article pointing out this possible problem), it highlights very clearly how we are actually influenced even with very little information.
Wednesday, September 17, 2014
Human Stupidity: Historical: Group Reasoning III
It is not true that groups always outperform individuals, though. Comparisons between the estimates of a group and those of the best-informed individual in the same group did not provide such a clear-cut answer. In that case, the results of the experiments were not consistent across different problems. Sometimes the groups were able to provide better results than their most competent member, while, under different circumstances, the best member was able to outperform the group.
As a matter of fact, Kerr et al. concluded there is no simple answer to the question of whether individuals or groups are more biased. Both gains and losses have been observed as a consequence of obtaining the opinion of groups. How the group interacts, as well as the type of question or task proposed, can make a significant difference to the outcome. The number of papers on the subject is quite large and, here, I will just comment on a few cases where problems have been observed. The cases I will describe are very far from exhaustive and no claim about their importance is made.
A classical case of group decisions going wrong is the phenomenon Irving Janis named groupthink in Victims of Groupthink: A psychological study of foreign-policy decisions and fiascoes. Groupthink is what happens when the desire to conform and agree with others is so strong that it interferes with critical thinking. In those situations, people might adopt some idea that they believe better conforms to the group norms, instead of actually providing their best independent evaluation. This can happen in a variety of circumstances, from groups with a strong sense of belonging (sports fans or religious communities, for example), to cases where one's opinions have a strong moral value attached to them or when people simply want to show support for a leader (for example, their boss).
What is particularly troublesome about groupthink is that, when it is observed, it is not just the case that the group makes decisions that are worse than those of its most competent member. It can actually happen that the group will reason in ways that are much worse than the average individual of the group would. Examples of this can often be observed in the behavior of crowds at sporting events, where insults and violence happen far more often than would be reasonable to expect if those same people were deciding as individuals.
Wednesday, September 10, 2014
Human Stupidity: Historical: Group Reasoning II
Group decisions happen every day. We choose the people who will represent us in the government (in several countries, at least), and we participate in groups of different sizes that have to reach an agreement about how to act (assuming a collective action does happen). Sometimes a group decision can be described as the sum of mostly independent decisions and actions, taken individually, as in an election. At other times, we assemble and discuss, and the final estimate or the final action is decided as a result of the social process that happens between the assembled people. And, under different circumstances, a society may move in a direction that is just the consequence of how many individual actions interact with each other, with no real sense of group decision, except as a consequence of the sum of the behaviors and their interactions. One example of this is the fluctuation of prices as a consequence of the individual decisions of buyers and sellers. In this last case, all reasoning can be described as individual reasoning, while in the first two, decisions are made as a consequence of the sum of the opinions and, sometimes, the interactions between those opinions.
While the case of how the actions of people can influence the decisions of a society as a whole is very interesting (and I will return to it further ahead), when we talk about the reasoning of a group, this is usually understood to mean the first two cases. At this point, we will just discuss the cases where some reasoning is expected from the group, with or without interaction between its members.
After so many disappointments with our individual abilities, it makes sense to start with some good news. More than a hundred years ago, during the West of England Fat Stock and Poultry Exhibition, Francis Galton observed a contest where people attempted to provide the best guess for the weight of a fat ox. Of course, people proposed a range of different values, some close to and some very distant from the true value (1,198 pounds). What surprised Galton was the fact that the median of the guesses, at 1,207 pounds, was actually very close to the correct value. Later, he reported the average of the guesses was even closer to the real value, at 1,197 pounds!
This effect, where some average estimate provided by a group of people shows a remarkable agreement with reality, later came to be known as the Wisdom of Crowds. Galton associated this with the strength of a democratic government, where decisions arise from some kind of averaging over the opinions of many. Of course, the observation of one single case of a group estimate was not enough for a conclusion and several experiments were performed to test how well groups perform. In a 1982 review, Gayle Hill discussed several papers published since Galton's initial observation. In her review, Hill presented four different comparisons (in all cases, the results for groups included both groups whose members worked independently and groups where people were allowed to interact with each other): groups versus individuals, groups versus the most competent member, groups versus statistically pooled responses, and groups versus mathematical models. What she concluded from reviewing previous work was that, in the case of groups versus individuals, the groups tended to perform better, as expected. So, what happens when we examine the other possibilities (as well as other possible effects)?
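To make the averaging argument concrete, here is a minimal simulation sketch. It is not Galton's data: the crowd size, the spread of the guesses, and the assumption that individual errors are independent and unbiased are all hypothetical choices made for illustration. Only the true weight of 1,198 pounds comes from the text above.

```python
import random
import statistics

TRUE_WEIGHT = 1198  # pounds, the true value quoted in the text
N_GUESSES = 800     # hypothetical crowd size (an assumption, not Galton's count)
SPREAD = 75         # hypothetical standard deviation of individual errors


def simulate_crowd(seed=0):
    """Draw independent, unbiased guesses and summarise them."""
    rng = random.Random(seed)
    guesses = [rng.gauss(TRUE_WEIGHT, SPREAD) for _ in range(N_GUESSES)]
    crowd_mean = statistics.mean(guesses)
    crowd_median = statistics.median(guesses)
    # Typical error of one person: mean absolute distance from the truth.
    individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)
    return crowd_mean, crowd_median, individual_error


if __name__ == "__main__":
    mean, median, individual = simulate_crowd()
    print(f"error of the crowd mean:    {abs(mean - TRUE_WEIGHT):.1f} lb")
    print(f"error of the crowd median:  {abs(median - TRUE_WEIGHT):.1f} lb")
    print(f"typical individual error:   {individual:.1f} lb")
```

Under these assumptions the crowd mean and median land much closer to the true weight than a typical individual guess does, because independent errors tend to cancel. The sketch says nothing about what happens when guesses share a common bias or influence one another, which is exactly where the comparisons reviewed by Hill become less clear.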
Friday, September 5, 2014
Human Stupidity: Historical: Group Reasoning
It should be clear by now that we should be very careful with any information our minds present to us. While our brains do a good job most of the time, they can easily be fooled and, depending on the circumstances, will fool themselves with no exterior help needed. As I have pointed out before, this seems to conflict with all the amazing achievements we, as a species, have been able to accomplish.
One possible explanation for this might be in that very phrase. We have accomplished as a species far more than any individual could. Even our greatest geniuses were able to do their work thanks to the many people who came before them (Newton's claim that he only saw further because he was standing on the shoulders of giants is so well known it has become a commonplace), in great disagreement with the descriptions of scientists in fictional works. The super genius who can understand anything fast has never existed outside comic books and other sources of entertainment. This suggests that, while we do lack something as individuals, it might be possible that our combined brain powers were responsible for all the advances and explanations we have created.
And, indeed, when observing human history, this seems to be the case. Each scientist contributed a new piece to the large puzzle, some with larger pieces, some with smaller ones. But many of those pieces only made sense in the context of the knowledge society had at the time. And we have developed methods of preserving old knowledge. First, for whatever adaptive reason, our ancestors developed language skills to a level not observed until now in any other species. Later we created ways to preserve that knowledge in permanent materials, through writing and several other information-preserving technologies. And we are still creating new ways to do that today. It might seem that, while we can be quite flawed as individuals, mankind is much more capable than any one of us.
This poses a question that deserves a new dive into the literature of psychological experiments: Are groups of people better at reasoning and deciding than individuals? If so, are they always better, or does that improvement only happen under some conditions?
Monday, August 25, 2014
Human Stupidity: Historical: Memory
Our reasoning and our perception of the world are, as we have seen, far from perfect. While both of them do a good job in many of our everyday tasks, they are subject to errors and it is not an overstatement to claim we should be wary of our own conclusions. This imperfection of our cognitive abilities can make us wonder if other functions of our brains suffer from similar problems.
While our emotions lie outside the scope of this work (it is already recognized we can suffer from all kinds of emotional diseases and there is no need to deal with that here), there is another function that we traditionally believe our brains perform well. That function is remembering. People tend to think of their memories as stored boxes they can consult at a later date, providing accurate descriptions of the facts they experienced in their own lives. It might be hard to find a specific memory sometimes and we do worry about forgetting, from simple information we can no longer recall to more serious pathologies where a patient's memories can slowly be lost in a permanent way. All this fits well with the information-in-boxes metaphor, as one can eventually lose those boxes, never to find them again. Or lose them for a while, until some new circumstances bring them back to our attention.
Most people, however, do not doubt the contents of their memory. If they do have a memory, unless they suffer from some delusional state, they believe things happened exactly as they remember. And we trust our memories so completely that we send people to jail every day based only on witness testimony, that is, on what people remember they saw or heard. And while a lawyer can defend a client by claiming the conditions under which the witness perceived the events were not good enough, the ability to remember is usually not questioned. That is, the legal system understands that our perceptions can be flawed and, under some circumstances, should not be trusted. But it assumes that a healthy person will not create false memories or somehow alter the original ones.
This assumption tends to be considered true not just by the layman but also, until recently, by many psychologists. And, as a matter of fact, many practitioners believed (some still do) in the concept of "repressed memory". That is, a memory of an event that a person has experienced in the past and has not really forgotten. Instead, just the conscious memory is missing, as that event would probably have been very traumatic. Many therapists worked based on the idea that these memories can be recovered through treatment. And that, when these memories are indeed "recovered", they correspond to actual events in the life of the patient.
The first indication that there was something wrong with this picture came from the unexpectedly large number of cases observed in the 90s where people claimed to have recovered "repressed memories" of abuse they had suffered. What was particularly suspicious was the fact that the stories those people told often included elements that were supposed to be rare, such as satanic practices. All those memories were recovered under particular types of psychotherapy and, as should be the case if those memories were real, arrests and convictions did happen as a consequence. The strange number of these cases made a number of researchers worry that those memories, as vivid and real as they seemed to be to those who had recovered them, might actually be an artifact of the therapy.
Research followed, as it should. In a series of very interesting experiments, Elizabeth Loftus observed she could indeed create false memories in the minds of her subjects. Cases of people who had been wrongly found guilty were later observed, not only related to "repressed memories", but also in many cases where the evidence of guilt consisted of witness reports. Simple things like showing pictures of innocent people to a victim could cause that same victim to recognize, later, a man in those pictures as the man who had raped her. It is not clear how many innocent lives were destroyed due to our lack of understanding of how our minds work. Or how many real culprits were not identified because of the same problem (for an explanation of the main results of this line of research, there is a very interesting TED lecture).
The image that emerged from those experiments is very different from previous beliefs. Our memory seems to be much more fluid than any of us would have thought. It is not just that we can suffer from problems with perception. As we learn more about some event, our brains actually change the very recording of that event, so that it will fit with our new beliefs. Missing pieces of information can be obtained from sources as unrelated to the event as a picture one observes later. What we carry in our minds is actually a mixture of what we observed, what we expected to see and things we have experienced or thought later, all mixed ("Memory - like liberty - is a fragile thing", Elizabeth Loftus).
In order to finish the topic of our memory, there is an interesting phrase by Steven Novella that he published while discussing the problem of the reliability of our memories in his blog:
"When someone looks at me and earnestly says, 'I know what I saw,' I am fond of replying, 'No you don't.' You have a distorted and constructed memory of a distorted and constructed perception, both of which are subservient to whatever narrative your brain is operating under."
Thursday, July 3, 2014
Human Stupidity: Historical: Visual Illusions
Almost everyone has seen pictures that deceive our eyes in some way. Some of them have two possible interpretations, others make us misjudge the size or the alignment of geometric figures. More complex figures can induce the illusion of movement when no actual movement is happening. And yet, illusion is a concept that is actually hard to define from a philosophical point of view, since it requires comparison with the true nature of the object, something we would tend to define as perceived by our senses. The number of different illusions and the variety of ways they work are so large that systematizing them into a few types or a theoretical framework has proved to be a surprisingly hard task.
The way our brain interprets the information it receives from our eyes can be considered similar to the way we reason. The task is indeed similar. Given what we know, the brain tries to arrive at the best possible conclusion. It uses heuristics and rules we are only starting to understand. These heuristics are usually good for solving some set of problems, either problems our ancestors had to deal with (getting food, finding a mate, etc.) or problems we learned to solve during our lifetime. Our brains deal with images in the same way. Given the visual information our eyes receive, our brain tries its best to interpret what exists in the world around us. It extrapolates and reaches conclusions that are not conscious, simply providing us with its best guess. And, most of the time, that guess is remarkably good.
Just as we discussed before, the fact that we sometimes make mistakes when interpreting visual information is not necessarily a bad thing for our survival. Recognition of patterns, whether those patterns emerge in the financial market or in the behaviour of the game one is hunting, is a very useful skill. And if one is the first to identify a pattern, there is more to gain. This can be enough to compensate for the cost of false detections. And, indeed, in general reasoning as well as in interpreting visual information, we are able to identify patterns very fast, which leads to falsely identifying random meaningless noise as something important. This general phenomenon is called apophenia.
One interesting and helpful example of how this applies to our visual perception is our tendency to identify faces everywhere, from simple typographical juxtapositions of characters like :) or ;-( to seeing faces on rocks or on toast or in shadowy, blurred images from Mars. This is called pareidolia. Quickly identifying other people as well as inferring their emotional state is certainly a useful trait for a social animal like humans (see Carl Sagan's The Demon-Haunted World: Science as a Candle in the Dark).
While this can be interesting and allows us to create new ways to communicate and to give extra meaning to some forms of art, the reality is that much of what we see as faces is probably a hard-wired conclusion of our brains. Indeed, evidence from MRI scanning of our brains shows that the specific areas of the cortex that become more active when we see faces also show the same type of activity when we just perceive something as a face. The timing of the activity is also consistent with an early interpretation of the image as an actual face and not a later re-interpretation of the image by our brains. Amazingly, it is already possible to do neural reconstruction of the face someone is seeing from the detected pattern of activity in the brain.
Saturday, June 28, 2014
Human Stupidity: Historical: Heuristics II
Heuristics, in the context of the literature about human reasoning, is a fancier name for these rules of thumb. Before that, the term, based on a Greek word, was used by Pólya in his book How to Solve It to refer to his methods (or advice) for solving mathematical problems. In that sense, he proposed a basic separation into four steps that he considered helpful in finding those solutions. When talking about human reasoning, however, a heuristic is a simple rule (or set of rules) we use to guess an answer. As opposed to a mathematical proof, there is no guarantee here that the answer will be correct or good. Of course, Pólya's heuristics provide no certainty that you will arrive at an answer either; they are just intended to improve your chances of finding it. But if you do find one and make no mistakes, mathematics ensures it is the right answer. Human reasoning heuristics, in contrast, will often give you an answer, even if not the correct one.
A classical example of how our heuristics can help us reason is the problem of trying to decide which of two cities has the larger population. Gigerenzer and Goldstein performed a series of tests of possible procedures for guessing between two cities, using a number of cues about each city, such as whether it had a soccer team in a major league or whether it had a university.
The researchers were interested in how heuristic reasoning would compare against statistical models they considered rational. They tested different methods for making predictions from the cues, namely, multiple regression and neural networks. To their surprise, even without using all the information, some of their simulated heuristics were often able to outperform the supposedly rational models where all the available information was used.
The "Take the best" heuristic was particularly successful, despite its simplicity. It basically orders the cues from most informative to least and then uses the best one for which information is available. The simulations included the possibility that a simulated agent might not know enough to use the best cue available, forcing the agent to check the next cue. As soon as one cue provides any evidence as to which city might be the bigger, "Take the best" uses that cue and just ignores any information from the other cues.
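The procedure described above can be sketched in a few lines of code. This is a simplified illustration and not the authors' implementation: the cue names, their ordering, and the rule that a cue is skipped whenever its value is unknown for either city are assumptions made here, and details of the published algorithm may differ.

```python
from typing import Dict, List, Optional

# A city is described by cues: True, False, or None when the agent does not know.
City = Dict[str, Optional[bool]]

# Hypothetical cue order, most informative first. The two cues are the ones
# mentioned in the text; ranking them this way is an assumption.
CUE_ORDER: List[str] = ["major_league_soccer_team", "university"]


def take_the_best(city_a: City, city_b: City,
                  cue_order: List[str] = CUE_ORDER) -> Optional[str]:
    """Return "a" or "b" for the city guessed to be larger, or None if no cue decides."""
    for cue in cue_order:
        a, b = city_a.get(cue), city_b.get(cue)
        if a is None or b is None:
            continue  # not enough knowledge for this cue: check the next one
        if a != b:
            # First cue that discriminates decides; all later cues are ignored.
            return "a" if a else "b"
    return None  # no cue discriminates; the caller has to guess


if __name__ == "__main__":
    city_x = {"major_league_soccer_team": None, "university": True}
    city_y = {"major_league_soccer_team": True, "university": False}
    # The best cue is unknown for city_x, so the decision falls to "university".
    print(take_the_best(city_x, city_y))
```

Note that the function stops at the first cue that separates the two cities, so a city can win on the top cue even if every remaining cue points the other way. That is the sense in which the heuristic ignores part of the available information.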
We observe here an effect that seems similar to how human accuracy decreases with more information. However, we cannot actually make that claim, since we don't know the exact reason for the human mistakes. In the case of multiple regression models, on the other hand, the reason is clear. While it is quite surprising at first that using less information might be better when using a statistical model, it is a known problem that statistical models that use many variables can overfit the data. This phenomenon will be further discussed later, when we discuss inductive logic and the use of probabilistic models. We will see how good prediction requires the use of models that both fit the data well and are as simple as possible.
Thursday, May 22, 2014
Human Stupidity: Historical: Heuristics
Humans are considered (by humans) as
the most intelligent species known (to mankind). And, when we observe
how much we have been able to accomplish as a species and compare
that to every other species on Earth, that statement makes a lot of
sense. One could debate if some big brained animals might be
individually as intelligent as an individual human (he
question now seems far less absurd than decades ago, as we learn more
and more about the abilities of some animals and our own
shortcomings). But there is no denying that what we have achieved as
a species is without precedents. We have vehicles exploring the deep
ocean and other planets, while others are leaving the Solar System;
we can communicate almost instantaneously around our world and we
understand the world around us in ways that people a few generations ago
couldn't even have dreamed of. We have been changing the appearance of our
planet (for good and also for evil) on a scale not matched by any
organism, probably since the appearance of the first plants that
could photosynthesize (The oxygen they started producing,
while vital for us, was certainly a pollution for most organisms that
lived then and must have caused widespread death among the species
that didn't adapt to the new environment, much like the widespread
death we are causing. Polluting and killing are not exclusive to us at
all). And, for the first time since life started on Earth, we have
been able to subvert most of the survival rules that apply to other
species, changing how evolution applies to us by making it possible
for even some of the weakest among our species to survive and
reach an old age, safe from the dangerous and fatal natural
environment.
Those are very impressive
accomplishments and they do give us the sense that, while we are far
from perfect, or even far from good enough, we have been able to do
something right. Culturally, we even see ourselves as something apart
from the natural world, as if we were somehow superior to nature and
not just a very successful species of big apes. While the distinction
between natural and artificial makes no sense (one might be tempted
to say it is completely artificial), it does reflect the fact that we
have, in the local scale, subverted the relation we have with the
world around us. And, while there are many reasons to worry about the
future, our present is actually almost unbelievably better than
our perception of it. Violent deaths have never been so rare and humans
have never lived such long lives, all due to the advances in science and
in our cultural and political institutions, as shown recently by Pinker.
The data
that show this to be a fact are not so hard to find and we only feel
we are surrounded by violence and disasters as an effect of the news
focusing on those events. And, since information circulates much
better now, we can learn about almost any disaster on the planet.
With billions alive, the total number of crimes and disasters is
indeed large. Not only can we learn about natural disasters happening
on the other side of the globe, but now it is very likely that there are
people living there who will be affected by it. But what really
matters to any of us as individuals is the proportion of people who
die or who suffer, not the total number that happens in a larger
population and, much less, the total number of cases we can find on
the Internet. What matters is the probability that a given tragedy
will affect one person. And these probabilities have been steadily
going down (with the important exception of the ills
associated with old age, as, in the old days, they were quite rare,
since basically nobody reached old age), to the point that, even
without ever seeing the data, I would personally bet that the life
expectancy of an Egyptian pharaoh was much shorter than that of a
poor and discriminated-against person, such as, for example, a poor black woman
living in a crime-infested slum in Brazil. That this statement can be
surprising to so many is just a consequence of the many problems with
our reasoning.
So, what is actually happening? Are we
completely stupid incompetents or are we incredible geniuses who
mastered the secrets of the Universe and changed the world into a
utopia? The answer is clearly that we are neither, even though there
is some truth to the notion that we are very dumb and also to the
notion that we are actually living in a Golden Age of mankind.
One first partial answer to the
question of how we (or any other living being) can actually achieve
so much while being quite dumb was suggested by Simon, in 1956. In his paper, Simon investigated if it
was actually necessary for a living organism to have a well defined
utility function as proposed by the EUT, as well as the intellectual
capacity to analyze its environment and make the decisions that
maximize that utility. An organism needs to find ways to deal with a
multitude of different tasks, from feeding to defending itself and
reproducing, if the species is to survive. Actually obtaining and
interpreting all available data from observing its surroundings and
choosing the best way to obtain the best possible outcome, when all
those tasks are considered, is basically an impossible problem. It
would require a mental capacity far beyond the one we possess and
this basically infinite capacity would also need to happen very fast.
You really don't want to sit and think about what the best choice is when a
lion is closing in on you. Since finding the perfect answer is not
achievable, organisms had to settle for less.
Assume there are a number of clues in
the environment that you could use in a simple way to make some
decision. If this decision will give you a better chance to survive
than not using those clues, any organism that uses those clues will
have an advantage when compared to organisms who don't (as long as
processing this information does not consume so much energy that the
benefit is smaller than the cost, of course). So, an organism does
not need to find the optimum, or, in economic terms, to optimize its
utility. It can actually function competently by finding efficient,
but not necessarily error-proof, ways to interpret the information
captured by its senses. Simon described this non-optimal behavior as
satisficing (Evolution does not require any species to be
the best to survive. Being better than the others would be
sufficient, but even being better might not be a good strategy. The
real concept is better adapted. Not stronger, or faster, or smarter,
sometimes, being weaker can actually mean better adapted. In an
environment with scarce resources, being too big and strong might
require extra food that is not available. In this case, the weaker
organisms, who are able to survive with less, are the best adapted to
that environment. This applies to strength, but also to speed, to
mental prowess or any other characteristic.).
That is, if simple rules of thumb make
you more likely to survive, it makes sense to use them. For example,
if you are looking for the cause of a phenomenon, it makes sense to
look for things that happen together with it. After all, if it is the
cause, you do expect those things to be related. The fact that many
variables can be associated with no causal connection means you will
often believe that things are related when they are not.
Suppose you belong to a family of
farmers without any of our modern knowledge. You try to plant your
seeds and sometimes things go well and the climate seems to be
working in your favor. At other times, it gets cold too soon, or
there is not enough water for your plants to grow. After a long time
observing, your grandfather noticed that, if he planted the seeds whenever
a specific bright star appeared low in the sky just when the Sun went
down, the climate would be right for the plants to grow. Your parents
confirmed it, as did your own experience. So, you conclude that
this star commands the success of your farming. While this conclusion
is wrong (there is no causation there), the observed movement of the
stars is indeed associated with the calendar and the seasons. And
your decision will indeed be better. If you extend the argument to
the belief that the same star will influence the chance of your
success in war, you will be very wrong. But, without better
information, there is no way you can actually determine the best
day to go to war. Going when you believe the stars support you is a
costless mistake, from an evolutionary point of view, since it
neither improves nor decreases your chance of success.
Mistaking association for cause is
indeed an incredibly common mistake. My own personal experience with
association and causation is actually quite worrisome. I am used to
telling my students that their exams are very likely to include a
question where variables will be associated and I will ask about
causes. And I make it abundantly clear, with examples and theory,
that, from observational studies (I will define these later in
this text), one cannot conclude that there is cause and effect. And
yet, a large percentage of these students make this very same mistake
during the exams (of course, this might be related to the
fact that I tell my students that, if they do not show up for class
but succeed at the exam, I will give them the minimum required
attendance, so it is possible that the students who make that mistake
were not at those three or four classes where I tell them one of the
exam questions. But my best guess is that it is not just that).
Outside of the exams, this can be a low cost mistake, so, it is a
reasonable rule of thumb, despite the fact that it is logically
wrong.
Friday, March 28, 2014
Human Stupidity: Historical: Control Issues
Besides all the errors we have seen so far, humans seem to have an innate ability to believe they are in control, even when that is not true, nor even possible. In 1975, Langer and Roth tested people on whether they felt they would be able to predict the outcome of random coin tosses. They rigged the outcome in such a way that all participants would get the same number of correct guesses. The main difference was the order of the correct outcomes, which defined three groups. For the first group, the correct predictions happened more often in their first attempts; the second group experienced a stable rate of success; and the third group started with more wrong answers and got more correct ones at the end. Consistent with the primacy effect, those who had obtained their correct answers sooner considered themselves more skilled than those who had observed more correct answers later. That was despite the fact that the percentage of hits was the same for all involved. Their confidence in their skills was not related to how successful they had been in the overall task, but just to how well they had performed in the beginning.
However, not everyone who participated in the experiment was asked to make predictions. A number of people were just instructed to observe the ones who were making the predictions and evaluate their skill at the task. Those who just observed evaluated the overall skill of the guessers as worse than the guessers evaluated themselves. Being in control had an effect on how people seemed to report the skill.
Interestingly, despite it being clear that the subjects had no influence on the outcome, those people who felt they were more skilled at predicting the outcomes would, after a while, start attributing their correct answers to their ability, while the wrong ones were blamed on chance (Anyone who has taught courses and graded the exams of their students can probably observe this effect. Many students seem to honestly (and absurdly) believe that any success in the classroom is due to their merit, while failures are to be blamed on the teacher, or study conditions, anything but themselves). And their false belief in their merit extended to how they evaluated different aspects of the problem. Both guessers and observers assumed that, if the guesser had the opportunity to train for that task, he would improve his performance. And they seriously felt that the existence of distractions would cause them to obtain a smaller number of correct results.
This illusion that we have some degree of control even when the task is completely random has been observed in several different tasks since these results. Pronin et al. observed how this illusion of control is related to magical thinking, by making people actually believe that they had harmed others through a voodoo hex, especially when they had harboured evil thoughts about the victim, or that they could influence the outcome of a basketball game by positive visualizations of their success (It should be unnecessary to say both effects are completely false, but unfortunately, this comment is very much needed). And, while failure at predicting sport events might be of little consequence for most people (except, of course, for bettors), the same illusion can have serious consequences in other areas. Some of those consequences might even be positive, since being in control can be related to feeling better. But this can also lead to bad decisions in all areas of human enterprise. For example, Odean discusses the consequences for the behaviour of prices of the fact that traders are overconfident about their abilities and the control they actually have over the outcome of their investments. And I have often observed (and I am sure most readers have also) how people believe that their actions, sometimes just their intent, would actually influence outcomes that are mostly random.
But do not despair yet, dear reader. While the number of studies that show our mistakes is staggering, I believe I have been able to convince many of you that we cannot trust our own intuitions (and such a belief is almost certainly my own illusion of control, that I have more influence on how you think than I actually have). As such, we will proceed now to more optimistic waters, first taking a cursory look at the explanations of why it is possible that we are so incompetent (we are not really incompetent, we are just far less competent than we would like to believe). And later, we will ask the important question of how we can actually do better and try to avoid the many pitfalls our brains have in store for us.
Tuesday, March 11, 2014
Human Stupidity: Historical: Calibration
The question of how well a person knows her real chance of getting an answer right is called calibration. In general terms, a person who is 95% sure she got the correct answer is expected to be correct 95% of the time. If she only answers correctly 70% of those questions, we can say that this person is not well calibrated about how well she knows what she knows. However, for one given question and one specific person, the answer will either be right or wrong. That means that some caution must be taken when measuring actual accuracy. Different studies can actually provide different answers depending on how the term is actually defined. This means that some discrepancy in the results and the explanations given by each author is to be expected. And, while that is indeed the case, the evidence for the existence of problems with how well calibrated we tend to be is very strong.
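To make the definition concrete, here is a small sketch of how calibration could be checked from a list of answers. The data are invented to match the 95% versus 70% case above, and grouping answers by their exact stated confidence is a simplifying assumption; real studies usually group confidences into ranges.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def calibration_table(judgments: Iterable[Tuple[float, bool]]) -> Dict[float, float]:
    """Map each stated confidence to the fraction of answers that were right.

    `judgments` is a sequence of (stated_confidence, was_correct) pairs.
    A well calibrated person has hit rates close to the stated confidences.
    """
    buckets = defaultdict(list)
    for confidence, correct in judgments:
        buckets[confidence].append(correct)
    return {c: sum(hits) / len(hits) for c, hits in sorted(buckets.items())}


# Hypothetical person: 20 answers given with 95% confidence, only 14 correct.
judgments = [(0.95, True)] * 14 + [(0.95, False)] * 6
table = calibration_table(judgments)

for stated, hit_rate in table.items():
    gap = stated - hit_rate  # positive gap = overconfidence
    print(f"stated {stated:.0%}, actual {hit_rate:.0%}, gap {gap:+.0%}")
```

The table only makes sense over many answers; for a single question the hit rate can only be 0 or 1, which is the caution mentioned above.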
An important question is, therefore, when we should expect to observe problems in calibration and when not. Griffin and Tversky (or also Chapter 13 in Heuristics and Biases: The Psychology of Intuitive Judgment) observed in 1992 that people seem to account wrongly for different kinds of statistical information that they call weight and strength of the evidence (personally, I find this terminology confusing, as the statistical meaning of the terms is not very clear from the names. But it is a standard way of speaking in the area). Basically, the strength of the evidence would be the proportion that was observed and the weight, the size of the sample. That is, if you toss a biased coin 20 times and obtain 16 heads, the strength of the observation is the fact that you observed heads 80% of the time, while the weight of the evidence is the fact that this was observed over 20 tosses. Both pieces of information must be used in any attempt to predict whether the coin is actually biased towards heads, as well as how likely it is that we would get heads if we toss the coin once more. However, what Griffin and Tversky observed was that, while basically accounting correctly for the observed proportion (strength), people did not take into account the weight of the data (sample size) correctly.
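A quick calculation shows why the weight matters. The sketch below compares the 16 heads in 20 tosses from the text with a smaller sample of 4 heads in 5 tosses, which I am adding only for comparison (it is not from Griffin and Tversky). Both have the same strength, 80% heads, but a fair coin would produce the smaller sample far more easily.

```python
from math import comb


def tail_probability_fair_coin(heads: int, tosses: int) -> float:
    """Probability of getting at least `heads` heads in `tosses` fair tosses."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses


# Same strength (80% heads), different weight (sample size).
small_sample = tail_probability_fair_coin(4, 5)    # 6 / 32 = 0.1875
large_sample = tail_probability_fair_coin(16, 20)  # 6196 / 1048576, about 0.0059

print(f"4 or more heads in 5 tosses:   {small_sample:.4f}")
print(f"16 or more heads in 20 tosses: {large_sample:.4f}")
```

With only 5 tosses, a fair coin gives a result this extreme almost one time in five, so the evidence for bias is weak; with 20 tosses it happens well under 1% of the time. Someone who looks only at the 80% would treat the two cases as equally convincing.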
Interestingly enough, they comment, among other things, on the problem of the "illusion of validity", a term coined by Kahneman and Tversky in 1973 (also in Chapter 4 of Judgment under Uncertainty: Heuristics and Biases). This effect can be described as the fact that different questions produce different measurements of calibration. More exactly, what was observed was that people have a tendency to be more overconfident about individual cases than about their overall accuracy. For example, Cooper et al., while interviewing almost 3,000 entrepreneurs, observed that they were greatly overconfident about the chance of success of their own business. On the other hand, when asked about the chance of success of a generic enterprise in their area, that overconfidence was much smaller and they proved to be just moderately overconfident. This is something that would actually be expected, due to an observation bias effect. Even if there were no average overconfidence among people about the success of businesses, some amount of random error would be unavoidable. That is, any entrepreneur who was well calibrated, on average, could show some overconfidence in some business areas and underconfidence in others. Of course, entrepreneurs who evaluated an area as more likely to succeed would be expected to invest more in that area. And this overconfidence was not associated with those who were better prepared or actually had a better chance to succeed than their competition. What they observed was that the poorly prepared entrepreneurs showed the same optimism as the better prepared ones (perhaps another example of the curse of the incompetent).
But not only do calibration problems depend on what people are trying to answer, they are also not observed in every situation. Actually, Lichtenstein and Fischhoff observed that people can be trained. In an experiment where people had to decide whether a phrase had been handwritten by an American or a European, they observed that, simply by providing basic initial training, their subjects not only got more questions right, but also showed better calibration in their evaluations. In his book The Psychology of Judgment and Decision Making (McGraw-Hill Series in Social Psychology), Plous reviews and compares the results of studies of calibration in two different areas, one on predicting meteorological events, by Murphy and Winkler, and the other on physicians estimating the probability that a given patient has pneumonia, by Christensen-Szalanski and Bushyhead. And, contrary to popular culture assessments, the meteorologists proved to be quite well calibrated, while the physicians showed an absurd amount of overconfidence. An important part of what seems to be happening is that meteorologists get much more feedback about the accuracy of their predictions than physicians do. As a matter of fact, Lichtenstein and Fischhoff, in another study, observed that, after some training where they provided feedback on how accurate people were in their answers, almost all their subjects improved their calibration. The exception was, actually, the few individuals who were already well calibrated before the training. This seems to make clear the incredible importance of getting feedback on how precise one's predictions were.
Thursday, March 6, 2014
TED Talks on Irrationality
I just found a very interesting series of videos from the TED Talks people. It is a playlist entitled "Our brains: predictably irrational".
I haven't watched any of those yet, but they certainly are on my to-do list. I hope we all enjoy them.
Friday, February 28, 2014
Human Stupidity: Historical: Overconfidence
Despite all the ever mounting evidence on what is really happening, we are still very confident in our intellectual abilities. And some of the confidence seems justified since, as a species, we have been able to send robots to Mars, among many other astonishing achievements. In some sense, our confidence in our abilities should be correct; at least, that seems to make sense. And yet, our common sense, as we have seen, is not something we can really rely on. A question that arises naturally from these facts is how sure we can really be of something when we feel confident about it. And, again, experimental results show we are once more in trouble, most of the time, when we compare our confidence with the accuracy of our judgments.
In 1965, Oskamp performed a series of experiments, trying to measure if confidence and accuracy in the evaluations were connected as they should be. We would like to believe that, when we are more sure about something, the chance of being right should improve. Oskamp tested a group that included clinical psychologists with several years of experience, Psychology graduate students and advanced undergraduate students. The task they had to perform was to evaluate the personality of Joseph Kidd (The judges had access only to written data about him, and more data was provided at each stage of the experiment) as well as to predict his attitudes and typical actions. At the first stage, just a general demographic description of Kidd was provided and, at each later stage, the judges received a page or two about a period of the patient's life (childhood, high school and college years, and military service and later). And, after each stage, the judges had to provide their best answer to the same series of 25 questions, as well as to evaluate how sure they were that they had chosen the right answer. Each question was presented as a multiple choice problem with five alternatives to choose from.
What Oskamp observed was that the task was actually a hard one, given the amount of data the judges received, with none of the judges ever getting to 50% correct answers. More than that, the final average level of accuracy was actually 28%, not much different from random chance (20%) (statistically, the difference was not significant). This could be just attributed to the lack of data and difficulty of the questions, of course. What was really disturbing was that, while the accuracy seemed just to oscillate from Stage 1 to Stage 4 (26%, 23%, 28.4%, and 27.8%), the confidence of the judges showed a clear steady increase (33.2%, 39.2%, 46%, and 52.8%). That is, the extra data didn't help the judges get right answers, but it did make them more confident in their quite often wrong evaluations!
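The numbers above can be put side by side to see how the gap between confidence and accuracy grows at each stage. This is just arithmetic on the percentages quoted in the text; the variable names are mine.

```python
# Percentages per stage (Stage 1 to Stage 4), as quoted above.
accuracy = [26.0, 23.0, 28.4, 27.8]
confidence = [33.2, 39.2, 46.0, 52.8]

# Overconfidence gap in percentage points: confidence minus accuracy.
gaps = [round(c - a, 1) for a, c in zip(accuracy, confidence)]

for stage, gap in enumerate(gaps, start=1):
    print(f"Stage {stage}: overconfidence of {gap} percentage points")

# Accuracy moved by under 2 points overall, confidence by almost 20.
accuracy_change = round(accuracy[-1] - accuracy[0], 1)
confidence_change = round(confidence[-1] - confidence[0], 1)
print(accuracy_change, confidence_change)
```

The gap goes from about 7 points at the first stage to 25 points at the last one, almost entirely because confidence rose while accuracy stayed where it was.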
While the accuracy percentages observed by Oskamp oscillate and do not show a clear increase with the extra information, it is at least possible that there might have been a very small improvement. But more recent studies have shown that not even that is always true. By asking people to predict the results of basketball games, Hall et al. tested the accuracy of those predictions by dividing the participants in the study into two groups. Both groups received the same statistical information about the teams playing (win record, halftime score). The second group was also told the names of the teams playing, information that was withheld from the people in the first group. What they observed was that the second group, with the extra information, consistently made worse predictions, typically by choosing better known teams and disregarding the statistical evidence. That result was repeated even when there were monetary bets on which team would win. And yet, people judged that knowing the names actually helped them make those predictions. Clearly, the new information increased the confidence, while decreasing the accuracy of the people involved.
That extra information can lead to overconfidence was confirmed in other experiments, such as the ones made by Tsai et al. They also asked participants to predict the outcome of games, this time American football games. And they presented performance statistics of the teams (not identified by names), one at a time. What they observed was that accuracy did get better with the first pieces of information, basically for the first 6 cues that were provided. At the same time, confidence also increased. However, as more cues were provided, up to 30 values, accuracy did not improve, but confidence did. The authors observed that, if all the information had actually been used in an optimal way, the accuracy could have improved together with the confidence, basically in a way that was equivalent to the observed increase in confidence. But the extra information, apparently, was not used to make a better prediction. One possible explanation from the authors was that people might not correct their confidence estimates to account for their limited capacity of analysis, becoming more and more certain despite the fact they were no longer improving. Overconfidence was also observed in other areas, such as how sure teachers are about the evaluation they make of their students' potential, or consumer knowledge (Alba and Hutchinson).
Overconfidence, however, is not something that is observed every time. Lichtenstein and Fischhoff observed that, as accuracy gets larger and larger, that is, when people actually know the subject and, therefore, get the answer right more often, the overconfidence starts to diminish. And, as a matter of fact, as people start getting more than 80% of the questions right, overconfidence is often substituted by underconfidence.
This does not mean that when people report their confidence to be higher than 80%, they are likely to be underconfident, however. There is here one subtle, but extremely important difference in the conditionals. What that result says is the opposite condition, that, while observing situations where people get more than 80% of the answers correctly, there is a tendency for underconfidence. But, in many situations, we might have an expert providing us with her confidence and we would like to have an estimate of its accuracy. Fischhoff et al investigated what happens in the case where people report high certainty about their evaluations. What they observed was that when people stated they were 99% sure of their answers, they actually answered correctly between 73% and 87%, depending on the experiment. Even when people were so certain that they considered there was just one chance in a million that they would be wrong (0.0001%), they were actually wrong from 4% to 10% of the times.
Why we are so bad at estimating how much we know is not clear. Dunning et all in an article aptly named "Why People Fail to Recognize Their Own Incompetence" proposed that, for the case of incompetence, that is, low accuracy, there might be a double curse: incompetent people might be both incompetent enough to know the answer and to know that they don't know. But, even if this is the case (their idea does bring the name of a few people to mind), it does not really explain the whole range of observations.
Note: In my last entry, I said it might take me just a few more days to post something new. Maybe I was overconfident at predicting my capacity to get it done. And also, I didn't include the possibility of a rearrangement on what I had planned. The text I was working, I decided that it will fit better as a later entry, in other part of this writings. At least, when I get there, I already have 80% of a post ready.
In 1965, Oskamp performed a series of experiments to measure whether confidence and accuracy in evaluations were connected as they should be. We would like to believe that, when we are more sure about something, the chance of being right is also higher. Oskamp tested a group that included clinical psychologists with several years of experience, psychology graduate students, and advanced undergraduate students. The task was to evaluate the personality of Joseph Kidd, as well as to predict his attitudes and typical actions. The judges had access only to written data about him, and more data was provided at each stage of the experiment. At the first stage, just a general demographic description of Kidd was provided and, at each following stage, the judges received a page or two about a period of the patient's life (childhood, high school and college years, and military service and later). After each stage, the judges had to provide their best answers to the same series of 25 questions, as well as to evaluate how sure they were that they had chosen the right answer. Each question was presented as a multiple choice problem with five alternatives to choose from.
What Oskamp observed was that the task was actually a hard one, given the amount of data the judges received, with none of the judges ever reaching 50% correct answers. More than that, the final average level of accuracy was actually 28%, not much different from random chance (20%); statistically, the difference was not significant. This could be attributed just to the lack of data and the difficulty of the questions, of course. What was really disturbing was that, while the accuracy seemed just to oscillate from Stage 1 to Stage 4 (26%, 23%, 28.4%, and 27.8%), the confidence of the judges showed a clear, steady increase (33.2%, 39.2%, 46%, and 52.8%). That is, the extra data didn't help the judges get more answers right, but it did make them more confident in their quite often wrong evaluations!
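To make the widening gap concrete, here is a minimal sketch that subtracts the stage-by-stage accuracy from the stage-by-stage confidence, using only the percentages quoted above. The variable names are my own, chosen for illustration.

```python
# Stage-by-stage averages quoted in the text (in percent).
accuracy = [26.0, 23.0, 28.4, 27.8]
confidence = [33.2, 39.2, 46.0, 52.8]

# Overconfidence gap per stage: mean confidence minus mean accuracy.
gaps = [round(c - a, 1) for a, c in zip(accuracy, confidence)]

for stage, gap in enumerate(gaps, start=1):
    print(f"Stage {stage}: confidence exceeds accuracy by {gap} points")
```

The gap grows at every stage, even though accuracy itself barely moves.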
Although the accuracy percentages observed by Oskamp oscillated and do not show a clear increase with the extra information, it is at least possible that there was a very small improvement. But more recent studies have shown that not even that is always true. By asking people to predict the results of basketball games, Hall et al. tested the accuracy of those predictions by dividing the participants in the study into two groups. Both groups received the same statistical information about the teams playing (win record, halftime score). The second group was also told the names of the teams playing, information that was withheld from the people in the first group. What they observed was that the second group, with the extra information, consistently made worse predictions, typically by choosing better known teams and disregarding the statistical evidence. That result was repeated even when there were monetary bets on which team would win. And yet, people judged that knowing the names actually helped them make those predictions. Clearly, the new information increased the confidence, while decreasing the accuracy, of the people involved.
That extra information can lead to overconfidence was confirmed in other experiments, such as the ones made by Tsai et al. They also asked participants to predict the outcome of games, this time American football games. They presented performance statistics of the teams (not identified by names), one at a time. What they observed was that accuracy did get better with the first pieces of information, basically for the first 6 cues that were provided. At the same time, confidence also increased. However, as more cues were provided, up to 30 values, accuracy did not improve, but confidence did. The authors observed that, if all the information had actually been used in an optimal way, the accuracy could have improved together with the confidence, basically in a way that was equivalent to the observed increase in confidence. But the extra information, apparently, was not used to make a better prediction. One possible explanation from the authors was that people might not correct their confidence estimates to account for their limited capacity of analysis, becoming more and more certain despite the fact that they were no longer improving. Overconfidence was also observed in other areas, such as how sure teachers are about the evaluation they make of their students' potential, or consumer knowledge \cite{albahutchinson00a}.
Overconfidence, however, is not something that is observed every time. Lichtenstein and Fischhoff observed that, as accuracy gets larger and larger, that is, when people actually know the subject and, therefore, get the answer right more often, the overconfidence starts to diminish. And, as a matter of fact, as people start getting more than 80% of the questions right, overconfidence is often replaced by underconfidence.
This does not mean, however, that when people report their confidence to be higher than 80%, they are likely to be underconfident. There is one subtle, but extremely important, difference in the conditionals here. What that result says is the opposite conditional: in situations where people get more than 80% of the answers correct, there is a tendency for underconfidence. But, in many situations, we might have an expert providing us with her confidence and we would like to have an estimate of her accuracy. Fischhoff et al. investigated what happens in the case where people report high certainty about their evaluations. What they observed was that when people stated they were 99% sure of their answers, they actually answered correctly between 73% and 87% of the time, depending on the experiment. Even when people were so certain that they considered there was just one chance in a million that they would be wrong (0.0001%), they were actually wrong from 4% to 10% of the time.
Why we are so bad at estimating how much we know is not clear. Dunning et al., in an article aptly named "Why People Fail to Recognize Their Own Incompetence", proposed that, for the case of incompetence, that is, low accuracy, there might be a double curse: incompetent people might be both too incompetent to know the answer and too incompetent to know that they don't know. But, even if this is the case (their idea does bring the name of a few people to mind), it does not really explain the whole range of observations.
Note: In my last entry, I said it might take me just a few more days to post something new. Maybe I was overconfident in predicting my capacity to get it done. Also, I didn't include the possibility of a rearrangement of what I had planned. I decided that the text I was working on will fit better as a later entry, in another part of these writings. At least, when I get there, I will already have 80% of a post ready.
Monday, February 10, 2014
I am still working
Hi,
Just to let you people know that I am still working on the next entry. Between being forced to deal with some human stupidity of the bad kind in real life and the fact that the next entry is requiring far more article-hunting work than I expected, I am late. Still, it should be out in a matter of days.
Wednesday, January 22, 2014
Human Stupidity: Historical: Opinions II
When making decisions in the real world, the situation can easily become far more biased than our natural tendencies shown in those artificial studies. Not only do we tend to keep our initial opinions much longer than we should, we also directly decide the sources of information we will use. And that almost always means looking for the opinions of those we already agree with, while disregarding people who oppose our own views. Of course, this will simply make us more sure about what we thought, even when that should not be the case. While doing that, we just learn the reasons why our opinion might be right, but we rarely come to know the reasons why it might actually be wrong. Test yourself: Can you make a convincing argument for some political or religious idea you oppose? You don't have to believe the argument is enough to change your mind, but it should be considered a solid argument ("It is the mark of an educated mind to be able to entertain a thought without accepting it", attributed to Aristotle).
Of course, anyone would like to think that their beliefs are reasonable, rational, and well justified. After all, if they weren't, we wouldn't have them, right? But evidence, unfortunately, is not on our side. In a very interesting example, Jervis observed an effect he called irrational consistency (Baron uses the term belief overkill). This consists of the fact that when people hold a specific belief, for example in a policy, they usually have many independent ideas they believe in, and all of them support the said policy. And those who oppose the policy tend to defend the opposite set of ideas. However, if those ideas are independent, any rational being could defend some and oppose others, while a consideration of the total effect would lead to the final point of view on the policy. That people are too consistent is a clear sign that reason is not playing the role it should in this problem.
Jervis mentions as an example the case of people who supported or opposed a ban on nuclear tests. Among the issues behind a decision to support or oppose the ban, he presents three: whether the tests would cause serious medical danger; whether the tests would lead to major weapon improvements; and whether they would be a source of international tension. It is important to notice that it is completely reasonable to believe that the tests would not cause serious medical danger but would cause international tension. These evaluations are independent, and any of the four possible combinations of those two beliefs makes just as much sense as the other three. That means that, if people were reasoning in a competent and independent way, no correlation between those beliefs should be observed. And yet those who were in favor of the ban held all the beliefs that the tests would cause health problems, would lead to more dangerous weapons, and would increase international tension. And, as should be obvious by now, those who opposed the ban disagreed on all those subjects with those who were in favor. Apparently, people felt somehow led to hold a consistent set of beliefs, even when there was no reason at all for that consistency.
As a matter of fact, when our beliefs seem to conflict with each other, a phenomenon called cognitive dissonance, we have a tendency to change some of those beliefs to avoid the conflict. This was observed in a series of experiments conducted by Festinger. The typical experiment involved performing some task and being paid either a very small amount for it ($1.00) or a more reasonable amount ($20.00, in 1962). When the subjects were asked about their feelings about the task, those who had been paid very little evaluated it better than those who had received more. The explanation proposed by Festinger is that people wouldn't normally perform that task for just one dollar. But they had done it, which created a cognitive dissonance that the subjects resolved by evaluating the task as more entertaining. After all, doing an entertaining task for basically no money makes more sense than doing a boring one.
Friday, January 17, 2014
Human Stupidity: Historical: Opinions I
Before moving on to other issues, there are still a couple of other examples of our probabilistic thinking that I'd like to discuss. As we have seen in the AIDS problem, the correct way to decide whether the patient was sick or not was first to consider the initial probability for the disease and then change it to a new, posterior value as we learn the result of the exam. This method is the basis of the Bayesian view in Statistics, and there is actual evidence that we reason in a way that resembles Bayesian methods even as early as 12 months of age.
But we do not do that in a perfect way, of course. In the AIDS problem, we saw that people generally simply disregard the prior information contained in the initial chance of 1 in 1,000. We just use the information about the exam, as if we had no good initial guess about the chance of the patient being sick. And, as a matter of fact, if we had started with equal chances, 50% instead of 0.1%, the final chance for the patient to be sick would, indeed, be 98%. But we knew better and, by ignoring the prior information, we could cause a lot of unnecessary damage.
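A minimal sketch of that comparison, assuming the 98% figures from the AIDS problem apply both to sick and to healthy people. The function name and its parameters are mine, used only for illustration.

```python
def posterior_sick(prior, sensitivity=0.98, specificity=0.98):
    """Probability of being sick after one positive test, by Bayes' theorem."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * (1 - specificity)
    return true_positive / (true_positive + false_positive)

# Pretending we know nothing beforehand (50% prior) versus using the base rate.
even_prior = posterior_sick(0.5)
real_prior = posterior_sick(0.001)

print(f"Prior 50%:  posterior = {even_prior:.1%}")
print(f"Prior 0.1%: posterior = {real_prior:.1%}")
```

Only the prior changes between the two calls, and that alone separates an almost certain diagnosis from a quite unlikely one.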
This effect is known as base rate neglect. In 1973, Kahneman and Tversky presented several people with a problem where they would have to guess, based on a text description, whether the described person was a lawyer or an engineer. It was clearly stated that the described person was part of a group with 30 engineers and 70 lawyers. This should mean that it was more likely the person would be a lawyer than an engineer. However, this piece of information was completely disregarded and only the text was used to make that inference. When the text was non-informative, with no clues pointing to engineer or lawyer, people would state there was a 50-50 chance, instead of the correct 30-70.
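The non-informative case can be sketched in odds form. Here I assume that a description with no clues is equally likely for an engineer and for a lawyer, that is, a likelihood ratio of 1; the function name is hypothetical.

```python
def posterior_engineer(prior_engineer, likelihood_ratio):
    """Update the chance of 'engineer' given a description.

    likelihood_ratio = P(description | engineer) / P(description | lawyer).
    """
    prior_odds = prior_engineer / (1 - prior_engineer)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# 30 engineers and 70 lawyers, description carries no information.
uninformative = posterior_engineer(0.30, 1.0)
print(f"Chance of engineer: {uninformative:.0%}")
```

With a likelihood ratio of 1, the answer is just the base rate of 30%; answering 50% amounts to throwing the group composition away.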
At this point, it should be expected that this is not the only mistake we make in how we change our opinions. As a matter of fact, base rate neglect is not exactly an effect on how we change opinions. It actually happens in subjects that we have no initial opinions about. In those cases, even if there is evidence to be used as an initial opinion, that initial evidence is disregarded and only the new information is used. In many cases, however, people have an opinion before new information is provided. In this case, they should use that opinion as the prior and update it following Bayes' theorem.
While this is a qualitatively correct description of what we seem to do, it is not exact when we try to see it in numeric terms. Phillips and Edwards observed that, while people do change their opinions in the correct direction, making a statement more or less probable when given data, the amount of the change is smaller than would be expected from a simple use of Bayes' theorem. They named this bias conservatism, as people tend to conserve their initial opinions more than they should. And it is worth mentioning that they observed this in a completely neutral situation, where people were asked to evaluate from which bag a set of red and blue chips had come. They informed the subjects that there were two possible bags, one with 700 red chips and 300 blue ones, while the other bag had 300 red chips and 700 blue ones. If, after taking 12 chips from one of the bags, it was observed that 8 were red and 4 were blue, the question is how likely it is that those chips came from the bag with a majority of red chips. You can ask yourself, as a reader, what probability value you would state. Phillips and Edwards observed that people tended to answer with a value around 70%. However, if you assume both bags were just as likely initially, the correct value is a much larger change from the initial 50%: the correct final probability is actually approximately 97%.
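The 97% can be checked with a short calculation. This is a sketch under one simplifying assumption of mine: each draw is treated as having a fixed 70% or 30% chance of red, which is reasonable for 12 chips taken from a bag of 1,000.

```python
from math import comb

def posterior_red_bag(red, blue, p_red=0.7, prior=0.5):
    """Chance the chips came from the mostly-red bag, by Bayes' theorem."""
    n = red + blue
    # Binomial likelihood of the observed draw under each bag.
    like_red_bag = comb(n, red) * p_red**red * (1 - p_red)**blue
    like_blue_bag = comb(n, red) * (1 - p_red)**red * p_red**blue
    evidence = prior * like_red_bag + (1 - prior) * like_blue_bag
    return prior * like_red_bag / evidence

answer = posterior_red_bag(red=8, blue=4)
print(f"Chance of the mostly-red bag: {answer:.1%}")
```

Only the difference between red and blue counts matters here: four extra red chips multiply the odds by (7/3) four times, about 30 to 1.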
While this tendency to change opinions too little might look at first like a simple effect of analysing an unknown problem, that is not the case. Even in the world of corporate finance, evidence was observed that investors tend to under-react to information. Baron has an interesting section on this problem, which he calls the persistence of irrational beliefs, where he cites some of the literature in the area. This includes studies that show that the order in which the data is presented affects the final opinion of individuals, even when that order was irrelevant and contained no new information.
One interesting study on this primacy effect, where first observations carry more weight than later ones, even when they should not, was conducted by Peterson and DuCharme. As in the Phillips and Edwards study, the question was to find out from which urn a set of poker chips was more likely to have been drawn. Urn C had 3 red, 2 blue, 2 yellow, 1 green and 2 white chips, while urn D contained 2 red, 3 blue, 1 yellow, 2 green and 2 white chips. One urn was shown to the subjects and chips were taken from that urn and returned to it, one at a time. After each draw, the subjects were asked to evaluate the probability that the urn they were drawing from was urn C. However, the draws were not random but arranged so that the first 30 draws favored the idea that it was urn C, while the 30 following draws favored D, in an exact mirror of the first 30. That is, the total evidence in favor of each urn canceled after the 60 draws and the final opinion should be equal chances for each urn. But, since the individuals started believing C was more probable, they showed a very clear tendency to keep that initial evaluation. It typically took a series of 50 draws favoring D to counter the initial 30 draws supporting C.
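The cancellation after 60 draws can be verified directly. In the sketch below, the urn contents come from the description above, but the particular sequence of 30 draws is hypothetical (the actual order used in the study is not given here), and "mirror" is taken to mean swapping each color for the one that favors the other urn equally strongly.

```python
from math import exp, log

URN_C = {"red": 3, "blue": 2, "yellow": 2, "green": 1, "white": 2}
URN_D = {"red": 2, "blue": 3, "yellow": 1, "green": 2, "white": 2}
# Each color paired with the color carrying the opposite evidence.
MIRROR = {"red": "blue", "blue": "red", "yellow": "green",
          "green": "yellow", "white": "white"}

def prob_urn_c(draws, prior_c=0.5):
    """Posterior probability of urn C after a sequence of draws with replacement."""
    log_odds = log(prior_c / (1 - prior_c))
    for color in draws:
        # Both urns hold 10 chips, so the ratio of counts is the likelihood ratio.
        log_odds += log(URN_C[color] / URN_D[color])
    return 1 / (1 + exp(-log_odds))

# Hypothetical first 30 draws favoring C, followed by their exact mirror.
first_half = ["red"] * 12 + ["yellow"] * 6 + ["white"] * 6 + ["blue"] * 4 + ["green"] * 2
second_half = [MIRROR[color] for color in first_half]

after_30 = prob_urn_c(first_half)
after_60 = prob_urn_c(first_half + second_half)
print(f"After 30 draws: {after_30:.3f}; after 60 draws: {after_60:.3f}")
```

An ideal Bayesian ends exactly where it started, at 50%, regardless of the order in which the evidence arrived.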
Saturday, January 11, 2014
Human Stupidity: Historical: Probability Thinking IV
One particularly troublesome example of how disastrous probabilistic mistakes can be, and one I use every time with my students, is the classical example of testing for the existence of a rare but serious disease. Most of the time, texts refer to the disease as AIDS, but which one is not relevant. Let's just assume that there is a treatment for the disease that has serious side effects and that any physician would prefer not to administer it unless really necessary. Luckily, most of the population has not contracted the disease; we actually know only one person in 1,000 has it. We also have a test for it that is reasonably reliable. Whenever someone is sick, it provides a positive result for detecting the virus 98% of the time. Whenever someone does not have the virus, it gives a negative result 98% of the time as well (in real tests these two rates are not necessarily equal, and often are not). In other words, in both cases, it gives an erroneous result only 2% of the time. Assume the test is applied to a person you have no other information about and the result comes back positive. This suggests that person might have the disease, but it is also possible that the test has failed. Given all the information above, if you were evaluating this patient, how likely would you say it is that this person is actually sick?
The importance of getting the result right cannot be overstated here. If it is very likely that the person is sick, treatment should start immediately. If it is very unlikely, it might make sense to prepare additional tests but, since the treatment has serious side effects, it should not be applied. And, if we are not sure at all, for more central probabilities, close to 50%, an assessment of the risk involved in each decision must be made with the proper caution. But that all depends on getting the right evaluation. Ask yourself, just by reading the problem, how likely you think it is that the patient is sick. If you are like the majority of humankind, you will reason that, since the test is correct 98% of the time, the probability that the patient is sick should be around 98% as well. So, you would start the treatment immediately.
But the truth is not so simple. That reasoning simply ignores one extremely important piece of information that you had: the initial chance the patient was sick. I did tell you that it was, before the test, 1 in 1,000. Reason a little. If it were the opposite, with 99.9% of the population sick, a positive result should mean extra evidence in favor of the disease and the chance should be even larger than 99.9%, not as small as 98%. By the same reasoning, if you knew for sure, at first, that this patient was healthy, you would continue to know it and simply conclude this was one of the 2% of cases where the test goes wrong. So, that 1 in 1,000 carries important information and you ignored it. And, as a matter of fact, the chance that this patient is sick is not 98%; it is actually smaller than 5%. If the side effects of the treatment are severe, a person who had a 95% chance of being healthy would have to suffer them, without enough evidence to support the need for it.
What is going on? It is actually not so hard to understand when we look at the whole picture. What we know is that the test gave a positive result. Two things might have happened to cause that result. The patient could actually be sick, and that would have happened initially with one chance in 1,000. Or the test could have failed, providing a positive answer for a healthy person. This case had a chance of about 20 in 1,000. Clearly, the failure of the test is much more likely than the hypothesis that the patient is sick, about 20 times more likely, actually. The chance of the patient being sick actually increases from the initial 1 in 1,000 to a posterior probability a little less than 50 in 1,000. That is a huge increase, and this large change is due to the fact that the test is reasonably reliable. But, by ignoring one very important piece of information in the problem, completely wrong decisions about our health can happen (and almost certainly do happen) every day. (The correct way to calculate the final probability is to use Bayes' theorem. Its use will be explained later here.)
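The same reasoning can be followed by simply counting people. The sketch below assumes a hypothetical population of 100,000 (the size is my choice, picked so that all counts come out as whole numbers) and applies the 1 in 1,000 base rate and the 98% figures from the problem.

```python
population = 100_000

sick = population // 1000               # 1 in 1,000 has the disease
healthy = population - sick

true_positives = sick * 98 // 100       # sick people correctly flagged
false_positives = healthy * 2 // 100    # healthy people wrongly flagged

all_positives = true_positives + false_positives
share_sick = true_positives / all_positives

print(f"{true_positives} sick and {false_positives} healthy people test positive")
print(f"Chance that a positive result means disease: {share_sick:.1%}")
```

Among everyone who tests positive, healthy people outnumber sick people by roughly 20 to 1, which is why the final chance stays below 5%.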
As a matter of fact, the simple misuse of basic probabilistic and statistical concepts in health problems is so widespread and so serious that efforts already exist to better educate physicians and patients. Gerd Gigerenzer has been championing this urgent need for better education, with good reason. He has also collected a number of stories on how that misuse can cause bad health decisions and bad policies, leading governments to spend money on non-existent problems. One of those stories that I particularly like I heard from him at a conference. The problem started when Rudy Giuliani said in a campaign advertisement in 2007 that: "I had prostate cancer, 5, 6 years ago. My chance of surviving prostate cancer -- and thank God I was cured of it -- in the United States? 82%. My chance of surviving prostate cancer in England? Only 44% under socialized medicine" (the original story was in the Washington Post).
This certainly looks as if there were a very serious difference in the quality of the treatments in the USA and England. But the actual mortality rates in both countries are basically the same. And yet, Giuliani was not lying, just using numbers without all the competence required to analyse them. The figures he cited were 5-year survival rates. What was actually happening was that, in the USA, men are under a lot of pressure to screen for prostate cancer and many actually do participate in the prostate-specific antigen (PSA) screening. That does not happen in England. Even forgetting the important and unclear question of whether screening actually saves lives or not, there is one other extremely important effect here. The percentages do not refer to groups that were diagnosed in the same way and, as such, they just are NOT comparable. Gigerenzer has the perfect example to explain why.
Imagine two men, one American and one English, who will die of prostate cancer at age 70, regardless of treatment. In both cases, the cancer could be detected by screening as early as age 60. But only the American does the screening, and he discovers the disease at age 60. Even if the treatment fails and he dies at 70, he still counts as a 5-year survivor, since he does get to 65 without dying. The Englishman, on the other hand, only finds out he is sick when the disease is advanced and there are clear symptoms, at age 67. He also dies at the same age, but he does not pass the 5-year survival period. This is not because of any difference in the health systems: both men contracted the disease at the same age and died at the same age as well. The only difference is that one knew it much earlier. And, since American men are screened, they do know about it earlier, and the apparently much lower survival rate in England is nothing more than a trick of measuring things differently.
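The two-men example can be written down as a tiny check. The function below is a hypothetical helper of mine; the ages are the ones from the story.

```python
def survives_five_years(age_at_diagnosis, age_at_death):
    """True if the patient lives at least 5 years after the diagnosis."""
    return age_at_death - age_at_diagnosis >= 5

# Same disease, same age at death; only the moment of diagnosis differs.
american = survives_five_years(age_at_diagnosis=60, age_at_death=70)
englishman = survives_five_years(age_at_diagnosis=67, age_at_death=70)

print(f"American counted as 5-year survivor: {american}")
print(f"Englishman counted as 5-year survivor: {englishman}")
```

Nothing about the outcome changes between the two calls; moving the diagnosis earlier is enough to flip the statistic.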