#### Department

Statistics Department

#### First Advisor

Dr. Steve Bieber

#### Description

Discriminant analysis was developed by R. A. Fisher in 1936, when he classified three species of iris plants based on their physical features. To build a classification table, the posterior probability of each outcome is calculated for every case. These probabilities lie between zero and one, and they are then rounded to the nearest whole number, so each case is counted as belonging entirely to a single group. A confidence table skips this rounding and keeps the decimals, so it reflects how likely each outcome actually is. We generated random data to test whether the confidence table is more accurate than the classification table. Using two, three, four, and five groups, we found a discrepancy of about ten to fifteen percent between the tables: rounding the probabilities gave the illusion of more accurate results. To further test our hypothesis, we used real data from a political survey in which over a thousand people were asked about thirty issues. The average bias between the tables was about eleven percent, with the confidence table being the more accurate of the two. The confidence table still showed some bias, but this was likely the result of computer algorithms finding patterns that do not exist. Overall, the confidence table was much more reliable than the classification table for predicting outcomes.
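The difference between the two tables can be illustrated with a minimal sketch. It assumes that the classification table adds one count per case to the column of its most probable group (for two groups this is the same as rounding at 0.5), and that the confidence table instead adds each case's unrounded posterior probabilities to its row. The abstract does not spell out the exact construction, so both functions and the posterior values below are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

Table = List[List[float]]


def classification_table(posteriors: Sequence[Sequence[float]],
                         actual: Sequence[int], k: int) -> Table:
    """Rows = actual group, columns = predicted group.

    Each case contributes a full count of 1 to its most probable group,
    i.e. the posterior probabilities are effectively rounded to 0 or 1.
    """
    table = [[0.0] * k for _ in range(k)]
    for probs, group in zip(posteriors, actual):
        predicted = max(range(k), key=lambda j: probs[j])
        table[group][predicted] += 1.0
    return table


def confidence_table(posteriors: Sequence[Sequence[float]],
                     actual: Sequence[int], k: int) -> Table:
    """Same layout, but each case spreads its unrounded posterior
    probabilities across the columns (assumed construction)."""
    table = [[0.0] * k for _ in range(k)]
    for probs, group in zip(posteriors, actual):
        for j in range(k):
            table[group][j] += probs[j]
    return table


def percent_correct(table: Table) -> float:
    """Share of the table's total mass that lies on the diagonal."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return 100.0 * diagonal / total


# Hypothetical posterior probabilities for four cases and two groups.
POSTERIORS = [[0.9, 0.1], [0.6, 0.4], [0.45, 0.55], [0.2, 0.8]]
ACTUAL = [0, 0, 1, 1]

if __name__ == "__main__":
    ct = classification_table(POSTERIORS, ACTUAL, 2)
    cf = confidence_table(POSTERIORS, ACTUAL, 2)
    print(round(percent_correct(ct), 2))  # 100.0
    print(round(percent_correct(cf), 2))  # 71.25
```

In this toy example every case lands in its correct group after rounding, so the classification table reports 100% accuracy. The confidence table, however, keeps the uncertainty of cases such as the one at 0.6 versus 0.4 and reports about 71%. This is the kind of gap the abstract describes as the illusion of accuracy created by rounding.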

#### Included in

The Confidence Table as an Improvement in Classification

## Comments

Oral Presentation, Wyoming NSF EPSCoR: WySTEP