What If Your Organization Gets Sued for Employment Testing? A Case Study of an EEO Employment Test Validation Suit: Smith vs. City of Boston - Part 1

Each year, the Equal Employment Opportunity Commission (EEOC), the U.S. Department of Justice, the Office of Federal Contract Compliance Programs (OFCCP), or plaintiffs represented by private counsel litigate cases under Title VII of the Civil Rights Act of 1964. Many of these cases involve practices, procedures, or tests (PPTs) used for hiring and promotion. Organizations stand to lose a great deal: financially, legally, and reputationally. The litigation could require paying back pay and interest to the group discriminated against, or could place the employer's hiring and promotional practices, procedures, or tests under close monitoring by the court. These costs and intrusions do not include the time and money the organization spent developing the practice, procedure, or test itself or hiring an outside firm to do so. In addition, if applicants devoted significant time to preparing for a test whose results are thrown out because the test was not legally defensible, both the applicants' and the organization's time has been wasted. This article explains a recent disparate impact and test validation suit, the litigation process, and things an organization should consider in developing practices, procedures, or tests for hiring and promotional purposes.

The Recent Disparate Impact and Test Validity Suit

On November 16, 2015, a court issued its ruling in a case brought by 10 African American police sergeants employed by the Boston Police Department (BPD), who sued the City of Boston under Title VII of the Civil Rights Act of 1964 (Smith v. City of Boston, 2015).¹ The plaintiffs alleged that the multiple-choice exam BPD administered in 2008 to select sergeants for promotion to lieutenant had a disparate impact on minority candidates and was not sufficiently job-related to survive a Title VII challenge. The City argued that the exam had no disparate impact on minority candidates and that, even if it did, the exam was sufficiently valid to survive a Title VII challenge. The exam had two components: a written, closed-book test of 100 multiple-choice questions and an Education and Experience (E&E) rating.

Nature of This Disparate Impact Case

The goal of Title VII is “that the workplace be an environment free of discrimination, where race is not a barrier to opportunity” (Ricci v. DeStefano, 2009).² The statute is designed to “promote hiring on the basis of job qualifications, rather than on the basis of race or color” (Griggs v. Duke Power, 1971).³ Title VII recognizes two types of employment discrimination: disparate treatment and disparate impact. A disparate treatment claim accuses an employer of intentionally basing employment decisions on a protected characteristic such as race. A disparate impact claim challenges a practice that is neutral on its face and not intended to discriminate, but that nonetheless produces less favorable outcomes for members of a protected class.

The case examined here is a disparate impact case. Hiring and promotional disparate impact cases are evaluated with reference to the federal Uniform Guidelines on Employee Selection Procedures (Uniform Guidelines, 1978).⁴ The Uniform Guidelines was developed in 1978 to set a standard for developing and validating PPTs and to guide courts in deciding cases that allege disparate impact in selection, promotion, or other employment decisions. The Uniform Guidelines defines disparate impact as “A substantially different rate of selection in hiring, promotion, or other employment decision which works to the disadvantage of members of a race, sex, or ethnic group.” If disparate impact has occurred, the next step is for the court to determine whether the PPT is job-related. A valid PPT can have a disparate impact on a protected group, such as women or minorities, and still not be unlawfully discriminatory.

Under Title VII, as amended by the Civil Rights Act of 1991, an unlawful employment practice based on disparate impact is established only if the plaintiff (typically an applicant or employee who was not hired or promoted) demonstrates that the defendant (typically the employer) used a particular employment practice that caused a disparate impact on the basis of race, color, religion, sex, or national origin, and the defendant fails to demonstrate that the challenged practice, procedure, or test is related to the job in question. The analysis proceeds in stages. First, the plaintiff must present initial evidence of disparate impact discrimination, beginning with identifying a particular employment practice; in this case, that practice was the 2008 promotional test. If the plaintiff establishes this initial case, the defendant (here, the employer) may either discredit the plaintiff's initial showing of disparate impact or show that, even if there was disparate impact, the practice is “job-related and consistent with business necessity.” Even then, the plaintiff can still win by showing that an alternative employment practice was available that would have served the employer's interests equally well with less disparate impact, and that the employer refused to adopt it.
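The order of burdens described above can be sketched as a small decision function. This is a deliberately simplified illustration that assumes each stage reduces to a yes/no finding; real courts weigh evidence rather than flip booleans, and the `Findings` class and `burden_shifting_outcome` function are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical yes/no findings at each stage (a simplification of real fact-finding)."""
    impact_shown: bool          # plaintiff identifies a practice that causes disparate impact
    impact_discredited: bool    # employer successfully attacks that initial showing
    job_related: bool           # employer shows the practice is job-related and consistent
                                # with business necessity
    alternative_refused: bool   # plaintiff shows an equally effective, less discriminatory
                                # alternative that the employer refused to adopt


def burden_shifting_outcome(f: Findings) -> str:
    # Stage 1: the plaintiff must establish an initial case of disparate impact.
    if not f.impact_shown or f.impact_discredited:
        return "employer prevails: no disparate impact established"
    # Stage 2: the burden shifts to the employer to justify the practice.
    if not f.job_related:
        return "plaintiff prevails: practice not shown to be job-related"
    # Stage 3: the plaintiff can still win with a refused, less discriminatory alternative.
    if f.alternative_refused:
        return "plaintiff prevails: less discriminatory alternative was refused"
    return "employer prevails: practice is job-related and no alternative was refused"


# Hypothetical scenario: impact shown, employer proves job-relatedness, no alternative refused.
print(burden_shifting_outcome(Findings(True, False, True, False)))
```

The point of writing it this way is the ordering: the employer's justification is only reached once disparate impact is established, and the alternative-practice argument is only reached once the employer has justified the practice.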

Court Ruling on Disparate Impact

Because the plaintiffs' claim rested on disparate impact, the first question the court had to answer was whether the 2008 exam had a disparate impact on minority candidates.

How a Finding of a Disparate Impact is Normally Determined

As in this case, disparate impact proceedings often rely heavily on expert testimony. The defendant's expert tried to provide statistical evidence that there was no disparate impact and that the test was valid, whereas the plaintiffs' expert argued that the statistical evidence showed disparate impact and that the test was not valid. Experts on both sides reviewed reports on the test's development, the scores, and the promotion rates under the 2008 exam, and testified about their analyses. Both parties agreed that minority test-takers passed the 2008 exam and were promoted to lieutenant at lower rates than white candidates. However, a lower promotion rate in and of itself is not enough for a finding of disparate impact. The difference in promotion rates between the two groups must be large enough to show that it is unlikely to have occurred by chance alone.

Two types of measures can be used to demonstrate disparate impact: tests of statistical significance and the 4/5ths rule. In cases such as this, statistical significance means that, if there were truly no difference between the groups, a gap as large as the one observed between minority and white candidates would arise by chance less than 5 percent of the time. This is shown through a statistical test yielding a p-value (probability value) below .05, meaning that a difference at least this large would occur by chance fewer than five times in 100. The 4/5ths rule stipulates that if the selection rate for any racial group is less than 4/5ths (80 percent) of the selection rate for the group with the highest rate, there is evidence of disparate impact. The 4/5ths rule is often considered a measure, but not the only measure, of practical significance. Practical significance goes beyond statistical significance by asking whether the magnitude of the difference in selection rates between minority and majority group members is meaningful.
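To make the statistical significance idea concrete, here is a minimal sketch of one common test for comparing two selection rates, a pooled two-proportion z-test, using only Python's standard library. The function name and the counts (60 of 100 versus 40 of 100 selected) are hypothetical; practitioners often prefer an exact test, such as Fisher's exact test, when samples are small, and nothing here reflects the specific analyses the experts performed in the BPD case.

```python
import math


def two_proportion_z_test(sel_a: int, n_a: int, sel_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value).

    Relies on the normal approximation, which becomes unreliable for small samples.
    """
    rate_a, rate_b = sel_a / n_a, sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)  # common rate assumed if there is no real difference
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 60 of 100 majority vs. 40 of 100 minority applicants selected.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.4f}, significant at .05: {p < 0.05}")
# prints: z = 2.83, p = 0.0047, significant at .05: True
```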

The following is an example of the 4/5ths rule calculation. Suppose there are 100 white applicants and 100 African American applicants, and 60 white applicants and 40 African American applicants are offered and accept the job. The selection rate for whites is 60 percent (60/100), and the selection rate for African Americans is 40 percent (40/100). Dividing the 40 percent selection rate for African Americans by the 60 percent selection rate for whites gives about 66.7 percent, or 2/3. Because the selection rate for African Americans is less than 4/5ths of the selection rate for whites, these selection rates violate the 4/5ths rule.
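The same arithmetic can be written as a short function. This sketch assumes a hypothetical `four_fifths_check` helper that takes (selected, applicants) counts per group, and it uses exact fractions so that a ratio landing exactly on 80 percent is not misclassified by floating-point rounding.

```python
from fractions import Fraction


def four_fifths_check(counts: dict[str, tuple[int, int]]) -> dict[str, tuple[Fraction, bool]]:
    """For each group, return (impact ratio, flagged), where flagged means below 4/5ths.

    counts maps a group name to (number selected, number of applicants).
    """
    rates = {group: Fraction(selected, applicants) for group, (selected, applicants) in counts.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any selections, so ratios are undefined")
    # Compare every group's selection rate with the highest group's rate.
    return {group: (rate / highest, rate / highest < Fraction(4, 5)) for group, rate in rates.items()}


results = four_fifths_check({"White": (60, 100), "African American": (40, 100)})
for group, (ratio, flagged) in results.items():
    print(f"{group}: impact ratio {float(ratio):.3f}, below 4/5ths: {flagged}")
# prints:
# White: impact ratio 1.000, below 4/5ths: False
# African American: impact ratio 0.667, below 4/5ths: True
```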
Other factors can affect how courts weigh practical significance and statistical significance measures in determining disparate impact. The likelihood of finding statistically significant differences increases as the number of people included in the analysis increases. Conversely, it becomes more difficult to find statistically significant differences between groups as the sample size decreases. Therefore, when the sample size is very large or very small, practical significance measures become more important. The Uniform Guidelines has indicated use of practical significance as a “rule of thumb,” but does not provide clear guidance on how to weigh practical significance against statistical significance when making a ruling of disparate impact. Courts therefore examine both kinds of measures case by case when ruling on disparate impact.
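The effect of sample size can be illustrated by holding the selection rates constant and changing only the number of applicants. The sketch below assumes a pooled two-proportion z-test (a normal approximation that is rough for very small groups) and hypothetical pools of 10, 100, and 1,000 applicants per group, all with 60 percent versus 40 percent selection rates; the function name is illustrative.

```python
import math


def two_proportion_p_value(sel_a: int, n_a: int, sel_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (sel_a + sel_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # everyone or no one was selected, so there is no gap to test
    z = (sel_a / n_a - sel_b / n_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Hold the rates fixed at 60% vs. 40% (impact ratio 2/3) and vary the hypothetical pool size.
for n in (10, 100, 1000):
    p = two_proportion_p_value(6 * n // 10, n, 4 * n // 10, n)
    print(f"{n:>4} applicants per group: p = {p:.3g}")
```

The impact ratio is 2/3 in every case, but only the two larger pools produce p-values below .05; with 10 applicants per group the p-value is roughly .37.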

As an example, suppose African Americans were hired at a lower rate than whites, but few applicants were hired overall. A defense attorney might try to demonstrate that if only two more African Americans had been hired, the difference in hiring rates would no longer be statistically significant. The court may then rule that the differences were not practically significant and that disparate impact did not occur. In such a case, the difference in selection rates rests on so few people that it might not recur if the same selection process were administered again.

A study by Clavette (2010)⁵ sheds light on how frequently courts use practical significance and statistical significance measures in ruling on disparate impact. Clavette examined 29 disparate impact court cases from 1974 to 2009 to assess how often practical and statistical significance measures were used to determine disparities. Of the 29 cases, 11 used only a practical significance measure to detect disparate impact; of these, six (about 55 percent) successfully demonstrated a disparity. Three cases used only a statistical significance test, and two of these (about 67 percent) successfully indicated a disparity. The remaining 15 cases used both a practical measure and a statistical measure, and 11 of them (about 73 percent) demonstrated significant disparities. There is no requirement to use both practical and statistical measures. However, in this sample, cases that used both types of measures found disparities more often.

Finding of Disparate Impact in the Administration of the BPD Examination

These principles were used to evaluate whether the BPD examination had a disparate impact. The court evaluated four metrics (promotion rates, exam pass rates, average exam scores, and delays in promotion) to determine whether disparate impact existed. Average exam scores and average delays in promotion could be evaluated only for statistical significance, because the 4/5ths rule applies only to selection decisions with two outcomes (hired or not hired, promoted or not promoted). Exam pass rates both violated the 4/5ths rule and showed a statistically significant difference: minority candidates scored lower on the exam and passed at less than 80 percent of the rate of white candidates. Minority candidates also had significantly lower average exam scores (6.6 points lower) and significantly longer average delays in promotion (750 additional days).
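Because exam scores and promotion delays are continuous measures rather than pass/fail outcomes, they are typically compared with a test of means instead of the 4/5ths rule. This article does not say which test the experts used; the sketch below assumes Welch's t-test as one common choice and computes its t statistic and approximate degrees of freedom on invented scores. The p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(scores_a: list[float], scores_b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(scores_a), len(scores_b)
    se2_a = variance(scores_a) / n_a  # squared standard error of group A's mean
    se2_b = variance(scores_b) / n_b  # squared standard error of group B's mean
    t = (mean(scores_a) - mean(scores_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented exam scores for two groups of five candidates (illustration only, not BPD data).
majority_scores = [80, 82, 78, 85, 75]
minority_scores = [72, 75, 70, 78, 70]
t, df = welch_t(majority_scores, minority_scores)
print(f"mean gap = {mean(majority_scores) - mean(minority_scores)}, t = {t:.2f}, df = {df:.1f}")
```

Welch's version is used here because it does not assume the two groups have equal score variances.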

Promotion rates were the main metric of interest because what matters most, at the end of the day, is who was promoted to police lieutenant and who was not. The p-value for the difference in promotion rates was .052, just above the .05 threshold, but the promotion rates did violate the 4/5ths rule. Although the p-value narrowly missed the conventional threshold, the court still found sufficient evidence to make a determination of disparate impact, for two reasons:
  • Although the .05 cut-off for statistical significance is generally accepted, it is a convention rather than a legal rule for establishing disparate impact, and it is not a measure of practical significance.

  • Because the Uniform Guidelines refers only to “A substantially different rate of selection” between groups, the conclusion of disparate impact is at the discretion of the courts, which may consider the surrounding facts and circumstances of the case in their overall evaluation. In the BPD case, the court found that disparate impact had occurred because three of the four metrics (average exam scores, average delays in promotion, and exam pass rates) showed statistically significant differences between minority and white candidates. In addition, the minority pass rate relative to the white pass rate violated the 4/5ths rule.
Court Ruling
  • The court ruled that disparate impact had occurred in the administration of the BPD examination.
In Part 2 of this article, we will examine the court’s evaluation of the validity of the test. Look for Part 2 in the May edition of The OFCCP Digest.
  1. Smith v. City of Boston, No. 12-10291 (D. Mass. 2015).

  2. Ricci v. DeStefano, 557 U.S. 557, 580 (2009).

  3. Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971).

  4. Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, and Department of Justice. (August 25, 1978). Adoption by Four Agencies of Uniform Guidelines on Employee Selection Procedures. 43 Federal Register 38,290-38,315.

  5. Clavette, M. (2010). A Review of Adverse Impact Measurement in Case Law. Washington, DC: DCI Consulting.