Gender Discrimination: A Statistical Analysis

Gender discrimination, or sex discrimination, may be characterized as the unequal treatment of a person based solely on that person’s sex. It is apparent that gender discrimination is pervasive in the modern workplace; however, its presence and effects are often misrepresented and misunderstood.
Statistical testing plays an important role in cases where the existence of discrimination is disputed. It has been used extensively to compare the expected number of members of a protected group with the actual number of members of that group who have been involved in a significant employment action. This paper uses statistical testing and analysis, including a multiple regression model, to estimate the effects that various independent variables have on the dependent variable, salary level. The analysis uses a data sample of 46 employees and a set of variables recorded for each of those employees.
These variables are gender, age, level of education, length of employment, job type, and weekly salary. Each variable is defined as follows: gender is male or female; age is the age of the employee; education is the last level of education obtained by the employee (some high school, high school, college, or graduate school); employment length is the number of months the employee has been employed; job type is the employee's position (clerical, technical, or managerial); and weekly salary is the weekly salary of each employee in the sample.
To make inferences about the sample data, SPSS was used to generate a multiple regression model. The purpose of a multiple regression model is to predict a dependent variable from the values of multiple independent variables. In this case, the initial multiple regression model was produced using weekly salary as the dependent variable. After an initial review of the results, it became apparent that one of the employees was an outlier with respect to the sample. The mean salary for the sample was $373.83, with a standard deviation of $91.74.
In this case, employee ID#3’s salary of $798 deviated from the mean by 4.62 standard deviations. This level of deviation would place this employee well within the highest 1% of the population and would skew the data to the right of the mean. It can therefore be concluded that this employee was an outlier and that removing this employee from the sample would produce more accurate data with regard to each of the independent variables. After removal of the outlier, the sample mean was reduced to $364.40. It is possible for a presumption of discrimination to be inferred from a comparison of average salaries in a sample.
In this case, the data reflected average salaries of $364.40, $380.46, and $285.14 for the whole group, men, and women, respectively. Typically, courts have required that statistical evidence exhibit a difference of more than two or three standard deviations between the expected incidence (in this case the average salary of the sample, $364.40) and the actual incidence (the average salary for women in the sample, $285.14) to prove discrimination on its face. In this case, the average woman’s salary deviated from the mean by less than one standard deviation, and therefore the data do not support an inference of discrimination on their face.
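The arithmetic behind the outlier check and the comparison of means can be reproduced from the summary figures quoted above. The following is a minimal sketch, not the SPSS output itself. The function and constant names are illustrative, and it assumes that the original standard deviation of $91.74 is also the yardstick for the comparison of the women's average, since the text does not report a standard deviation recomputed after the outlier was removed.

```python
# Summary figures reported in the text (weekly salary, in dollars).
N_FULL = 46             # employees in the original sample
MEAN_FULL = 373.83      # sample mean before removing the outlier
SD_FULL = 91.74         # sample standard deviation before removal
OUTLIER_SALARY = 798.0  # salary of employee ID#3
MEAN_WOMEN = 285.14     # average salary for women in the sample


def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


def mean_without(value, mean, n):
    """Return the mean of the remaining n - 1 observations after dropping `value`."""
    return (mean * n - value) / (n - 1)


outlier_z = z_score(OUTLIER_SALARY, MEAN_FULL, SD_FULL)
mean_trimmed = mean_without(OUTLIER_SALARY, MEAN_FULL, N_FULL)

# Assumption: the pre-removal SD is reused here because no recomputed SD is reported.
women_z = z_score(MEAN_WOMEN, mean_trimmed, SD_FULL)

print(f"outlier z-score: {outlier_z:.2f}")
print(f"mean after removal: {mean_trimmed:.2f}")
print(f"women's mean relative to group mean: {women_z:.2f} SD")
```

Under these assumptions the sketch reproduces the reported deviation of 4.62 standard deviations and the adjusted mean of $364.40, and it places the women's average a little under one standard deviation below the group mean.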
The use of a multiple regression model is necessary to estimate the effects that the independent variables had on the dependent variable, salary. To determine the effect each independent variable had on salary, SPSS was used to generate descriptive statistics, including, but not limited to, frequencies, means, standard deviations, correlations, and coefficients. To assess whether any of the independent variables affected the dependent variable, we also performed independent t-tests. For this data set, the critical t value (at 45 degrees of freedom) lies between 2.021 and 2.000, the tabulated two-tailed values at the 0.05 level for 40 and 60 degrees of freedom.
Therefore, if the absolute t value for an independent variable exceeds 2.021, we reject the null hypothesis that there is no relationship between that variable and salary. In addition to t values, the R-squared and F values measure the overall predictive accuracy of a multiple regression model. R-squared is the proportion of variance in the dependent variable (salary) that can be predicted from the independent variables. F values and their associated P values (Sig) are used to determine whether the independent variables reliably predict the dependent variable.
For purposes of this analysis, we will assert that P values greater than 0.05 imply that the independent variable does not show a significant relationship with the dependent variable, or that the independent variable does not reliably predict the dependent variable. The initial regression model included all of the independent variables. This model produced the following statistics: R2 of 0.664, SE of 56.47, F of 15.422, and Sig of 0.00. This indicates that 66.4% of the variance in salary can be predicted from the independent variables.
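The reported F value can be cross-checked against the reported R2 using the standard identity F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below is a rough consistency check rather than a reproduction of the SPSS run. It assumes n = 45 observations (the 46 employees less the outlier) and k = 5 predictors, with each independent variable entered as a single numeric term; the text does not state how education and job type were coded, so k is an assumption.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations.

    Uses F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in the interval [0, 1)")
    df_resid = n - k - 1  # residual degrees of freedom
    if k <= 0 or df_resid <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return (r2 / k) / ((1.0 - r2) / df_resid)


# Assumed: n = 45 (outlier removed) and k = 5 single-term predictors.
f_full = f_from_r2(0.664, n=45, k=5)
print(f"F implied by R2 = 0.664: {f_full:.2f}")
```

Under these assumptions the implied F is close to the reported 15.422, with the small gap attributable to R2 being rounded to three decimals. The same function can be applied to the reduced models by passing their R2 values with k = 4.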
Because the P value for the model was less than 0.05, we can conclude that the independent variables included in the model show a significant relationship with salary. With regard to the individual variables, both gender and age had P values greater than 0.05 and absolute t values less than 2.021. These statistics necessitate additional regression models that control for and isolate these variables. The next regression model included all of the independent variables except for gender. This model produced the following statistics: R2 of 0.63, SE of 55.84, F of 19.60, and Sig of 0.00. Though an argument could be made that gender could be removed as a variable to produce more predictive statistics in this case, omitting important explanatory variables makes statistical analysis unreliable. While a regression is not fatally flawed by the omission of a variable, the omission of explanatory variables that are correlated with included explanatory variables can cause the regression coefficients and their test and error rate calculations to lose their predictive properties.
As such, it would not be reasonable to remove gender from this analysis. The next regression model included all of the independent variables except for age. This model produced the following statistics: R2 of 0.653, SE of 56.71, F of 18.78, and Sig of 0.00. As above, the change from the previous model with regard to R2 and SE is not significant enough to justify omitting this variable.

——————————————-
[1] http://law.jrank.org/pages/12485/Gender-Discrimination.html
[2] Hazelwood School District v. U.S., 433 U.S. 299 (1977)
[3] Group and men averages were adjusted to reflect removal of outlier.
[4] The Increasing Sophistication of Statistical Assessments as Evidence in Discrimination Litigation, 77 Am. Stat. A. J. 784 (1982)
[5] Sobel v. Yeshiva University, 839 F.2d 18 (2nd Cir. 1988)