Introduction

Despite the substantial benefits of participation in sports and Title IX protections against sex discrimination in athletics, the playing field is still not level for girls. Girls are twice as likely to be inactive as boys, and girls have nearly 20% fewer opportunities to participate in both high school and college sports than boys. Improved enforcement of Title IX and diligent efforts to advance women and girls in sports are still necessary to achieve truly equal opportunity on the playing fields.

National Coalition for Women and Girls in Education, “Title IX at 35: Beyond the Headlines” (2008)

In 1972, Congress enacted Title IX, prohibiting discrimination on the basis of sex “under any education program or activity receiving Federal financial assistance…” 20 U.S.C.A. § 1681. Two years later, Congress enacted the Javits Amendment, directing the Secretary of Health, Education, and Welfare (“HEW”) to propose regulations applying Title IX to intercollegiate athletics receiving federal funds. Education Amendments of 1974, Pub. L. No. 93–380, § 844. HEW adopted regulations requiring schools with athletics to “provide equal athletic opportunity for members of both sexes”, based on ten factors, one of which is “[w]hether the selection of sports and levels of competition effectively accommodate the interests and abilities of members of both sexes”. 45 C.F.R. § 86.41.

In 1979, HEW issued a policy interpretation of this “interests and abilities” factor. 44 Fed. Reg. 71,413, 71,418 (Dec. 11, 1979). The interpretation set out a three-part test asking whether:

  1. “participation opportunities for male and female students are provided in numbers substantially proportionate to their respective enrollments”,
  2. “the institution can show a history and continuing practice of program expansion which is demonstrably responsive to the developing interests and abilities of the members of that sex”, or
  3. “interests and abilities of the members of that sex have been fully and effectively accommodated by the present program”. Id.

In 1996, the Office for Civil Rights (replacing HEW) clarified that a school need only meet one of the three parts to demonstrate compliance, and, after stressing that compliance is determined on a case-by-case basis, issued a plethora of factors to guide schools in each part. OCR, “Clarification of Intercollegiate Athletics Policy Guidance: The Three-Part Test”, Jan. 16, 1996. However, in the end, courts have consistently tied this “interest” to actual participation rather than an unfulfilled opportunity. Equity In Athletics, Inc. v. Department of Educ., 639 F.3d 91 (4th Cir. 2011). See also Neal v. Bd. of Trs. of Cal. State Univ., 198 F.3d 763, 767 (9th Cir. 1999); Boulahanis v. Bd. of Regents, 198 F.3d 633, 638–39 (7th Cir. 1999); Cohen v. Brown Univ., 101 F.3d 155, 174 (1st Cir. 1996).

Issue

The following three questions are analyzed over seven statistical models.

  1. How much inequality is there in female athletic participation, controlling for school type, classification, and time? (Models 1-2)
  2. To what degree does this inequality change over time, controlling for school type, classification, and time? (Models 3-4)
  3. How much inequality is there in female athletic participation, controlling for time and inequality in financial aid, coaching, recruiting, and revenue? (Models 5-7)

Data

The dataset is compiled from data obtained from the Office of Postsecondary Education of the U.S. Department of Education, available online here. That data consists of self-reported institutional data from over 2000 collegiate institutions that accept federal funds, for the years 2005-2011. For years 2005-2007, the data omitted school type, so I filled it in from either later years or information found at 50states.com. To keep large sample sizes, I removed variables about individual sports and coed sports, leaving 28 explanatory variables for each school. These variables, multiplied by the number of schools and the 7 years, yield over 300,000 data points to analyze.

Table 1: Original Data

| Variable Name | Variable Description |
| --- | --- |
| schoolid | School Unique ID |
| Year | Year |
| State | Unique ID for State, alphabetically beginning with A |
| Type | Type of Institution (Public, Nonprofit, 4 year) |
| Class | Intercollegiate Athletic Organization |
| Males | Number of Males in overall student body |
| Females | Number of Females in overall student body |
| StAid_M | Student Aid for Male Athletes |
| StAid_F | Student Aid for Female Athletes |
| RecExp_M | Recruitment Expenses for Male Athletics |
| RecExp_F | Recruitment Expenses for Female Athletics |
| Hsal_M | Average Head Coach Salary for Male Athletics |
| Hsal_F | Average Head Coach Salary for Female Athletics |
| Hnum_M | Number of Head Coaches for Male Athletics |
| Hnum_F | Number of Head Coaches for Female Athletics |
| Asal_M | Average Assistant Coach Salary for Male Athletics |
| Asal_F | Average Assistant Coach Salary for Female Athletics |
| Anum_M | Number of Assistant Coaches for Male Athletics |
| Anum_F | Number of Assistant Coaches for Female Athletics |
| Part_M | Number of Participating Male Athletes |
| Part_F | Number of Participating Female Athletes |
| Rev_M | Revenue for Male Athletics per team, excluding football+basketball |
| Rev_F | Revenue for Female Athletics per team, excluding football+basketball |
| Exp_M | Total Expenses for Male Athletics per team, excluding football+basketball |
| Exp_F | Total Expenses for Female Athletics per team, excluding football+basketball |

 

After importing the data into Gretl, I first found ‘z’, the baseline ratio of females in the total student body.

z = Females / (Males + Females)

Next, I converted non-categorical variables from an aggregate total to ‘B_’, the difference between the ratio of females in that athletic category and the baseline ratio of females in the total student body. This represents the amount of inequality in that variable.

B (x) = z – [xF / (xF + xM)]

For example, ‘y’ is the difference between the ratio of females in the total student body and the ratio of females among athletes, which represents the amount of inequality in female athletic participation.

y = z – [FemaleAthletes / (FemaleAthletes + MaleAthletes)]
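
The ratio definitions above can be sketched in a few lines of Python. This is only an illustration: the enrollment and participation counts are invented rather than taken from the dataset, and the function names are hypothetical.

```python
def female_share(female, male):
    """Share of females in a category: xF / (xF + xM)."""
    return female / (female + male)


def inequality(z, female, male):
    """B(x) = z - xF / (xF + xM): baseline share minus category share."""
    return z - female_share(female, male)


# Hypothetical school (illustrative numbers, not from the dataset)
females, males = 5500, 4500   # overall student body
part_f, part_m = 180, 220     # athletic participants

z = female_share(females, males)    # baseline ratio of females
y = inequality(z, part_f, part_m)   # participation gap

print(f"z = {z:.2f}, y = {y:.2f}")
```

With these made-up counts, women are 55% of students but 45% of athletes, so y is 0.10. A positive value means women are under-represented in that category relative to enrollment.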

For the categorical variables (class + type), I converted each category to a dummy variable, as listed below.

Table 2: Dummy Variables for Classification

| Class | Description |
| --- | --- |
| 1 | NCAA Division I-A |
| 2 | NCAA Division I-AA |
| 3 | NCAA Division I-AAA |
| 4 | NCAA Division II (with football) |
| 5 | NCAA Division II (without football) |
| 6 | NCAA Division III (with football) |
| 7 | NCAA Division III (without football) |
| 8 | Other |
| 9 | NAIA Division I |
| 10 | NAIA Division II |
| 11 | NAIA Division III |
| 12 | NJCAA Division I |
| 13 | NJCAA Division II |
| 14 | NJCAA Division III |
| 15 | NCCAA Division I |
| 16 | NCCAA Division II |

Table 3: Dummy Variables for Type

| Type | Description |
| --- | --- |
| 1 | Public, 4-year or above |
| 2 | Private nonprofit, 4-year or above |
| 3 | Private for-profit, 4-year or above |
| 4 | Public, 2-year |
| 5 | Private nonprofit, 2-year |
| 6 | Private for-profit, 2-year |
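
As a rough sketch of the dummy-variable conversion, each category code becomes a set of 0/1 indicators, with the first code left out as the baseline. The helper name `dummies` is hypothetical, and this is not the Gretl procedure actually used, only an equivalent illustration of the idea.

```python
def dummies(value, levels, prefix):
    """One 0/1 indicator per level except the first, which is the baseline."""
    return {f"{prefix}{lvl}": int(value == lvl) for lvl in levels[1:]}


type_levels = list(range(1, 7))     # Type codes 1-6
class_levels = list(range(1, 17))   # Class codes 1-16

public_2yr = dummies(4, type_levels, "DType")   # Type 4: Public, 2-year
baseline = dummies(1, type_levels, "DType")     # Type 1: Public, 4-year

print(public_2yr)
print(len(dummies(1, class_levels, "DClass")), "class dummies")
```

A public 4-year school has every DType indicator equal to zero, which is why the model constants later describe that kind of school.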

 

For the second question (Models 3-4), I then converted ‘B_’ to ‘d_’, the difference in inequality from last year to this year, which represents the change in inequality.

d (x) = B(x|t=0) -  B(x|t=-1)
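
A minimal sketch of this year-over-year difference, assuming one school's inequality values are stored in a dictionary keyed by year. The values and the function name `yearly_change` are invented for illustration.

```python
def yearly_change(b_by_year):
    """d(x) for each year t: B(x) in year t minus B(x) in year t-1."""
    return {t: b_by_year[t] - b_by_year[t - 1]
            for t in sorted(b_by_year) if t - 1 in b_by_year}


# Hypothetical participation gap for one school (not from the dataset)
b_part = {2005: 0.12, 2006: 0.10, 2007: 0.11}
d_part = yearly_change(b_part)

for year, change in d_part.items():
    print(year, f"{change:+.2f}")
```

The first year has no prior year to compare against, so it drops out. That is consistent with the change models using year dummies 3-7 rather than 2-7.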

An electronic copy of the Dataset can be downloaded here.

Method

I imported the data into Gretl to run an econometric regression. Rather than estimating a separate equation for each single variable to predict gender discrimination, a regression is a single equation that predicts from multiple variables at once. While regressions entail a whole host of assumptions, they do slightly better than the blind guesses of simple statistics, simply because they can control for additional variables.

When judging the strength of a model, three statistical indicators are most important: the coefficient on an independent variable (‘B’), the p-value, and the goodness of fit (‘Adjusted R^2’). ‘B’ is the magnitude of the estimated relationship between ‘Y’ and ‘X1’, holding the other variables constant. The p-value indicates how much confidence to place in the estimate of ‘B’, or more specifically, the probability of obtaining a t-statistic at least as extreme as the one observed, assuming that the null hypothesis (B=0) is true. A p-value less than .05 is commonly said to be “statistically significant,” although most statisticians strive for a p-value less than .01. Finally, ‘R^2’ is the share of the variation in ‘Y’ that the model explains. Although a small R^2 is not automatically detrimental to the model (something can have a small, yet certain effect), a larger R^2 is better more often than not. ‘Adjusted R^2’ is a modification that penalizes R^2 for additional independent variables (‘X2+…+Xk’), which would otherwise inflate it. In short, a good model has a coefficient estimate with a magnitude not equal to zero, a p-value less than .01, and an Adjusted R^2 better than alternative explanatory models.
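
To make these indicators concrete, the sketch below recomputes a t-ratio and an approximate p-value from a coefficient and standard error reported in Model 1 (the DType3 row). It assumes a standard normal approximation to the t distribution, which is reasonable only for large samples and is not necessarily what Gretl reports. The adjusted R^2 helper uses the usual textbook formula, and its inputs in the usage line are made-up numbers.

```python
import math


def t_ratio(coef, std_err):
    """t-ratio: coefficient divided by its standard error."""
    return coef / std_err


def two_sided_p(t):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


t = t_ratio(0.076242, 0.0334697)   # DType3 coefficient and standard error
p = two_sided_p(t)
print(f"t = {t:.3f}, p = {p:.4f}")

# Hypothetical inputs, only to show the direction of the adjustment
print(f"{adjusted_r2(0.50, 101, 10):.3f}")
```

The recomputed values land close to the reported t-ratio of 2.278 and p-value of 0.0227, and the adjusted R^2 is always somewhat below the unadjusted value once regressors are added.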

For panel data, the standard basic model is a Pooled-OLS Regression, of which a slightly more complex form is a Heteroskedastic Pooled-OLS Regression. Depending on how the observations are distributed over time, more advanced models might be needed. Because dummy variables rarely change over time, Models 1-4 are Heteroskedastic Pooled-OLS Regressions. Model 5 is also a Heteroskedastic Pooled-OLS Regression, used to identify four strong legal variables, and Model 6 adds 11 interaction terms based on those four. Finally, Model 7 incorporates the four strong legal variables and their 11 interaction terms into a Heteroskedastic Fixed-Effects Regression. This produces less biased estimates of ‘B’ at the expense of biasing R^2.
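
The difference between pooling and fixed effects can be illustrated with a toy panel. The sketch below assumes a single regressor and two invented schools whose outcomes follow y = school effect + 0.5 * x. It is not the Gretl estimator, only the "subtract each school's mean over time" idea behind a fixed-effects model, and all names in it are hypothetical.

```python
from collections import defaultdict


def demean_by_group(rows, key, cols):
    """Subtract each group's mean from the chosen columns."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    out = []
    for members in groups.values():
        means = {c: sum(m[c] for m in members) / len(members) for c in cols}
        for m in members:
            out.append({**m, **{c: m[c] - means[c] for c in cols}})
    return out


def ols_slope(rows, x, y):
    """Slope of y on x with no intercept (appropriate for demeaned data)."""
    return sum(r[x] * r[y] for r in rows) / sum(r[x] ** 2 for r in rows)


# Hypothetical panel: y = school effect + 0.5 * x
panel = [
    {"school": "A", "x": 0.10, "y": 0.05 + 0.5 * 0.10},
    {"school": "A", "x": 0.20, "y": 0.05 + 0.5 * 0.20},
    {"school": "B", "x": 0.30, "y": 0.30 + 0.5 * 0.30},
    {"school": "B", "x": 0.40, "y": 0.30 + 0.5 * 0.40},
]

# Fixed effects: demean within each school, then fit the slope
within = ols_slope(demean_by_group(panel, "school", ["x", "y"]), "x", "y")

# Pooled OLS: demean over the whole sample (same as fitting an intercept)
pooled_rows = demean_by_group([{**r, "all": 1} for r in panel], "all", ["x", "y"])
pooled = ols_slope(pooled_rows, "x", "y")

print(f"within = {within:.2f}, pooled = {pooled:.2f}")
```

In this constructed example the within estimate recovers the true slope of 0.5, while the pooled estimate is pulled up to 1.5 because the school with larger x also has a larger school-specific effect.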

A detailed appendix on the methodology can be found here.

Models

Model 1 – Heteroskedastic Pooled-OLS Regression of Inequality: {Type, Time, constant}

This model explains the amount of inequality in female athletic participation, using dummy variables for school type (2-6) and year (2-7). Thus, the baseline model (all dummy variables being false) predicts the amount of inequality in female participation in 2005 for a 4 year public institution. Each dummy variable explains the effect on inequality for that variable relative to the baseline prediction.

Model 1

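| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | | 1.90E-132 | 0.044328 |
| const | 0.134335 | 0.00428811 | 31.33 | 2.49e-208 *** | |
| DType2 | 0.0245964 | 0.00505365 | 4.867 | 1.15e-06 *** | |
| DType3 | 0.076242 | 0.0334697 | 2.278 | 0.0227 ** | |
| DType4 | 0.0558213 | 0.00579255 | 9.637 | 6.53e-22 *** | |
| DType5 | 0.0381391 | 0.0247601 | 1.54 | 0.1235 | |
| DType6 | 0.155479 | 0.101362 | 1.534 | 0.1251 | |
| dt_2 | −0.0157220 | 0.00167611 | −9.380 | 7.58e-21 *** | |
| dt_3 | −0.0173976 | 0.00186146 | −9.346 | 1.04e-20 *** | |
| dt_4 | −0.0212520 | 0.00211088 | −10.07 | 9.22e-24 *** | |
| dt_5 | −0.0219718 | 0.0021101 | −10.41 | 2.68e-25 *** | |
| dt_6 | −0.0285524 | 0.00214869 | −13.29 | 4.71e-40 *** | |
| dt_7 | −0.0280402 | 0.00216128 | −12.97 | 2.85e-38 *** | |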
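
As a worked reading of the Model 1 coefficients, the sketch below adds the relevant dummy coefficients to the constant to get a predicted participation gap. It assumes year index 1 corresponds to 2005 and index 7 to 2011, following the stated 2005-2011 range. The function name `predict_gap` is hypothetical.

```python
MODEL_1 = {
    "const": 0.134335,
    "DType2": 0.0245964, "DType3": 0.076242, "DType4": 0.0558213,
    "DType5": 0.0381391, "DType6": 0.155479,
    "dt_2": -0.0157220, "dt_3": -0.0173976, "dt_4": -0.0212520,
    "dt_5": -0.0219718, "dt_6": -0.0285524, "dt_7": -0.0280402,
}


def predict_gap(school_type, year_index, coefs=MODEL_1):
    """Predicted gap; type 1 and year index 1 (2005) are the baseline."""
    gap = coefs["const"]
    if school_type > 1:
        gap += coefs[f"DType{school_type}"]
    if year_index > 1:
        gap += coefs[f"dt_{year_index}"]
    return gap


baseline_2005 = predict_gap(1, 1)     # public 4-year, 2005
for_profit_2011 = predict_gap(3, 7)   # private for-profit 4-year, 2011

print(f"{baseline_2005:.3f} {for_profit_2011:.3f}")
```

Under these assumptions the baseline gap is about 13.4 percentage points, and the predicted gap for a 4 year for-profit school in 2011 is about 18.3 points: the constant plus 7.6 points for school type minus 2.8 points for the year.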

 

Model 2 – Heteroskedastic Pooled-OLS Regression of Inequality:{Class, Time, constant}

This model explains the amount of inequality in female athletic participation, using dummy variables for school classification (2-16) and year (2-7). Thus, the baseline model (all dummy variables being false) predicts the amount of inequality in female participation in 2005 for an NCAA Division I-A school. Each dummy variable explains the effect on inequality for that variable relative to the baseline prediction.

Model 2

| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | | 0 | 0.108886 |
| const | 0.083601 | 0.00617768 | 13.53 | 1.82e-41 *** | |
| DClass2 | 0.0616972 | 0.00949443 | 6.498 | 8.40e-11 *** | |
| DClass3 | 0.00306185 | 0.00861642 | 0.3554 | 0.7223 | |
| DClass4 | 0.132018 | 0.00923257 | 14.3 | 4.65e-46 *** | |
| DClass5 | 0.0582232 | 0.00925288 | 6.292 | 3.22e-10 *** | |
| DClass6 | 0.0961729 | 0.00797889 | 12.05 | 2.71e-33 *** | |
| DClass7 | 0.0442303 | 0.00893117 | 4.952 | 7.42e-07 *** | |
| DClass8 | 0.0905694 | 0.00881851 | 10.27 | 1.17e-24 *** | |
| DClass9 | 0.093281 | 0.0106791 | 8.735 | 2.71e-18 *** | |
| DClass10 | 0.080022 | 0.00918348 | 8.714 | 3.27e-18 *** | |
| DClass11 | 0.121845 | 0.0395338 | 3.082 | 0.0021 *** | |
| DClass12 | 0.104286 | 0.00913728 | 11.41 | 4.87e-30 *** | |
| DClass13 | 0.0991051 | 0.0118722 | 8.348 | 7.61e-17 *** | |
| DClass14 | 0.105569 | 0.00978156 | 10.79 | 4.75e-27 *** | |
| DClass15 | 0.06163 | 0.0330721 | 1.864 | 0.0624 * | |
| Dclass16 | 0.0189321 | 0.0123539 | 1.532 | 0.1254 | |
| dt_2 | −0.0155282 | 0.00166576 | −9.322 | 1.31e-20 *** | |
| dt_3 | −0.0167807 | 0.00185321 | −9.055 | 1.54e-19 *** | |
| dt_4 | −0.0206635 | 0.00210098 | −9.835 | 9.40e-23 *** | |
| dt_5 | −0.0214458 | 0.00210341 | −10.20 | 2.52e-24 *** | |
| dt_6 | −0.0284792 | 0.00213759 | −13.32 | 2.98e-40 *** | |
| dt_7 | −0.0279342 | 0.00214753 | −13.01 | 1.84e-38 *** | |

 

Model 3 – Heteroskedastic Pooled-OLS Regression of Change in Inequality:{Type, Time}

This model explains the change in inequality, using dummy variables for school type (2-6) and year (3-7) of the study. Thus, the baseline model (all dummy variables being false) predicts the change in inequality in female participation from 2005-6 for a 4 year public institution. Each dummy variable explains the effect on change of inequality for that variable relative to the baseline prediction.

Model 3

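| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | | 7.75E-023 | 0.009877 |
| const | −0.0169416 | 0.00161506 | −10.49 | 1.24e-25 *** | |
| DType2 | 0.00470358 | 0.000694233 | 6.775 | 1.30e-11 *** | |
| DType3 | 0.00484509 | 0.0112245 | 0.4317 | 0.666 | |
| DType4 | −0.00339042 | 0.000928965 | −3.650 | 0.0003 *** | |
| DType5 | −0.000473822 | 0.00652504 | −0.07262 | 0.9421 | |
| DType6 | 0.0177183 | 0.017093 | 1.037 | 0.3 | |
| dt_3 | 0.0150473 | 0.00237678 | 6.331 | 2.52e-10 *** | |
| dt_4 | 0.0121916 | 0.00225484 | 5.407 | 6.54e-08 *** | |
| dt_5 | 0.0131517 | 0.00215977 | 6.089 | 1.17e-09 *** | |
| dt_6 | 0.00888084 | 0.00209854 | 4.232 | 2.33e-05 *** | |
| dt_7 | 0.01479 | 0.00199902 | 7.399 | 1.47e-13 *** | |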

 

Model 4 – Heteroskedastic Pooled-OLS Regression of Change in Inequality:{Class, Time}

This model explains the change in inequality, using dummy variables for school classification (2-16) and year (3-7) of the study. Thus, the baseline model (all dummy variables being false) predicts the change in inequality in female participation from 2005-6 for an NCAA Division I-A school. Each dummy variable explains the effect on the change in inequality for that variable relative to the baseline prediction.

Model 4

| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | | 9.91E-015 | 0.007622 |
| const | −0.0163531 | 0.00173367 | −9.433 | 4.73e-21 *** | |
| DClass2 | 7.35073e-05 | 0.00114783 | 0.06404 | 0.9489 | |
| DClass3 | 0.00241398 | 0.00122466 | 1.971 | 0.0487 ** | |
| DClass4 | 0.000967242 | 0.00116834 | 0.8279 | 0.4078 | |
| DClass5 | 0.000687543 | 0.00125455 | 0.548 | 0.5837 | |
| DClass6 | 0.00307681 | 0.000991442 | 3.103 | 0.0019 *** | |
| DClass7 | 0.00490663 | 0.00116308 | 4.219 | 2.48e-05 *** | |
| DClass8 | 0.000554283 | 0.00169819 | 0.3264 | 0.7441 | |
| DClass9 | 0.000521305 | 0.00203431 | 0.2563 | 0.7978 | |
| DClass10 | 0.00100241 | 0.00166887 | 0.6007 | 0.5481 | |
| DClass11 | 0.0325354 | 0.00976458 | 3.332 | 0.0009 *** | |
| DClass12 | −0.00392811 | 0.00160661 | −2.445 | 0.0145 ** | |
| DClass13 | −0.00226350 | 0.00238427 | −0.9493 | 0.3425 | |
| DClass14 | −0.00314858 | 0.00234797 | −1.341 | 0.18 | |
| DClass15 | −0.000564028 | 0.011868 | −0.04753 | 0.9621 | |
| Dclass16 | 0.00364955 | 0.00492818 | 0.7405 | 0.459 | |
| dt_3 | 0.0150368 | 0.00237498 | 6.331 | 2.52e-10 *** | |
| dt_4 | 0.0121707 | 0.00225479 | 5.398 | 6.88e-08 *** | |
| dt_5 | 0.0131513 | 0.00215804 | 6.094 | 1.13e-09 *** | |
| dt_6 | 0.00890935 | 0.00209767 | 4.247 | 2.18e-05 *** | |
| dt_7 | 0.0147941 | 0.00199927 | 7.4 | 1.46e-13 *** | |

 

Model 5 – Heteroskedastic Pooled-OLS Regression of Inequality:{Financial Aid, Recruitment, Coaching, Revenues, Total Expenses, Time}

This model explains the amount of inequality, using dummy variables for year (2-7) and legal variables representing inequality in financial aid, coaching, recruiting, revenue, and total expenses. Thus, the baseline model (all dummy variables being false) predicts the amount of inequality in female participation given a change in a single legal variable in 2005, holding other legal variables constant.

Model 5

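| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | | 0 | 0.71124 |
| const | 0.0611889 | 0.00240968 | 25.39 | 1.27e-136 *** | |
| B_StAid | 0.348675 | 0.0247204 | 14.1 | 1.29e-44 *** | |
| B_RecExp | 0.0257657 | 0.00911989 | 2.825 | 0.0047 *** | |
| B_HSal | −0.116052 | 0.0167584 | −6.925 | 4.71e-12 *** | |
| B_H | 0.267267 | 0.0240049 | 11.13 | 1.42e-28 *** | |
| B_ASal | 0.0028956 | 0.0118507 | 0.2443 | 0.807 | |
| B_A | 0.218371 | 0.0138644 | 15.75 | 4.99e-55 *** | |
| B_Rev | 0.00516691 | 0.0118974 | 0.4343 | 0.6641 | |
| B_Exp | 0.0568969 | 0.0111882 | 5.085 | 3.76e-07 *** | |
| dt_2 | 0.000355774 | 0.00150089 | 0.237 | 0.8126 | |
| dt_3 | 0.000693946 | 0.00167652 | 0.4139 | 0.6789 | |
| dt_4 | 0.00121498 | 0.0017719 | 0.6857 | 0.4929 | |
| dt_5 | 0.00438203 | 0.00183686 | 2.386 | 0.0171 ** | |
| dt_6 | −0.00473349 | 0.00231444 | −2.045 | 0.0409 ** | |
| dt_7 | −0.00490926 | 0.00237739 | −2.065 | 0.039 ** | |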

 

Model 6 – Heteroskedastic Pooled-OLS Regression of Inequality with Interaction Terms:{Financial Aid, Coaching, Time}

This model explains the amount of inequality, using dummy variables for year (2-7) and legal variables representing inequality in financial aid, number of head and assistant coaches, and average head coaching salary. Some variables (average assistant coaching salary, recruitment expenses, revenue, and total expenses) were dropped to focus on the four legal variables with the best p-values. Eleven interaction terms were added to represent every combination of two or more of those four legal variables. Thus, the baseline model (all dummy variables being false) predicts the amount of inequality in female participation given a change in one or more legal variables in 2005.

Table 4: Interaction Terms

| Variable | Description |
| --- | --- |
| int1 | B_StAid*B_HSal |
| int2 | B_StAid*B_H |
| int3 | B_StAid*B_A |
| int4 | B_HSal*B_H |
| int5 | B_HSal*B_A |
| int6 | B_H*B_A |
| int7 | B_StAid*B_HSal*B_H |
| int8 | B_StAid*B_HSal*B_A |
| int9 | B_StAid*B_H*B_A |
| int10 | B_HSal*B_H*B_A |
| int11 | B_StAid*B_HSal*B_H*B_A |
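
The eleven terms are every product of two, three, or four of the base variables (6 + 4 + 1 = 11). A minimal sketch of generating them, assuming each school's values are held in a dictionary keyed by variable name. The helper names are hypothetical, and the 10-point gaps in the usage lines are invented.

```python
from itertools import combinations
from math import prod

BASE = ["B_StAid", "B_HSal", "B_H", "B_A"]


def interaction_sets(base=BASE):
    """All subsets of two or more base variables, smallest subsets first."""
    return [c for k in range(2, len(base) + 1) for c in combinations(base, k)]


def interaction_values(row, base=BASE):
    """Map int1..int11 to the product of the corresponding variables."""
    return {f"int{i}": prod(row[v] for v in combo)
            for i, combo in enumerate(interaction_sets(base), start=1)}


names = ["*".join(c) for c in interaction_sets()]

# Hypothetical school with a 10-point gap in every base variable
vals = interaction_values({v: 0.1 for v in BASE})

print(len(names), names[8], vals["int9"])
```

Because each term is a product of ratios, its value shrinks quickly: with a 0.1 gap in each variable, a two-way term equals 0.01 and a three-way term equals 0.001. That is why some interaction coefficients in the models look large in absolute size.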

Model 6

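| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | 977.8514 | 0 | 0.705262 |
| const | 0.0637666 | 0.00196052 | 32.53 | 6.77e-219 *** | |
| B_StAid | 0.32853 | 0.0108938 | 30.16 | 5.83e-190 *** | |
| B_HSal | −0.0880094 | 0.0126341 | −6.966 | 3.50e-12 *** | |
| B_H | 0.406602 | 0.0129346 | 31.44 | 2.33e-205 *** | |
| B_A | 0.203343 | 0.00946241 | 21.49 | 8.22e-100 *** | |
| int1 | 0.0040653 | 0.0992741 | 0.04095 | 0.9673 | |
| int2 | −0.0921121 | 0.069402 | −1.327 | 0.1845 | |
| int3 | 0.245753 | 0.0582041 | 4.222 | 2.44e-05 *** | |
| int4 | −0.0736928 | 0.100802 | −0.7311 | 0.4648 | |
| int5 | 0.124764 | 0.0918341 | 1.359 | 0.1743 | |
| int6 | −0.286988 | 0.0633195 | −4.532 | 5.91e-06 *** | |
| int7 | 4.35622 | 0.576346 | 7.558 | 4.50e-14 *** | |
| int8 | −1.24634 | 0.482051 | −2.585 | 0.0097 *** | |
| int9 | −0.315215 | 0.174419 | −1.807 | 0.0708 * | |
| int10 | 0.34445 | 0.487443 | 0.7066 | 0.4798 | |
| int11 | −5.99080 | 1.77217 | −3.380 | 0.0007 *** | |
| dt_2 | −0.000789912 | 0.00229648 | −0.3440 | 0.7309 | |
| dt_3 | −0.000496874 | 0.00230631 | −0.2154 | 0.8294 | |
| dt_4 | 0.000461284 | 0.00230091 | 0.2005 | 0.8411 | |
| dt_5 | 0.00342686 | 0.00230125 | 1.489 | 0.1365 | |
| dt_6 | 0.00106601 | 0.00229423 | 0.4646 | 0.6422 | |
| dt_7 | 0.000895985 | 0.00228702 | 0.3918 | 0.6952 | |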

 

Model 7 – Heteroskedastic Fixed-Effects Regression of Discrimination:{Financial Aid, Coaching}

This model explains the amount of inequality, using legal variables representing inequality in financial aid/coaching, and using interaction terms for those variables. I eliminated the dummy variables because a fixed-effects model already subtracts the mean over time from each variable, accounting for time. This allows for less biased and better estimates of ‘B’, the magnitude of the effect on inequality, at the expense of biasing ‘R^2’, the goodness of fit measure. Thus, the baseline model predicts the amount of inequality in female participation given a change in one or more legal variables in an average time.

Model 7

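| Variable | B Coefficient | Std Error | t-ratio | p-value | Adj. R^2 |
| --- | --- | --- | --- | --- | --- |
| Jointly (F) | | | 58.34694 | 0 | N/A |
| const | 0.08311 | 0.00229256 | 36.25 | 1.77e-264 *** | |
| B_StAid | 0.234993 | 0.0252297 | 9.314 | 1.60e-20 *** | |
| B_HSal | 0.0444077 | 0.0258044 | 1.721 | 0.0853 * | |
| B_H | 0.483906 | 0.0336278 | 14.39 | 2.61e-46 *** | |
| B_A | 0.0729712 | 0.0133458 | 5.468 | 4.71e-08 *** | |
| int1 | −0.210749 | 0.219233 | −0.9613 | 0.3364 | |
| int2 | 0.143378 | 0.125669 | 1.141 | 0.2539 | |
| int3 | −0.0635946 | 0.0980551 | −0.6486 | 0.5166 | |
| int4 | 0.0805446 | 0.233731 | 0.3446 | 0.7304 | |
| int5 | −0.0119529 | 0.152096 | −0.07859 | 0.9374 | |
| int6 | −0.0637880 | 0.103433 | −0.6167 | 0.5374 | |
| int7 | 1.7529 | 1.07868 | 1.625 | 0.6543 | |
| int8 | 0.448286 | 0.821516 | 0.5457 | 0.5853 | |
| int9 | −0.504956 | 0.285613 | −1.768 | 0.0771 * | |
| int10 | −0.790076 | 0.851951 | −0.9274 | 0.3538 | |
| int11 | −1.47712 | 3.29842 | −0.4478 | 0.6543 | |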

 

Results: Question 1

  • On average, the amount of inequality in female athletic participation is decreasing over time. (Model 1+2)
  • 4 year public institutions and NCAA Division I-A institutions have the least inequality in female athletic participation relative to other institutions. (Model 1+2)
  • ‘For-profit’ institutions have the greatest amount of inequality relative to 4 year public institutions, with 4 year for-profit institutions having a participation gap 7.6 percentage points larger (statistically significant) and 2 year for-profit institutions having a gap 15.5 percentage points larger (although not statistically significant). (Model 1)
  • Five classifications have a participation gap roughly 10 or more percentage points larger (statistically significant) than NCAA Division I-A: ‘NCAA Division II (with football)’, ‘NAIA Division III’, and ‘NJCAA Division I, II, + III’. (Model 2)
  • The goodness of fit measure is quite small for both type (aR^2=.044) and class (aR^2=.109). Although this doesn’t directly bias the model, it does hint at other omitted variables likely having a much greater effect than type, class, and time. (Model 1+2)

Results: Question 2

  • On average, the speed at which inequality in female athletic participation is falling slows down over time. (Model 3+4)
  • Among school types, only private nonprofit 4 year and public 2 year institutions differ from 4 year public institutions by a statistically significant amount in the speed at which inequality is changing, and the magnitude of both effects is quite small (about 5/10 of a percentage point slower and 3/10 of a percentage point faster per year, respectively). (Model 3)
  • Only ‘NAIA Division III’ institutions have a large effect (3 percentage points) on the speed at which inequality is changing over time, and that effect is statistically significant. (Model 4)
  • The goodness of fit measure is quite tiny for both type (aR^2=.0099) and class (aR^2=.0076). Although this doesn’t directly bias the model, such a low aR^2 pushes us toward alternative models. (Model 3+4)

Results: Question 3

  • Without any inequality in financial aid, head coach salary, and number of head + assistant coaches, there still lurks a constant inequality of 8 percentage points (statistically significant) in female athletic participation. This constant might be interpreted as roughly approximating inherent female “interests and abilities”, although it could also be the result of omitted variables. (Model 7)
  • Inequality in the number of head coaches has the biggest effect on female participation, with a 10 percentage point gap in the number of head coaches correlating with a 4.8 percentage point gap in female athletic participation. (Model 7)
  • Inequality in financial aid for athletes has the next largest effect on female participation, with a 10 percentage point gap in financial aid correlating with a 2.3 percentage point gap in female athletic participation. (Model 7)
  • Inequality in the number of assistant coaches and in average head coach salary have smaller effects on female participation, with a 10 percentage point gap in the number of assistant coaches (or head coach salary) correlating with a 7/10 (or 4/10) of a percentage point gap in female athletic participation. (Model 7)
  • All four explanatory variables and the constant are statistically significant, although head coach salary is significant only at the 10% level (p=.085). (Model 7)
  • Only one interaction term is statistically significant (at the 10% level), representing the interaction of financial aid, number of head coaches, and number of assistant coaches. Because this term is the product of the three gaps, a 10 percentage point gap in all three correlates with negative 5/100 of a percentage point in the female athletic participation gap, very slightly mitigating the compounding effect of all three variables on participation. (Model 7)
  • The goodness of fit measure is biased for a Fixed-Effects Regression, but one can approximate the result by calculating aR^2 for the same equation in a Pooled-OLS Regression. That model had a very strong goodness of fit measure (aR^2=.70), suggesting a similarly strong fit for the Fixed-Effects Regression. (Model 6+7)
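
The arithmetic behind these Model 7 readings can be sketched as follows, using the reported coefficients and assuming a 10 percentage point gap (0.10) in each variable. The function names are hypothetical. Since the interaction term is a product of three ratios, a 0.10 gap in each gives a product of 0.001.

```python
MODEL_7_MAIN = {
    "B_StAid": 0.234993,
    "B_HSal": 0.0444077,
    "B_H": 0.483906,
    "B_A": 0.0729712,
}
INT9_COEF = -0.504956   # coefficient on B_StAid*B_H*B_A


def main_effect_pp(var, gap=0.10):
    """Percentage-point change in the participation gap from one variable."""
    return 100 * MODEL_7_MAIN[var] * gap


def int9_effect_pp(gap=0.10):
    """Contribution of the three-way term when all three gaps equal `gap`."""
    return 100 * INT9_COEF * gap ** 3


for var in MODEL_7_MAIN:
    print(f"{var}: {main_effect_pp(var):.1f} pp")
print(f"int9: {int9_effect_pp():.2f} pp")
```

Under these assumptions the head coach term contributes about 4.8 points, financial aid about 2.3, assistant coaches about 0.7, and head coach salary about 0.4, while the three-way interaction offsets only about 0.05 of a point.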

Conclusion

Although the three questions addressed do not make up the exact legal test for the “interests and abilities” factor in Title IX, they do represent a pretty close approximation to the totality of facts and factors that would make up such a perfect test. Thus, while imperfect, these models can still be helpful to lawyers, judges, students, and academics alike who are trying to add some objective sense to the subjective Title IX compliance riddle.

Specifically, these models could help:

  1. show the general progress of Title IX over time to help society judge its general effectiveness
  2. form a baseline comparison from which one can identify extreme outliers, and then directly target those schools for individual compliance reviews and/or student-initiated complaints
  3. determine a baseline compliance level for schools on average, from which individual efforts by schools could be judged, shedding some light on the oft-litigated question of when exactly statistical discrimination becomes large enough to become legal discrimination
  4. identify relationships between areas of female inequality, allowing schools to prioritize the areas that most affect actual female participation rates
  5. quantify specific legal concerns (e.g. whether schools are complying by decreasing male sports rather than increasing female sports)