Predicting Criminal Recidivism with R
Can data science indicate what factors affect the rate of criminal or violent recidivism? (Hint: Yes)
This post was co-authored by David Pinski, a graduate student at Carnegie Mellon University. Please reach out to me directly if you are interested in the R code used in this report.
1. Introduction
Around the United States, municipalities have turned to risk assessment instruments (RAIs) to help judges determine which individuals to release on bail and which ones to keep in custody. The risk assessment process varies based on the specific instrument used but many rely on criminal recidivism data sets. These data sets typically contain various demographic indicators (age, race, gender, etc.) and also criminal history (charges, juvenile record, etc.).
Broward County, Florida, uses one of the most popular RAIs today: COMPAS, the Correctional Offender Management Profiling for Alternative Sanctions tool. COMPAS assesses individuals based on criminal history and social profiling to categorize an individual as low, medium, or high risk. This tool, however, was not developed using the Broward County data set, which may lead to poorly performing predictions for individuals from Broward County.
In the following data analysis, we apply modern data mining techniques to:
- Construct an RAI using the Broward County data set to predict two-year recidivism.
- Construct an RAI using the Broward County data set to predict two-year violent recidivism.
- Evaluate predictive quality for different ethnicities, ages, and genders.
- Compare our custom RAI to the proprietary COMPAS RAI.
Before constructing the RAIs and comparing our results to COMPAS, we first explore, clean, describe, and interpret the Broward County data set.
2. Data Exploration
Scope of the Raw Data
The data set provided contains records of individuals from Broward County, Florida, who have been arrested for or charged with a crime. Columns provided in the data set include:
- ID
- Name
- COMPAS Screening Date
- Sex, Date of Birth, Age, Age Category, Race
- Counts of Juvenile Felonies, Misdemeanors, Other offenses
- Priors count
- Days between Screening and Arrest
- Dates in and out of jail
- Charge Offense Date
- Days from COMPAS screening
- Charge Degree
- Charge Description
- Is Recidivist? (And other values related to the recidivism charge if applicable)
- Is Violent Recidivist? (And other values related to the violent recidivism charge if applicable)
- Dates in and out of custody
- Two-year Recidivist?
- Two-year Violent Recidivist? (This value was calculated manually by multiplying Is Violent Recidivist with Two-Year Recidivist)
- COMPAS Decile Score
Data Cleaning
Columns
Overall, the data set contains 56 columns but some of these columns are unusable or provide little value for one or more of the following reasons:
- Is a unique identification number (individual ID, case/charge number)
- Directly relates to known recidivism (this data is unknown when assessing risk of future recidivism at a bail hearing)
- Directly relates to known violent recidivism (this data is unknown when assessing risk of future recidivism at a bail hearing)
- Reports the type of assessment performed (all cases are ‘Risk of Recidivism’ and ‘Risk of Violence’)
For these reasons, the columns have been ignored and filtered out of the analyzed data set (resulting in 24 columns).
Rows
The data set contains 7214 rows but, similar to the filtering performed by ProPublica, we have filtered out individuals who do not meet certain criteria:
- Individuals whose COMPAS-scored charge date was not within 30 days of the arrest date were removed.
- Individuals with no COMPAS case were removed.
- Individuals with a charge degree of ‘O’ (instead of ‘F’ or ’M’) were removed. These individuals are not expected to serve time in jail.
- Individuals with less than two years of time outside of the correctional facility were removed.
For these reasons, the corresponding rows have been removed (resulting in 6172 rows). For more information on the row filtering reasons listed above, please visit ProPublica's source page.
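The four row filters above can be sketched as a single predicate. This is a minimal Python illustration rather than the R code used for the report; the field names, the `-1` encoding for "no COMPAS case", and the `days_outside` field are assumptions made for the example.

```python
# Hypothetical record layout: every field name and encoding here is an assumption.
def keep_row(row):
    """Return True if a record passes all four filters described above."""
    days = row["days_b_screening_arrest"]
    if days is None or abs(days) > 30:   # charge date not within 30 days of arrest
        return False
    if row["is_recid"] == -1:            # assumed marker for "no COMPAS case"
        return False
    if row["c_charge_degree"] == "O":    # neither felony ('F') nor misdemeanor ('M')
        return False
    if row["days_outside"] < 730:        # under two years outside a correctional facility
        return False
    return True

# Toy records, invented for illustration only.
rows = [
    {"id": 1, "days_b_screening_arrest": 1, "is_recid": 0, "c_charge_degree": "F", "days_outside": 800},
    {"id": 2, "days_b_screening_arrest": 45, "is_recid": 0, "c_charge_degree": "F", "days_outside": 800},
    {"id": 3, "days_b_screening_arrest": 0, "is_recid": 0, "c_charge_degree": "O", "days_outside": 800},
    {"id": 4, "days_b_screening_arrest": 0, "is_recid": -1, "c_charge_degree": "M", "days_outside": 800},
    {"id": 5, "days_b_screening_arrest": 0, "is_recid": 1, "c_charge_degree": "M", "days_outside": 400},
]
kept = [r for r in rows if keep_row(r)]
```

Only the first toy record survives; each of the others fails exactly one filter.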
Feature Engineering
We believe additional features, derived from variables already in the Broward County data set, could improve the predictive performance of our models. Below is the list of new features with a brief explanation of each:
Days spent in jail
While we lose the exact dates of when an individual entered and exited jail, we gain the ability to see if the duration or the term of the charge impacts the recidivism rate. To calculate this value, we subtract the date the person entered jail from the date the person exited jail. This subtraction provides us with the number of days spent in jail.
Days spent in custody
Similarly, for days spent in custody, we can use this information to determine whether this information is important for predicting criminal recidivism. To calculate this value, we subtract the date the person entered custody from the date the person exited custody. This subtraction provides us with the number of days spent in custody.
Number of juvenile charges (felony, misdemeanor, other)
The original data set provides the number of juvenile charges, separated by type: felony, misdemeanor or other. To analyze the impact of criminal activity from an individual's youth, we summed the counts together. While we may lose the severity of the crime(s), we gain insight into how much juvenile criminal activity, as a whole, feeds into criminal recidivism.
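As a rough illustration of the duration and juvenile-total features described above, the following Python sketch subtracts dates and sums the three juvenile counts. The report's own code is in R; the field names and the sample record here are hypothetical.

```python
from datetime import date

def days_between(start, end):
    """Whole days from start to end (exit date minus entry date)."""
    return (end - start).days

# Invented sample record; field names are assumptions for the example.
record = {
    "jail_in": date(2013, 8, 13),
    "jail_out": date(2013, 8, 14),
    "custody_in": date(2013, 8, 13),
    "custody_out": date(2013, 9, 2),
    "juv_fel_count": 0,
    "juv_misd_count": 1,
    "juv_other_count": 2,
}

days_in_jail = days_between(record["jail_in"], record["jail_out"])
days_in_custody = days_between(record["custody_in"], record["custody_out"])

# Severity is discarded: all juvenile charge types contribute equally.
total_juv_count = (
    record["juv_fel_count"] + record["juv_misd_count"] + record["juv_other_count"]
)
```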
Charge category
For each individual, a description of their charge was provided. This information, while useful to an individual reading a police report, is not well suited for data analysis. Data labeled as "Driving License Suspended" would not be treated as the same category as data labeled as "DWLS Susp/Cancel Revoked" or "Susp Drivers Lic 1st Offense". These may have subtle differences in length of sentence or the size of a fine but provide more value when considered together. In this example, these and other related offenses have been categorized as "Driving/DUI".
We used this opportunity to also further categorize drug-related crimes. For these offenses, we have two high-level groupings: one for cannabis-related offenses, and one for non-cannabis-related offenses (cocaine, methamphetamine, heroin, synthetic drugs, etc.). This grouping and others will allow us to determine whether the type of crime committed impacts the recidivism rate.
Other charge categories include: assault, battery, burglary, resisting, criminal mischief, tampering, and lewdness.
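One plausible way to collapse free-text charge descriptions into categories is keyword matching. The Python sketch below is an assumption about how such a mapping might look, not the rule set actually used in the report; the keyword lists are illustrative and far from complete.

```python
# Illustrative keyword rules (assumed, not the report's actual mapping).
# Order matters: the first category with a matching keyword wins.
CATEGORY_KEYWORDS = [
    ("Driving/DUI", ("driving", "dwls", "drivers lic", "dui")),
    ("Cannabis", ("cannabis", "marijuana")),
    ("Non-Cannabis Drug", ("cocaine", "heroin", "methamphetamine")),
    ("Battery", ("battery",)),
]

def categorize(description):
    """Map a raw charge description to a coarse charge category."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"

examples = [
    "Driving License Suspended",
    "DWLS Susp/Cancel Revoked",
    "Susp Drivers Lic 1st Offense",
]
categories = [categorize(e) for e in examples]
```

All three example descriptions from the text land in "Driving/DUI" under these rules.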
Involved firearm
Since we are tasked with not only predicting general recidivism but also violent recidivism, we chose to engineer a binary variable for 'firearm' or 'deadly weapon' related offenses. The suspicion is that individuals involved with a firearm-related crime will be more likely to recidivate in the future (particularly within two years). Any record containing a description referring to 'firearm', 'deadly weapon', 'throwing missile into vehicle', or other related charges was labeled as '1'; all others were labeled as '0'.
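The binary flag described above could be derived with a simple substring check. This Python sketch uses only the three terms named in the text; the "other related charges" are not listed in the report, so they are left out, and the sample descriptions are invented.

```python
# Terms taken from the text; the full list used in the report is not given.
FIREARM_TERMS = ("firearm", "deadly weapon", "throwing missile into vehicle")

def involved_firearm(description):
    """Return 1 if the charge description mentions a listed term, else 0."""
    text = description.lower()
    return int(any(term in text for term in FIREARM_TERMS))

# Invented descriptions for illustration.
flags = [
    involved_firearm("Aggrav Battery w/Deadly Weapon"),
    involved_firearm("Possession of Firearm by Felon"),
    involved_firearm("Petit Theft"),
]
```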
Polynomial transformations
We included second- and third-degree terms for three continuous variables: "age", "priors_count", and "total_juv_count" (the variable created that sums up all juvenile offenses). These transformations allow our models to capture any nonlinear effects that these variables have on recidivism and violent recidivism.
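A minimal Python sketch of the polynomial expansion follows. The `_2` and `_3` suffixes mirror the variable names that appear in the selection tables later in the report; the helper name and the sample row are hypothetical.

```python
def add_polynomial_terms(row, columns=("age", "priors_count", "total_juv_count")):
    """Return a copy of row with squared (_2) and cubed (_3) terms added."""
    out = dict(row)
    for col in columns:
        out[f"{col}_2"] = row[col] ** 2
        out[f"{col}_3"] = row[col] ** 3
    return out

# Invented sample row.
expanded = add_polynomial_terms({"age": 30, "priors_count": 2, "total_juv_count": 1})
```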
Descriptive Statistics
To familiarize ourselves with the data set, we evaluate the key variables including the outcome variable. In the following section we describe the data and the distributions for each variable.
Age
The age variable is clearly right-skewed with the majority of individuals in the data set falling between the ages of 20 and 30. The average age is 34.5 years old.
Although older age is typically less associated with crime, the data set includes elderly individuals (60 years or older) from Broward County with a criminal record. At the other end of the age spectrum, no individuals under 18 are included. We suspect this is because detailed juvenile records are inaccessible.
Questions that are outside of the scope of this analysis but possibly interesting to study include: Do individuals nearing age milestones commit more crimes? Do individuals nearing retirement commit more crimes (relative to individuals a few years further away from retirement)?
Gender
The gender variable also appears unevenly distributed between men and women. The majority of individuals, 81%, in the data set are men.
Sex | Count | Proportion |
---|---|---|
Female | 1175 | 0.19 |
Male | 4997 | 0.81 |
Race
The race variable also appears unevenly distributed. African-American individuals account for 50% of the data set, while Asian individuals are so few that their share rounds to 0%.
Priors Count
Similar to the age variable, the priors count variable is heavily right-skewed. The average number of prior convictions is 3.25 with 34% of individuals having 0 priors.
Juvenile Charges Count
It follows that if individuals exhibit a right-skewed distribution for prior convictions, then they may also exhibit a right-skewed distribution in their juvenile charges. That is the relationship found in the data set for Broward County. The average number of juvenile counts is 0.26 while 87% have 0 juvenile charges.
Days in Jail
The days in jail variable is also heavily right-skewed. The average number of days in jail is 15.11 with 11% of individuals having spent 0 days in jail. The maximum number of days spent in jail is 800.
Days.in.Jail | Count | Proportion |
---|---|---|
0 - 49 Days | 5718 | 0.926 |
50 - 99 Days | 211 | 0.034 |
100+ Days | 243 | 0.039 |
Days in Custody
The days in custody variable is also heavily right-skewed. The average number of days in custody is 35.97 with 11% of individuals having spent 0 days in custody. The maximum number of days spent in custody is 6035.
Days.in.Custody | Count | Proportion |
---|---|---|
0 - 49 Days | 5391 | 0.873 |
50 - 99 Days | 326 | 0.053 |
100+ Days | 455 | 0.074 |
Charge Degree
The charge degree variable indicates that the majority of individuals, 64%, in the data set are charged with a felony as opposed to a misdemeanor.
Charge Degree | Count | Proportion |
---|---|---|
Felony | 3970 | 0.643 |
Misdemeanor | 2202 | 0.357 |
Charge Category
When reviewing the charge category plot, we notice that a large proportion of individuals, 24%, are charged with battery and 13% are not charged at all (only arrested).
Firearm Involvement
Firearm is one of the features that we engineered to capture the nature of recidivism. In the Broward County data set, only 4.1% of individuals are charged with a crime that is described as involving a 'firearm' or 'deadly weapon'.
Involved Firearm | Count | Proportion |
---|---|---|
No Weapon/Firearm | 5922 | 0.959 |
Weapon/Firearm | 250 | 0.041 |
Days Between Screening and Arrest
For days between screening and arrest, we see that the majority of individuals are screened within 0 to 1 days of arrest. There are cases when an individual is screened prior to their arrest.
Days.Between.Screening.and.Arrest | Count | Proportion |
---|---|---|
Screened 1 Day Or More Before Arrest | 69 | 0.011 |
Screened Same Day as Arrest | 1379 | 0.223 |
Screened 1 Day After Arrest | 3980 | 0.645 |
Screened 2 to 5 Days After Arrest | 338 | 0.055 |
Screened 6 or More Days After Arrest | 406 | 0.066 |
Outcome Variables (Two-Year Recidivism & Violent Two-Year Recidivism)
For the outcome variables, we find that the rate of recidivism is higher than the rate of violent recidivism. Notice that the prevalence (baseline) of recidivism is about 46% and the prevalence of violent recidivism is about 11%. These figures are important to keep in mind as we evaluate each variable and the performance of our model. For example, if we were to assume that no individual violently recidivates, we would achieve approximately 89% accuracy, a misleading figure because such a classifier identifies no violent recidivists at all.
Recidivism | Count | Proportion |
---|---|---|
0 | 3363 | 0.545 |
1 | 2809 | 0.455 |
Violent.Recidivism | Count | Proportion |
---|---|---|
0 | 5520 | 0.894 |
1 | 652 | 0.106 |
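The baseline figure quoted above follows directly from the violent recidivism counts in the table. A short Python check (the report itself used R):

```python
# Counts taken from the Violent.Recidivism table above.
counts = {0: 5520, 1: 652}
total = sum(counts.values())

# A classifier that always predicts "no violent recidivism" is correct
# exactly for the individuals labeled 0.
baseline_accuracy = counts[0] / total

# It never flags anyone, so it catches none of the true positives.
baseline_sensitivity = 0 / counts[1]

print(f"accuracy={baseline_accuracy:.3f}, sensitivity={baseline_sensitivity:.3f}")
```

High accuracy paired with zero sensitivity is why accuracy alone is a poor yardstick for the violent recidivism models.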
Variable Impact on Recidivism and Violent Recidivism
Now that we have described the data, we perform cursory visual inspection of the relationship between each variable and our outcomes: general two-year recidivism and violent two-year recidivism.
Age
For the age variable, we observe a decrease in the rate of recidivism as age increases. This relationship occurs for both general recidivism and violent recidivism. Values over 66 years of age were binned as the number of individuals in those age groups is low.
Gender
For both general and violent recidivism, men recidivate at a higher rate than women. Men recidivate approximately 36.42% more than women in general, but violently recidivate approximately 75.59% more than women.
Race
We notice that African-American individuals have the highest rate of general and violent recidivism. Asian individuals show a relative spike in violent recidivism (compared to their position in general recidivism, relative to other races), but we suspect this is due to the small sample size.
Priors Count
The Conditional Density Plot for Recidivism and Priors Count reveals an expected relationship. The number of prior convictions influences the frequency of recidivism. Looking only at this variable, recidivism is more frequent if the individual has a longer history of convictions. The values for general recidivism were binned above 25 charges as the number of individuals with 25+ prior convictions is low. The values for violent recidivism were binned above 14 charges as the number of individuals with 14+ prior convictions is low.
Juvenile Charges Count
Similar to what is revealed when evaluating Priors Count, the number of Juvenile charges also exhibits a non-linear, non-random relationship. As an individual has more juvenile charges, the likelihood of recidivism increases. The values were binned above 5 charges for both general and violent recidivism as the number of individuals with 5+ juvenile charges is low.
Days in Jail
The conditional density plots for both general and violent recidivism versus days in jail are shown below. There is a hint of increasing likelihood for both types of recidivism; however, we remain skeptical of these plots given the lack of observations above certain numbers of days in jail. We binned observations above 200 and 120 days in jail, respectively, due to the lack of observations.
Days in Custody
Similarly, for days in custody, the conditional density plots for both general and violent recidivism are shown below. Again, there is a hint of increasing likelihood for both types of recidivism; however, we remain skeptical for the same reasons as for days in jail. We binned observations above 400 days in custody in both plots due to the lack of observations.
Charge Degree
Since charge degree is categorical, we construct a conditional frequency plot to show how recidivism rates change if charged with a felony versus a misdemeanor. It is clear that individuals charged with a felony in our data set recidivated more frequently (50% versus 37.5%). For violent recidivism, however, the difference is less clear. Individuals with a felony charge violently recidivated at a rate of about 10.7%, compared to 10.4% for individuals charged with a misdemeanor. This is an unintuitive finding.
Charge Category
A few striking observations for charge category are presented below. The assault and battery categories, relative to the other categories, move up in the ranking of violent recidivism. This relationship is expected. Tampering is consistently the top category as these individuals already display a disregard for legal procedures. Lastly, individuals who resist with violence see a relative jump when violently recidivating.
Firearm Involvement
Firearm involvement indicates that there may be some value in including this predictor - especially for identifying violent recidivism. So far, this matches our intuition as individuals involved in crimes with deadly weapons may be predisposed to further violence (e.g., retaliation, gang-related activity).
Days Between Screening and Arrest
For days between screening and arrest, there is a subtle hint of increasing likelihood of both general and violent recidivism as the time taken to perform the screening after arrest (positive values) increases. The relationship in the violent recidivism plot is less consistent but still shows an upward trend.
While reviewing the data presented above is helpful, we may over-interpret a plot, miss certain variable interactions, and still cannot classify individuals as likely general or violent two-year recidivists. However, now that we understand the data, we are ready to explore different techniques for constructing an RAI or classifier.
3. Analysis & Methodology
Data Balancing
Given that only about 10% of individuals in the data violently recidivated, we were worried that this number may be too low to produce a strong RAI. To remedy this, we tried upsampling the data by adding copies of individuals who violently recidivated. Our hope was that this would lead to a stronger RAI than what we would achieve by just using training data with the base rate of violent recidivism.
Unfortunately, this upsampling procedure led to some biased results: our classifiers for violent recidivism, when trained on the upsampled data, produced extremely high measures of AUC, accuracy, sensitivity, and specificity. This was not the case when our classifier was trained on the non-upsampled data. We tried upsampling a smaller number of cases to reduce this bias, but were unsuccessful. After explaining the situation to our advisor, we were unable to determine what issue was leading to these inflated results. As a result, we decided not to upsample the data.
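The report does not identify the cause of the inflated metrics, and the sketch below does not claim to. It only illustrates one common safeguard when upsampling by duplication: split first, then duplicate minority rows inside the training portion only, so that no copy of a held-out individual can appear in training. The function name, parameters, and toy data are all hypothetical, and the code is Python rather than the R used in the report.

```python
import random

def split_then_upsample(rows, label_key, test_fraction=0.25, copies=2, seed=0):
    """Hold out a test set, then duplicate positive rows in the training set only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    positives = [r for r in train if r[label_key] == 1]
    # Each positive training row gains `copies` duplicates; the test set is untouched.
    return train + positives * copies, test

# Toy data: 100 invented individuals, 10% labeled positive.
rows = [{"id": i, "violent_recid": 1 if i % 10 == 0 else 0} for i in range(100)]
train, test = split_then_upsample(rows, "violent_recid")
```

Evaluating on `test`, which keeps the original base rate, gives metrics that are comparable to those of a model trained without upsampling.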
Cursory Variable Selection
We have developed an understanding of the Broward County data set and look to select variables that may be impactful in our models/RAIs. Using two variable selection methods, we look to explore which variables may be more important than others. We use these methods to uncover other relationships in the data but are not bound by the results.
Best Subset Selection
General Recidivism
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | Count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 15 |
priors_count | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 15 |
age | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 14 |
priors_count_2 | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 13 |
age_2 | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 |
priors_count_3 | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
days_b_screening_arrest | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 |
days_in_custody | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
c_categoryArrest, No Charge | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 |
age_3 | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 |
c_categoryNon-Cannabis Drug | | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
sexMale | | | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 5 |
c_categoryTheft/Robbery | | | | | | | | | | | | | 1 | 1 | 1 | 3 |
c_categoryTampering | | | | | | | | | | | | | 1 | 1 | 1 | 3 |
days_in_jail | | | | | | | | | | | | | | 1 | 1 | 2 |
c_categoryDriving/DUI | | | | | | | | | | | | 1 | | | | 1 |
Violent Recidivism
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | Count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 15 |
priors_count | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 15 |
age | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 14 |
c_categoryBattery | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 13 |
priors_count_2 | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 |
priors_count_3 | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
total_juv_count | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 |
days_b_screening_arrest | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
sexMale | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 |
days_in_custody | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 |
c_firearmWeapon/Firearm | | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 6 |
days_in_jail | | | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 5 |
age_2 | | | | | | | | | | | | 1 | 1 | 1 | 1 | 4 |
c_categoryDriving/DUI | | | | | | | | | | | | | 1 | 1 | 1 | 3 |
c_categoryTampering | | | | | | | | | | | | | | 1 | 1 | 2 |
raceHispanic | | | | | | | | | | | | | | | 1 | 1 |
We used best subset selection (with the maximum number of variables set at 15) to get a better idea of which variables will be useful in our models and to see how valuable each additional variable is according to the AIC measure. Because R-squared increases monotonically as variables are added, R-squared plots alone are a poor guide to model size; AIC, by contrast, adjusts for model complexity. An important takeaway is that best subset selection chose the age variable earlier than the age_cat variable, so we used age rather than age_cat in our modeling. Some of the feature-engineered variables also appear to be important: many of the created categories were selected, with the "Non-Cannabis Drug" category selected 10th. The days_in_custody and days_in_jail variables were selected 7th and 14th, respectively. Total_juv_count was not selected in the top 15 variables. Race, contrary to intuition, was not selected early on. Notice that "Driving/DUI" was selected in model size 12 but then dropped from all larger models.
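For readers unfamiliar with AIC, the standard definition is 2k minus twice the maximized log-likelihood, where k is the number of estimated parameters; lower is better. The Python sketch below applies it to made-up log-likelihoods (purely illustrative values, not results from this analysis) to show how an extra variable can raise the fit yet still lose on AIC.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logLik (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Illustrative values only: number of parameters -> maximized log-likelihood.
candidates = {3: -410.0, 4: -405.0, 5: -404.6}

scores = {k: aic(ll, k) for k, ll in candidates.items()}
best_size = min(scores, key=scores.get)
```

Here the five-parameter model fits slightly better than the four-parameter one, but the gain in log-likelihood (0.4) is smaller than the penalty for the extra parameter, so the four-parameter model has the lowest AIC.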
For violent recidivism, priors_count and age were selected 1st and 2nd overall (same as for general recidivism). The "Battery" category, however, was selected 3rd overall suggesting strong predictive power. Total_juv_count was selected 6th overall for violent recidivism, potentially indicating violent recidivism may be tied to juvenile behavior.
One concern with best subset selection is that inclusion is all-or-nothing: each variable at a given model size is either entirely in the model or entirely out, with no way to shrink a coefficient partway (this behavior is by design). The next, more advanced, method attempts to overcome that issue.
Lasso
General Recidivism
Variable | Coefficient |
---|---|
(Intercept) | 0.1414751 |
age | -0.0340943 |
age_2 | 0.0000000 |
age_3 | 0.0000000 |
priors_count | 0.2034434 |
priors_count_2 | -0.0033158 |
priors_count_3 | 0.0000000 |
days_b_screening_arrest | 0.0200651 |
days_in_jail | 0.0013143 |
days_in_custody | 0.0010026 |
total_juv_count | 0.0699668 |
total_juv_count_2 | 0.0000000 |
total_juv_count_3 | 0.0000000 |
sexMale | 0.2598741 |
c_charge_degreeMisdemeanor | -0.0708876 |
raceAsian | -0.1796843 |
raceCaucasian | 0.0000000 |
raceHispanic | -0.0842748 |
raceNative.American | 0.0000000 |
raceOther | -0.0726389 |
c_categoryNon.Cannabis.Drug | 0.2581167 |
c_categoryArrest..No.Charge | -0.2010029 |
c_categoryDriving.DUI | -0.1135437 |
c_categoryOther | 0.0743263 |
c_categoryTheft.Robbery | 0.1551974 |
c_categoryBurglary | 0.0000000 |
c_categoryCannabis | -0.0623503 |
c_categoryAssault | -0.0681680 |
c_categoryTampering | 0.3613655 |
c_categoryCriminal.Mischief | 0.1519577 |
c_categoryResist.with.Violence | 0.0000000 |
c_categoryResist.w.o.Violence | 0.0000000 |
c_categoryLewdness.Sexual.Misconduct | -0.1543073 |
c_categoryTrespassing | 0.0000000 |
c_firearmWeapon.Firearm | 0.0000000 |
age_catGreater.than.45 | 0.0000000 |
age_catLess.than.25 | 0.3075347 |
Violent Recidivism
Variable | Coefficient |
---|---|
(Intercept) | -1.8877751 |
age | -0.0163187 |
age_2 | 0.0000000 |
age_3 | 0.0000000 |
priors_count | 0.0417822 |
priors_count_2 | 0.0000000 |
priors_count_3 | 0.0000000 |
days_b_screening_arrest | 0.0007969 |
days_in_jail | 0.0004382 |
days_in_custody | 0.0000815 |
total_juv_count | 0.0649113 |
total_juv_count_2 | 0.0000000 |
total_juv_count_3 | 0.0000000 |
sexMale | 0.0877513 |
c_charge_degreeMisdemeanor | 0.0000000 |
raceAsian | 0.0000000 |
raceCaucasian | 0.0000000 |
raceHispanic | 0.0000000 |
raceNative.American | 0.0000000 |
raceOther | 0.0000000 |
c_categoryAssault | 0.0000000 |
c_categoryBattery | 0.2320494 |
c_categoryBurglary | 0.0000000 |
c_categoryCannabis | 0.0000000 |
c_categoryCriminal.Mischief | 0.0000000 |
c_categoryDriving.DUI | -0.0720887 |
c_categoryLewdness.Sexual.Misconduct | 0.0000000 |
c_categoryNon.Cannabis.Drug | 0.0000000 |
c_categoryOther | 0.0000000 |
c_categoryResist.w.o.Violence | 0.0000000 |
c_categoryResist.with.Violence | 0.0000000 |
c_categoryTampering | 0.0000000 |
c_categoryTheft.Robbery | 0.0000000 |
c_categoryTrespassing | 0.0000000 |
c_firearmWeapon.Firearm | 0.0000000 |
age_catGreater.than.45 | 0.0000000 |
age_catLess.than.25 | 0.0000000 |
We also fit a lasso as part of the variable selection process. According to the "1-SE" rule, the most important variables in determining recidivism included sex, priors_count, the "Non-Cannabis Drug" charge category, and the "Less than 25" age category. Non-zero coefficients were also found for other charge categories. There is general agreement between the lasso and best subset selection: both methods found mostly the same variables to be important.
For violent recidivism, only 10 coefficients (including the intercept) were non-zero, with the "Battery" charge category having the most impact.
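The "1-SE" rule mentioned above picks the largest penalty whose cross-validated error is within one standard error of the minimum, trading a little accuracy for a sparser model. A small Python sketch with invented cross-validation numbers (the report does not list its penalty grid or errors):

```python
def one_se_lambda(lambdas, cv_means, cv_ses):
    """Largest lambda whose mean CV error is within one SE of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_means[i])
    threshold = cv_means[best] + cv_ses[best]
    eligible = [lam for lam, mean in zip(lambdas, cv_means) if mean <= threshold]
    return max(eligible)

# Invented values for illustration only.
lambdas = [0.001, 0.01, 0.05, 0.1]
cv_means = [0.620, 0.615, 0.622, 0.650]
cv_ses = [0.010, 0.010, 0.010, 0.010]

chosen = one_se_lambda(lambdas, cv_means, cv_ses)
```

The minimum error occurs at a penalty of 0.01, but 0.05 is still within one standard error, so the rule selects 0.05 and, with it, a model with fewer non-zero coefficients.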
In general, the results from our variable selection procedures mostly served to inform us, rather than restrict which variables we included in future models. Now that we have an idea of how the variables predict relative to each other, we are ready to construct and test our models.
RAI/Classifier Construction & Model Performance Evaluation
Our task is to develop an RAI to determine whether an individual will recidivate within 2 years. In order to accomplish this we must use the data to construct a model that classifies individuals into yes or no groupings. In the section that follows, we use our knowledge and intuition of the data set, our understanding of the variables and available classification techniques to construct and test our custom RAI for both general and violent recidivism.
For each model, we display tables showing how performance metrics change depending on the cutoff used. We also display ROC and precision-recall graphs. Below the graphs for each model, we explain what the graphs and tables represent.
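The three metrics reported in the cutoff tables can be computed as follows. This Python sketch assumes that a predicted probability at or above the cutoff counts as a positive prediction (the report does not state whether the comparison is strict) and uses invented scores and labels.

```python
def metrics_at_cutoff(probs, labels, cutoff):
    """Misclassification rate, sensitivity, and specificity at one cutoff."""
    preds = [1 if p >= cutoff else 0 for p in probs]  # assumed: >= means positive
    tp = sum(1 for yhat, y in zip(preds, labels) if yhat == 1 and y == 1)
    tn = sum(1 for yhat, y in zip(preds, labels) if yhat == 0 and y == 0)
    fp = sum(1 for yhat, y in zip(preds, labels) if yhat == 1 and y == 0)
    fn = sum(1 for yhat, y in zip(preds, labels) if yhat == 0 and y == 1)
    return {
        "misclassification": (fp + fn) / len(labels),
        "sensitivity": tp / (tp + fn),   # share of true recidivists flagged
        "specificity": tn / (tn + fp),   # share of non-recidivists cleared
    }

# Invented predictions for six individuals.
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
result = metrics_at_cutoff(probs, labels, cutoff=0.5)
```

Raising the cutoff flags fewer individuals, which lowers sensitivity and raises specificity; that trade-off is what each table traces.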
Logistic Regression
General Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.411 | 0.901 | 0.329 |
0.30 | 0.381 | 0.858 | 0.420 |
0.35 | 0.353 | 0.805 | 0.515 |
0.40 | 0.332 | 0.749 | 0.602 |
0.45 | 0.315 | 0.686 | 0.684 |
0.50 | 0.315 | 0.601 | 0.756 |
0.55 | 0.315 | 0.521 | 0.820 |
0.60 | 0.322 | 0.455 | 0.865 |
0.65 | 0.336 | 0.379 | 0.903 |
0.70 | 0.357 | 0.292 | 0.936 |
0.75 | 0.381 | 0.208 | 0.962 |
Violent Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.130 | 0.131 | 0.957 |
0.30 | 0.117 | 0.061 | 0.981 |
0.35 | 0.110 | 0.033 | 0.991 |
0.40 | 0.106 | 0.019 | 0.997 |
0.45 | NA | NA | NA |
0.50 | NA | NA | NA |
0.55 | NA | NA | NA |
0.60 | NA | NA | NA |
0.65 | NA | NA | NA |
0.70 | NA | NA | NA |
0.75 | NA | NA | NA |
Logistic regression was the first method we used. It is commonly used for binary classification tasks, so we considered it apt for this problem. For general recidivism, the area under the ROC curve indicates that the model will rank an individual who recidivated higher than one who did not recidivate approximately 74.62% of the time. The precision-recall curve shows a steady decline in precision as recall increases. The sensitivity at fifty percent specificity was 82.78% while the specificity at fifty percent sensitivity was 83.54%.
For violent recidivism, the area under the ROC curve indicates that the model will rank an individual who violently recidivated higher than one who did not violently recidivate approximately 65.72% of the time. From the precision-recall curve, we see that once approximately 10% recall is reached, precision stays constant at about 40%. The sensitivity at fifty percent specificity was 75% while the specificity at fifty percent sensitivity was 73.65%. The NAs that show up in the table indicate that our model did not assign any observation a probability of violently recidivating over 45%.
We see that our model for violent recidivism performs worse than the model for general recidivism: it has a lower AUC and also a less attractive sensitivity-specificity tradeoff. From these first models, it seems clear that violent recidivism presents a harder classification task than general recidivism.
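The ranking interpretation of AUC used above can be made concrete: over all (recidivist, non-recidivist) pairs, AUC is the fraction in which the recidivist receives the higher score, with ties counted as half. The Python sketch below computes this directly on invented scores; it is an illustration of the definition, not the routine used for the reported figures.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented scores for six individuals, two of whom recidivated.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
auc = auc_by_pairs(scores, labels)
```

Of the eight pairs in this toy example, seven are ranked correctly, giving an AUC of 0.875. A value of 0.5 corresponds to random ranking.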
Linear Discriminant Analysis (LDA)
General Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.402 | 0.894 | 0.350 |
0.30 | 0.373 | 0.849 | 0.442 |
0.35 | 0.352 | 0.795 | 0.525 |
0.40 | 0.331 | 0.738 | 0.612 |
0.45 | 0.316 | 0.678 | 0.690 |
0.50 | 0.315 | 0.599 | 0.757 |
0.55 | 0.314 | 0.529 | 0.817 |
0.60 | 0.322 | 0.457 | 0.864 |
0.65 | 0.335 | 0.386 | 0.898 |
0.70 | 0.353 | 0.305 | 0.934 |
0.75 | 0.377 | 0.223 | 0.957 |
Violent Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.135 | 0.146 | 0.949 |
0.30 | 0.122 | 0.092 | 0.972 |
0.35 | 0.114 | 0.052 | 0.984 |
0.40 | 0.112 | 0.024 | 0.990 |
0.45 | 0.110 | 0.016 | 0.994 |
0.50 | 0.109 | 0.012 | 0.995 |
0.55 | NA | NA | NA |
0.60 | NA | NA | NA |
0.65 | NA | NA | NA |
0.70 | NA | NA | NA |
0.75 | NA | NA | NA |
To incorporate the possibility of interaction effects into our model, we performed linear discriminant analysis (LDA). For general recidivism, the area under the ROC curve indicates that the model will rank an individual who recidivated higher than one who did not recidivate approximately 70.84% of the time. Similar to logistic regression, the precision-recall curve shows a steady decline in precision as recall increases. The sensitivity at fifty percent specificity was 74.5% while the specificity at fifty percent sensitivity was 81.88%.
For violent recidivism, the area under the ROC curve indicates that the model will rank an individual who violently recidivated higher than one who did not violently recidivate approximately 68.37% of the time. From the precision-recall curve, we see that once approximately 20% recall is reached, precision begins to steadily decrease. The sensitivity at fifty percent specificity was 79.17% while the specificity at fifty percent sensitivity was 77.11%. The NAs that show up in the table indicate that our model did not assign any observation a probability of violently recidivating over 55%.
Similarly to our logistic model, the LDA model for violent recidivism performs worse than the model for general recidivism. The two methods perform similarly to each other: going by the AUC values reported above, the logistic model has the higher AUC for classifying general recidivism (74.62% versus 70.84%), while the LDA model has the higher AUC for classifying violent recidivism (68.37% versus 65.72%). We also attempted QDA, but do not report the results because QDA's performance was worse than LDA's across the board.
Classification Tree
General Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.5300 | 0.9983 | 0.0136 |
0.30 | 0.3639 | 0.6311 | 0.6405 |
0.35 | 0.3639 | 0.6311 | 0.6405 |
0.40 | 0.3412 | 0.5752 | 0.7311 |
0.45 | 0.3412 | 0.5752 | 0.7311 |
0.50 | 0.3412 | 0.5752 | 0.7311 |
0.55 | 0.3412 | 0.5752 | 0.7311 |
0.60 | 0.3541 | 0.4021 | 0.8565 |
0.65 | 0.3541 | 0.4021 | 0.8565 |
0.70 | 0.3639 | 0.3479 | 0.8852 |
0.75 | 0.3882 | 0.2640 | 0.9124 |
Violent Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.1078 | 0.0252 | 0.9848 |
0.30 | 0.1078 | 0.0252 | 0.9848 |
0.35 | 0.1078 | 0.0252 | 0.9848 |
0.40 | 0.1062 | 0.0168 | 0.9874 |
0.45 | 0.1062 | 0.0168 | 0.9874 |
0.50 | 0.1062 | 0.0168 | 0.9874 |
0.55 | 0.1062 | 0.0168 | 0.9874 |
0.60 | 0.1062 | 0.0168 | 0.9874 |
0.65 | 0.1062 | 0.0168 | 0.9874 |
0.70 | 0.1062 | 0.0168 | 0.9874 |
0.75 | 0.1029 | 0.0084 | 0.9919 |
The third method we fit was a classification tree. Tree methods differ from the other two we used in that they are easier to interpret, because one can follow the sequence of splits for a given test observation to arrive at a classification. They can also model interactions (if the depth is greater than 1), just like LDA.
For general recidivism, the area under the ROC curve indicates that the model will rank an individual who recidivated higher than one who did not recidivate approximately 66.97% of the time. The precision-recall curve actually shows an increase in precision as sensitivity increases up to 0.4. Since we want both values to be high, it would be prudent not to consider cutoffs that yield sensitivities below 0.4. The sensitivity at fifty percent specificity was 63.11%, while the specificity at fifty percent sensitivity was 75.68%.
For violent recidivism, the area under the ROC curve indicates that the model will rank an individual who violently recidivated higher than one who did not violently recidivate approximately 56.97% of the time. The precision-recall curve has a sharp decrease in precision as sensitivity increases from 0 to 5%, followed by a more gradual decrease in precision as sensitivity increases from 5% to 70%, and finally a second sharp decrease after 70% sensitivity. Thus, a cutoff yielding 5%-70% sensitivity appears optimal considering the precision-recall tradeoff. The sensitivity at fifty percent specificity was 36.13%, while the specificity at fifty percent sensitivity was 78.57%.
Our tree classifier performed worse on classifying cases of violent recidivism than classifying cases of general recidivism. Our tree classifier also performed significantly worse overall than our LDA and logistic models for classifying both types of recidivism.
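The cutoff tables in this section can be reproduced from any set of predicted probabilities. The following is a minimal Python sketch (the original analysis was done in R) using hypothetical toy scores. It treats a score strictly above the cutoff as a positive prediction, which is an assumption about how the tables were built. A single tree only produces as many distinct scores as it has leaves, which is why adjacent cutoffs in the tree tables often give identical rows.

```python
def metrics_at_cutoff(scores, labels, cutoff):
    """Misclassification rate, sensitivity, and specificity when every
    score strictly above `cutoff` is classified as recidivating."""
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, labels):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "misclassification": (fp + fn) / total,
        # Guard against groups with no positives or no negatives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical toy data: 1 = recidivated, 0 = did not recidivate.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
for cutoff in (0.25, 0.50, 0.75):
    print(cutoff, metrics_at_cutoff(toy_scores, toy_labels, cutoff))
```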
Random Forest
General Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.25 | 0.3947 | 0.8766 | 0.3865 |
0.30 | 0.3736 | 0.8367 | 0.4568 |
0.35 | 0.3420 | 0.8058 | 0.5388 |
0.40 | 0.3193 | 0.7550 | 0.6208 |
0.45 | 0.2982 | 0.7060 | 0.6984 |
0.50 | 0.2974 | 0.6461 | 0.7482 |
0.55 | 0.3039 | 0.5771 | 0.7921 |
0.60 | 0.3128 | 0.5009 | 0.8375 |
0.65 | 0.3306 | 0.4229 | 0.8682 |
0.70 | 0.3476 | 0.3466 | 0.8990 |
0.75 | 0.3614 | 0.2686 | 0.9370 |
Violent Recidivism
Cutoff | Misclassification Rate | Sensitivity | Specificity |
---|---|---|---|
0.05 | 0.5446 | 0.8571 | 0.4040 |
0.10 | 0.3963 | 0.7000 | 0.5914 |
0.15 | 0.3088 | 0.5714 | 0.7066 |
0.20 | 0.2374 | 0.4357 | 0.8044 |
0.25 | 0.1767 | 0.3571 | 0.8830 |
0.30 | 0.1483 | 0.2571 | 0.9278 |
0.35 | 0.1305 | 0.1500 | 0.9616 |
0.40 | 0.1191 | 0.1000 | 0.9808 |
0.45 | 0.1207 | 0.0429 | 0.9863 |
0.50 | 0.1183 | 0.0286 | 0.9909 |
NA | NA | NA | NA |
The last model we fit and report on was a random forest. Random forests represent an improvement over individual classification trees because they average over many individual trees, resulting in a classifier with lower variance. We expected the random forest to perform best out of all our models in classifying both general recidivism and violent recidivism. For both violent and general recidivism, we fit 500 trees, with 4 randomly selected variables considered at each split.
For general recidivism, the area under the ROC curve indicates that the model will rank an individual who recidivated higher than one who did not recidivate approximately 74.57% of the time. We see from the precision-recall curve that, after an initial sharp drop, precision continues to decrease as sensitivity increases. The sensitivity at fifty percent specificity was 82.21%, while the specificity at fifty percent sensitivity was 83.75%. The OOB error was approximately 30.82%.
For violent recidivism, the area under the ROC curve indicates that the model will rank an individual who violently recidivated higher than one who did not violently recidivate approximately 70.14% of the time. On the precision-recall curve, precision decreases slightly as sensitivity increases above 20%. The sensitivity at fifty percent specificity was 77.14%, while the specificity at fifty percent sensitivity was 76.23%. The OOB error was approximately 10.67%.
As with our tree classifier, the random forest for violent recidivism ranks individuals less well than its general recidivism counterpart (AUC of 70.14% versus 74.57%). Its low misclassification rate should be read with care: at a cutoff of 0.50 the model reaches 99.09% specificity but only 2.86% sensitivity, so the low error largely reflects how rare violent recidivism is in the data. Even so, the random forest has the highest AUC of the four violent recidivism models.
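The "specificity at fifty percent sensitivity" figure quoted for each model can be computed by lowering the cutoff until at least half of the true recidivists are flagged, then reading off the specificity at that point. The Python sketch below illustrates the idea on hypothetical toy scores. It assumes scores at or above the threshold count as positive, and it is not the authors' R code.

```python
def specificity_at_sensitivity(scores, labels, target=0.5):
    """Specificity at the highest threshold whose sensitivity reaches
    `target`. Scores >= threshold are classified as positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    # Walk thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if tp / n_pos >= target:
            return 1.0 - fp / n_neg
    raise ValueError("target sensitivity must be at most 1")


# Hypothetical toy data: 1 = recidivated, 0 = did not recidivate.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
spec_at_half = specificity_at_sensitivity(toy_scores, toy_labels, target=0.5)
print(f"specificity at 50% sensitivity: {spec_at_half:.3f}")
```

Swapping the roles of the two classes gives the companion figure, sensitivity at fifty percent specificity.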
Gradient Boosted Trees (XGBoost)
As part of our analysis for general recidivism, we also created a sparse matrix to fit a 10-fold cross-validated gradient boosted classification tree model using the popular XGBoost library. We have omitted the results as they were not compelling enough to show in comparison to the random forest classifier. The AUC for the boosted tree model was approximately 0.720 which was less than that of our random forest and it also had an OOB error a few percent higher. Other metrics for the XGBoosted tree model were also lacking compared to the random forest.
Importance of Cross-Validation
While constructing these models we rely on cross-validation, a process which estimates prediction error not against training data but against unseen test data. This process allows us to reduce the risk of over-fitting our model.
For the logistic and LDA models, we performed 10-fold cross-validation manually. For the classification trees, we set the 'xval' parameter to 10 to perform 10-fold cross-validation. For the random forest models, we used a hold-out test set and also relied on OOB error as a good proxy for test error. In addition, we used the 'rfcv' function from the randomForest library to provide test error estimates as the number of features was gradually decreased. The test error produced by this function is highly comparable to the OOB error of the non-cross-validated random forest, suggesting that the non-rfcv random forest model will generalize well to outside test sets.
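A manual 10-fold cross-validation loop of the kind described for the logistic and LDA models can be sketched as follows. This is illustrative Python, not the authors' R code; the `fit` and `predict` callables and the majority-class "model" are hypothetical stand-ins for a real classifier.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_error(xs, ys, fit, predict, k=10, seed=0):
    """Misclassification rate where each observation is predicted by a
    model that never saw it during fitting."""
    errors = 0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(1 for i in held_out if predict(model, xs[i]) != ys[i])
    return errors / len(xs)


# Hypothetical stand-in model: always predict the majority training class.
def fit_majority(train_x, train_y):
    return 1 if 2 * sum(train_y) > len(train_y) else 0


def predict_majority(model, x):
    return model


toy_x = list(range(100))
toy_y = [0] * 70 + [1] * 30
cv_error = cross_validated_error(toy_x, toy_y, fit_majority, predict_majority)
print(f"10-fold CV error: {cv_error:.2f}")
```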
Unequal Costs for False Positives & False Negatives
Unlike other problems in the prediction or classification space, with criminal recidivism the unequal costs of a false positive and a false negative affect the decision-making and model selection process. We cannot rely on accuracy (or misclassification rate) as a model-selection metric. In this problem setting, a false positive equates to an individual unnecessarily being denied bail and held in custody (when the likelihood of recidivating is low). A false negative equates to an individual being released on bail when the probability of recidivating is high. Furthermore, there are different costs associated with general recidivism versus violent recidivism that must also be considered. The act of violent recidivism may have irreversible consequences for the victims.
Given these uneven costs, for general recidivism we deemed unnecessarily denying an individual bail to have a heavier consequence than releasing on bail someone who may recidivate. For violent recidivism, however, we prefer a slightly more conservative approach. In other words, for general recidivism, we consider false positives to be significantly costlier than false negatives; for violent recidivism, we consider the costs to be more balanced, and even entertain the possibility that false negatives are costlier (i.e., failing to flag someone who violently recidivates is worse than accidentally flagging someone who doesn't violently recidivate).
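One way to turn unequal costs into a cutoff choice is to weight each error type and pick the cutoff with the lowest total cost. The Python sketch below applies this to the sensitivity and specificity rows of the random forest general recidivism table. Two things are assumptions rather than facts from the analysis: the cost weights are purely illustrative (the post does not state numeric costs), and the class counts of 551 recidivists and 683 non-recidivists are taken from the test-set confusion matrices in the COMPAS comparison, on the assumption that the cutoff table was computed on the same test set.

```python
# (cutoff, sensitivity, specificity) rows copied from the random forest
# general recidivism table earlier in the post.
RF_GENERAL = [
    (0.25, 0.8766, 0.3865),
    (0.30, 0.8367, 0.4568),
    (0.35, 0.8058, 0.5388),
    (0.40, 0.7550, 0.6208),
    (0.45, 0.7060, 0.6984),
    (0.50, 0.6461, 0.7482),
    (0.55, 0.5771, 0.7921),
    (0.60, 0.5009, 0.8375),
    (0.65, 0.4229, 0.8682),
    (0.70, 0.3466, 0.8990),
    (0.75, 0.2686, 0.9370),
]

# Assumed test-set class counts (see the lead-in above).
N_POS, N_NEG = 551, 683


def expected_cost(sensitivity, specificity, cost_fp, cost_fn,
                  n_pos=N_POS, n_neg=N_NEG):
    """Total weighted cost of false positives and false negatives."""
    false_negatives = (1 - sensitivity) * n_pos
    false_positives = (1 - specificity) * n_neg
    return cost_fp * false_positives + cost_fn * false_negatives


def best_cutoff(rows, cost_fp, cost_fn):
    """Cutoff from `rows` with the lowest expected cost."""
    return min(rows, key=lambda r: expected_cost(r[1], r[2], cost_fp, cost_fn))[0]


# Hypothetical cost weights, for illustration only.
for cost_fp, cost_fn in [(1.0, 1.0), (2.0, 1.0)]:
    print(cost_fp, cost_fn, best_cutoff(RF_GENERAL, cost_fp, cost_fn))
```

With equal weights the lowest-cost cutoff coincides with the lowest misclassification rate; making false positives more expensive pushes the preferred cutoff upward, toward higher specificity.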
Model Comparison & Final Selection
General Recidivism
Model | AUC | Specificity at 50% Sensitivity | Precision at 50% Recall |
---|---|---|---|
Logistic | 0.746 | 0.835 | 0.717 |
LDA | 0.708 | 0.819 | 0.697 |
Decision Tree | 0.670 | 0.757 | 0.632 |
Random Forest | 0.746 | 0.837 | 0.720 |
Violent Recidivism
Model | AUC | Specificity at 50% Sensitivity | Precision at 50% Recall |
---|---|---|---|
Logistic | 0.657 | 0.736 | 0.613 |
LDA | 0.684 | 0.771 | 0.646 |
Decision Tree | 0.570 | 0.786 | 0.661 |
Random Forest | 0.701 | 0.762 | 0.637 |
We selected a random forest classifier for both general and violent recidivism based on the performance of the models and the tradeoffs between the associated costs of false positives and false negatives.
For general recidivism, the decision was close between random forest and logistic regression. We chose the random forest classifier because:
- The random forest produces an AUC value comparable to the logistic regression: 74.57% and 74.62% respectively.
- We prefer a classifier with high specificity (at the expense of sensitivity) as this minimizes the number of individuals unnecessarily denied bail. The random forest has a specificity at fifty percent sensitivity of 83.75%, while the logistic regression has one of 83.54%.
- The confidence band around the ROC curve for the random forest is narrower than that of the logistic regression, especially near our cutoffs.
- While we lose some interpretability with the random forest, compared to the logistic regression, as we explain in the next section we can extract some valuable variable information.
- Given the costs of false positives and false negatives discussed earlier, we believe a cutoff of 0.55 (that is, classifying any observation with a score above 0.55 as recidivating) would be apt for this model.
For violent recidivism, the decision fell between random forest and LDA. We chose random forest because:
- It produces a higher AUC value than the LDA model: 70.14% and 68.37% respectively.
- For violent recidivism we are willing to trade a little specificity because the cost of a false negative with violent recidivism is higher.
- While the metrics are similar and LDA has higher specificity at fifty percent sensitivity and higher precision at fifty percent recall than the random forest, the confidence bands for LDA are visibly wider than the random forest's.
- Neither method is known for its interpretability so little advantage is gained by either model (although we can salvage some variable insight from random forest, see next section).
- Given the costs of false positives and false negatives discussed earlier, we believe a cutoff of 0.12 (that is, classifying any observation with a score above 0.12 as violently recidivating) would be apt for this model.
4. Findings
In this section, we discuss the findings of our analysis including important predictors for general and violent recidivism, how our predictive accuracy varies across races, ages, and genders, and how our custom RAIs compare to the COMPAS RAI.
Important Predictors of Recidivism
When fitting or "growing" a random forest, each variable's importance is calculated using MeanDecreaseGini - a measure of how much effect a variable has on the purity of a node within the trees. This information can be used to create a Variable Importance Plot where the relative importance or MeanDecreaseGini is plotted for each variable. Additionally, the random forest provides information on the relationship between each variable and the outcome, in this case each variable's influence on recidivism.
Visual inspection of the variable importance plot reveals that days in custody, charge category, and days in jail are the 3 most influential variables, all of which are engineered features. A higher MeanDecreaseGini indicates more influence. Additionally, we see that age and priors count influence the outcome variable.
But how do these input variables relate to the outcome? With linear or logistic regression we are provided with the effect of each predictor on the predicted units or the log odds of recidivism. Using the partial dependence plots for the top 5 important variables, we can see the general effect of each variable on the outcome. For example, as days in custody, days in jail, or priors count increase, the probability of recidivism rises. For age, we see the opposite effect: increases in age tend to lower the probability of recidivism. For the categorical variable, charge category, instead of a line plot we have a bar chart with each category's effect indicated. Notice how "Battery" and "Driving/DUI" offenses reduce the probability of recidivism. We suspect this is because these may indicate one-off crimes, whereas tampering reveals a deep-rooted disrespect for the rule of law.
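To make MeanDecreaseGini less abstract, the Python sketch below computes the Gini impurity decrease for a single split; a variable's importance is then built from such decreases summed over the splits that use that variable and averaged over trees. The class counts are hypothetical, and the exact weighting and scaling used inside the R randomForest implementation may differ from this simplified version.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node given the number of cases in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of
    its two children. Larger values mean a purer split."""
    n = sum(parent)
    weighted_children = (
        (sum(left) / n) * gini_impurity(left)
        + (sum(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: [non-recidivists, recidivists] before and after a split.
parent_counts = [50, 50]
left_counts = [40, 10]
right_counts = [10, 40]
split_gain = gini_decrease(parent_counts, left_counts, right_counts)
print(f"Gini decrease for this split: {split_gain:.2f}")
```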
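A partial dependence curve is computed by fixing one variable at a given value for every observation, averaging the model's predictions, and repeating across a grid of values. The Python sketch below shows that mechanism. The scoring function, the feature names `age` and `priors_count`, and the two rows are hypothetical placeholders, not the fitted random forest or the real data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each
    value in `grid`; returns a list of (value, mean prediction)."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows are untouched
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


# Hypothetical scoring function standing in for a fitted model.
def toy_predict(row):
    score = 0.5 + 0.02 * row["priors_count"] - 0.005 * (row["age"] - 30)
    return min(1.0, max(0.0, score))


toy_rows = [
    {"age": 20, "priors_count": 0},
    {"age": 40, "priors_count": 5},
]
priors_curve = partial_dependence(toy_predict, toy_rows, "priors_count", [0, 5, 10])
for value, mean_score in priors_curve:
    print(value, round(mean_score, 3))
```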
To test one of our key feature engineered variables, charge category, we fit a random forest model with the charge category omitted. This random forest had a similar variable importance plot where the key difference was the swapped positions of charge degree and gender. We expect that our variable not only absorbs a portion of the effect from charge degree but provides more granular refinement for the trees. The AUC for the category-less random forest decreased to approximately 0.730. Ultimately, we decided to keep the category variable in our model as it should generalize well to unseen data - although a legal expert should review the categories.
Important Predictors of Violent Recidivism
Similarly for violent recidivism we can also analyze the random forest's variable information. The variable importance and key partial dependency plots are displayed below.
The important variables for violent recidivism differ only slightly from the key variables for general recidivism. The top variables by MeanDecreaseGini were: days in custody, charge category, days in jail, age, and days between screening and arrest.
We see that as days in custody increases, one is more likely to violently recidivate. The curve rises quickly here because most of the data falls below 500 days (the small tick marks along the X-axis indicate the decile groupings). A few charge categories stand out as leading to a lower likelihood of recidivating: "Arrest, No Charge", "Other", and "Tampering", among others. The days in jail partial dependence plot shows a trend similar to days in custody: longer durations correspond to a higher probability of violent recidivism. The age partial dependence plot illustrates that younger individuals recidivate more, and that the likelihood of violently recidivating decreases until middle age before beginning to increase again around age 50. Notice that the distribution of deciles is right-skewed (more younger individuals). For days between screening and arrest, we see that as the number of days increases (either before or after arrest), the likelihood of violent recidivism increases. Again, notice that the decile markers are clustered around zero, indicating that the relationship is less clear for values further from zero.
Predictive Quality Across Race, Age, and Gender for General Recidivism
While we selected our final model based on its overall performance and test error, one question of interest is whether our model's general recidivism classifications differ significantly across particular races, ages, or genders. If our model's classifications were to differ across groups, this would indicate that our model may be biased against a particular group.
Race
To compare races more effectively, we binned together three race categories ("Asian", "Native American", and "Other") because the number of observations in the first two categories was too small to produce good estimates of test error for those races.
Race | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
African-American | 0.683 | 0.367 | 0.263 | 0.725 | 0.647 |
Caucasian | 0.715 | 0.497 | 0.160 | 0.648 | 0.743 |
Hispanic | 0.723 | 0.467 | 0.188 | 0.571 | 0.788 |
Other | 0.667 | 0.654 | 0.140 | 0.600 | 0.685 |
In the above table we display the accuracy, false negative rate, false positive rate, positive predictive value (PPV), and negative predictive value (NPV). While accuracy is lower for African-Americans, their false negative rate is the lowest of all the races. This indicates that African-Americans who would go on to recidivate are less likely to be predicted not to recidivate (and so less likely to be awarded bail). Additionally, the African-American false positive rate is the highest of the categories, which indicates that they are more likely than the other races to be incorrectly predicted to recidivate (and denied bail).
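Per-group error rates like those in the table above come from tallying a separate confusion matrix for each group. The following Python sketch shows that tally on a handful of hypothetical records; the group labels "A" and "B" and the outcomes are made up for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, actual, predicted) with 0/1 outcomes.
    Returns the false negative and false positive rate for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if predicted == 1:
            key = "tp" if actual == 1 else "fp"
        else:
            key = "tn" if actual == 0 else "fn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]   # people who actually recidivated
        negatives = c["tn"] + c["fp"]   # people who did not
        rates[group] = {
            "fnr": c["fn"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


# Hypothetical records: (group, actual outcome, predicted outcome).
toy_records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]
group_rates = error_rates_by_group(toy_records)
print(group_rates)
```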
Age
Age | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
Less than 25 | 0.672 | 0.331 | 0.324 | 0.746 | 0.589 |
25 - 45 | 0.676 | 0.442 | 0.229 | 0.662 | 0.685 |
Greater than 45 | 0.768 | 0.517 | 0.099 | 0.694 | 0.789 |
The table above shows a few key inconsistencies in prediction across age groups. First, the false negative rate differs considerably between the youngest and oldest groups: approximately 33.12% of individuals under 25 who recidivate are predicted not to recidivate, compared with approximately 51.69% of individuals over 45 who recidivate. Second, the false positive rate shows that individuals under 25 who do not recidivate are considerably more likely to be incorrectly predicted to recidivate (32.41%), while individuals over 45 are less likely to be denied bail incorrectly (9.95%).
Gender
Gender | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
Female | 0.769 | 0.883 | 0.520 | 0.672 | 0.800 |
Male | 0.679 | 0.763 | 0.586 | 0.694 | 0.668 |
Reviewing gender, we can see more discrepancies in the model's performance. First, women who recidivate are more likely to be classified as non-recidivists than men who would recidivate (given by the false negative rate). Second, the NPV for women is higher than that of men, revealing that our model is better at predicting women who won't recidivate than men who won't recidivate.
Predictive Quality Across Race, Age, and Gender for Violent Recidivism
While we selected our final model based on its overall performance and test error, one question of interest is whether our model's violent recidivism classifications differ significantly across particular races, ages, or genders. If our model's classifications were to differ across groups, this would indicate that our model may be biased against a particular group.
Race (same binning as above)
Race | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
African-American | 0.584 | 0.678 | 0.375 | 0.118 | 0.856 |
Caucasian | 0.583 | 0.765 | 0.384 | 0.055 | 0.895 |
Hispanic | 0.571 | 0.857 | 0.398 | 0.025 | 0.908 |
Other | 0.596 | 0.500 | 0.390 | 0.167 | 0.887 |
The table above reveals that our classifier had similar accuracy across races. The false positive rates are similar across races, revealing that no race is more likely than another to be incorrectly classified as violently recidivating. The false negative rate, however, does differ significantly across races: between Caucasians and African-Americans, Caucasians who violently recidivate are more likely to be incorrectly classified than African-Americans who violently recidivate. This benefits Caucasians, because they are more likely to get "off the hook". It should be noted that the PPV for African-Americans is higher than that of Caucasians, indicating that African-Americans who are classified as violently recidivating by our model do violently recidivate more than Caucasians who are classified as violently recidivating.
Age
Age | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
Less than 25 | 0.560 | 0.833 | 0.787 | 0.126 | 0.851 |
25 - 45 | 0.588 | 0.932 | 0.926 | 0.093 | 0.862 |
Greater than 45 | 0.594 | 0.625 | 0.803 | 0.061 | 0.935 |
Our model performs equally well in terms of accuracy across the three age categories present in the data. The false negative rate for the "25-45" age category is higher than that of the other two categories, indicating that individuals in the "25-45" category who will violently recidivate are more likely to get away with it. The false positive rates across the categories are relatively similar. The PPV for the "Less than 25" category is higher than for the other two categories, which suggests that individuals who are classified as recidivating are more likely to actually do so if they are in this category relative to the other categories. Given these results, our model appears slightly biased against young people and old people.
Gender
Gender | Accuracy | False Negative Rate | False Positive Rate | PPV | NPV |
---|---|---|---|---|---|
Female | 0.571 | 0.773 | 0.396 | 0.052 | 0.891 |
Male | 0.587 | 0.678 | 0.377 | 0.104 | 0.871 |
Across genders, our model had almost identical accuracy. Females who violently recidivate get away with it more often than males, as indicated by their false negative rate. The false positive rates and NPVs are very similar for both genders. Our model does a better job of classifying violent recidivism for males than females, as shown by the fact that the PPV for males is about twice that of females.
Comparing Our RAI with the COMPAS RAI
In order to compare our model performance to the COMPAS RAI, we selected a cutoff that would classify the same proportion of cases as recidivating or violently recidivating. We then compared the confusion matrices yielded by our model and the COMPAS model, both classifying on the test set we used which contained 1234 individuals or about 20% of the 6172 individuals.
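Choosing a cutoff so that a model flags a given number of cases amounts to taking a rank-based threshold on its scores. The Python sketch below illustrates this with hypothetical toy scores. It treats scores at or above the cutoff as flagged, and it assumes no tied scores sit exactly at the boundary; with ties, slightly more cases than requested would be flagged.

```python
def cutoff_for_flag_count(scores, n_flagged):
    """Return the score of the n_flagged-th highest case, so that
    classifying `score >= cutoff` flags that many cases (absent ties)."""
    ranked = sorted(scores, reverse=True)
    if not 0 < n_flagged <= len(ranked):
        raise ValueError("n_flagged must be between 1 and len(scores)")
    return ranked[n_flagged - 1]


# Hypothetical scores from our model, and a target count standing in for
# the number of cases the reference instrument flags.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
target_flagged = 2
matched_cutoff = cutoff_for_flag_count(toy_scores, target_flagged)
flagged = [s for s in toy_scores if s >= matched_cutoff]
print(matched_cutoff, len(flagged))
```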
General Recidivism
The metrics table demonstrates that our RAI is superior to the COMPAS RAI across all relevant metrics: accuracy, false positive rate, false negative rate, PPV, and NPV.
Our RAI largely produces similar classifications to the COMPAS RAI (albeit with a small overall improvement): of all the observations in our test set, our RAI agreed with the COMPAS RAI for 72.2% of the observations.
Our RAI | Actual False | Actual True |
---|---|---|
Predicted False | 520 | 201 |
Predicted True | 163 | 350 |
COMPAS RAI | Actual False | Actual True |
---|---|---|
Predicted False | 490 | 218 |
Predicted True | 193 | 333 |
Metric | COMPAS RAI | Our RAI |
---|---|---|
Accuracy | 0.667 | 0.705 |
False Positive Rate | 0.283 | 0.239 |
False Negative Rate | 0.396 | 0.365 |
PPV | 0.633 | 0.682 |
NPV | 0.692 | 0.721 |
Cross-Predictions | Ours: False | Ours: True |
---|---|---|
COMPAS: False | 543 | 165 |
COMPAS: True | 178 | 348 |
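The metrics table and the agreement figure can be re-derived from the counts in the tables above. The Python sketch below does that arithmetic; the function name is ours, and all counts are copied from the confusion matrices and cross-prediction table in this section.

```python
def confusion_metrics(tn, fn, fp, tp):
    """Standard rates from the four cells of a 2x2 confusion matrix."""
    total = tn + fn + fp + tp
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn),   # share of non-recidivists flagged
        "fnr": fn / (fn + tp),   # share of recidivists missed
        "ppv": tp / (tp + fp),   # share of flagged who recidivated
        "npv": tn / (tn + fn),   # share of unflagged who did not
    }


# Counts copied from the general recidivism confusion matrices.
ours = confusion_metrics(tn=520, fn=201, fp=163, tp=350)
compas = confusion_metrics(tn=490, fn=218, fp=193, tp=333)

# Agreement: both predict False (543) or both predict True (348).
agreement = (543 + 348) / (543 + 165 + 178 + 348)

for name, metrics in (("Our RAI", ours), ("COMPAS RAI", compas)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("agreement:", round(agreement, 3))
```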
Our RAI classified 165 observations as recidivating that COMPAS classified as not recidivating, while the COMPAS RAI classified 178 as recidivating that our RAI classified as not recidivating. Since these are very similar numbers, it is useful to examine whether there were systematic differences in the classifications that the RAIs did not agree on. The tables below reveal how accurate the classifications were when the two RAIs did not agree. In order to detect systematic differences, we examine three demographic categories to determine if one RAI is more prone to mistakes for a particular subgroup.
Ours: Race
Race | Actual False (0) | Actual True (1) |
---|---|---|
African-American | 34 | 36 |
Asian | 0 | 0 |
Caucasian | 22 | 40 |
Hispanic | 10 | 8 |
Native American | 0 | 0 |
Other | 7 | 8 |
COMPAS: Race
Race | Actual False (0) | Actual True (1) |
---|---|---|
African-American | 60 | 52 |
Asian | 0 | 0 |
Caucasian | 36 | 21 |
Hispanic | 6 | 2 |
Native American | 0 | 0 |
Other | 1 | 0 |
The set of race tables indicates that when COMPAS predicts an African-American to recidivate while our classifier does not, the COMPAS classifier is incorrect 53.57% of the time. On the other hand, when our classifier predicts an African-American to recidivate while COMPAS does not, our classifier is incorrect 48.57% of the time. This suggests that the COMPAS RAI is more biased against African-Americans than our RAI; however, it should be noted that when the RAIs disagree, the COMPAS RAI also overpredicts the number of Caucasians who will recidivate compared to our RAI.
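The percentages quoted for the disagreement cases are simply the share of each row that falls in the "0" (did not recidivate) column. The Python sketch below recomputes them from the two race tables above; groups with no disagreement cases are omitted.

```python
# (did not recidivate, recidivated) counts among cases where only one
# RAI predicted recidivism, copied from the two race tables above.
ours_only = {
    "African-American": (34, 36),
    "Caucasian": (22, 40),
    "Hispanic": (10, 8),
    "Other": (7, 8),
}
compas_only = {
    "African-American": (60, 52),
    "Caucasian": (36, 21),
    "Hispanic": (6, 2),
    "Other": (1, 0),
}


def share_wrong(table):
    """Percentage of positive predictions that were incorrect, per group."""
    result = {}
    for group, (did_not, did) in table.items():
        total = did_not + did
        result[group] = round(100 * did_not / total, 2) if total else None
    return result


ours_wrong = share_wrong(ours_only)
compas_wrong = share_wrong(compas_only)
for group in ours_only:
    print(group, "ours:", ours_wrong[group], "COMPAS:", compas_wrong[group])
```

Some of these rows rest on very few cases (for example, a single "Other" case for COMPAS), so the smaller groups should be read with caution.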
Ours: Gender
Gender | Actual False (0) | Actual True (1) |
---|---|---|
Female | 10 | 10 |
Male | 63 | 82 |
COMPAS: Gender
Gender | Actual False (0) | Actual True (1) |
---|---|---|
Female | 27 | 15 |
Male | 76 | 60 |
In terms of gender, when our model predicts recidivism while COMPAS does not, 50% of females actually recidivate and 56.55% of males actually recidivate. When the COMPAS RAI predicts recidivism while our RAI does not, it has a much higher rate of false positives for both males and females.
Ours: Age
Age | Actual False (0) | Actual True (1) |
---|---|---|
25 - 45 | 40 | 53 |
Greater than 45 | 14 | 14 |
Less than 25 | 19 | 25 |
COMPAS: Age
Age | Actual False (0) | Actual True (1) |
---|---|---|
25 - 45 | 65 | 37 |
Greater than 45 | 18 | 10 |
Less than 25 | 20 | 28 |
According to the age tables, the two RAIs are wrong at similar rates in the disagreement cases when an individual is under 25 (41.67% for COMPAS vs. 43.18% for ours); however, when COMPAS classifies an individual aged 25-45 as recidivating and our RAI does not, COMPAS is wrong 63.73% of the time, while in the opposite case our classifier is wrong only 43.01% of the time.
The evidence suggests that our RAI is less systematically biased across at least one demographic (age) than the COMPAS RAI. We also see that the COMPAS RAI tends to have higher misclassification rates than our RAI.
Violent Recidivism
The metrics table demonstrates that our RAI is superior to the COMPAS RAI across all relevant metrics: accuracy, false positive rate, false negative rate, PPV, and NPV.
Our RAI largely produces similar classifications to the COMPAS RAI (albeit with a small overall improvement): of all the observations in our test set, our RAI agreed with the COMPAS RAI for 69.53% of the observations.
Our RAI | Actual False | Actual True |
---|---|---|
Predicted False | 773 | 60 |
Predicted True | 321 | 80 |
COMPAS RAI | Actual False | Actual True |
---|---|---|
Predicted False | 748 | 69 |
Predicted True | 346 | 71 |
Metric | COMPAS RAI | Our RAI |
---|---|---|
Accuracy | 0.664 | 0.691 |
False Positive Rate | 0.316 | 0.293 |
False Negative Rate | 0.493 | 0.429 |
PPV | 0.170 | 0.200 |
NPV | 0.916 | 0.928 |
Cross-Predictions | Ours: False | Ours: True |
---|---|---|
COMPAS: False | 637 | 180 |
COMPAS: True | 196 | 221 |
Our RAI classified 180 observations as violently recidivating that COMPAS classified as not violently recidivating, while the COMPAS RAI classified 196 as violently recidivating that our RAI classified as not violently recidivating. Since these are very similar numbers, it is useful to examine whether there were systematic differences in the classifications that the RAIs did not agree on. The tables below reveal how accurate the classifications were when the two RAIs did not agree. In order to detect systematic differences, we examine three demographic categories to determine if one RAI is more prone to mistakes for a particular subgroup.
Ours: Race
Race | Actual False (0) | Actual True (1) |
---|---|---|
African-American | 78 | 19 |
Asian | 1 | 0 |
Caucasian | 53 | 4 |
Hispanic | 10 | 3 |
Native American | 0 | 0 |
Other | 9 | 3 |
COMPAS: Race
Race | Actual False (0) | Actual True (1) |
---|---|---|
African-American | 119 | 17 |
Asian | 0 | 0 |
Caucasian | 43 | 3 |
Hispanic | 9 | 0 |
Native American | 0 | 0 |
Other | 5 | 0 |
The set of race tables indicates that when COMPAS predicts an African-American to violently recidivate while our classifier does not, the COMPAS classifier is incorrect 87.5% of the time. In the reverse case, our classifier has a similar error rate of 80.41%. For Caucasians, the two RAIs are wrong at nearly identical rates when they disagree (92.98% for ours versus 93.48% for COMPAS). For Hispanics, the COMPAS RAI is wrong more often (100% versus 76.92%), although both figures rest on very few cases.
Ours: Gender
Gender | Actual False (0) | Actual True (1) |
---|---|---|
Female | 13 | 5 |
Male | 138 | 24 |
COMPAS: Gender
Gender | Actual False (0) | Actual True (1) |
---|---|---|
Female | 38 | 1 |
Male | 138 | 19 |
For gender, when our model classifies a male as violently recidivating while COMPAS does not, it is wrong in 85.19% of cases; when COMPAS classifies a male as violently recidivating while our RAI does not, it is wrong in 87.9% of cases. For females, when COMPAS classifies a female as violently recidivating and our RAI does not, there are 38 times more false positives than true positives; when our model classifies a female as violently recidivating and the COMPAS model does not, this ratio is 2.6. Our model appears to be less biased against females than the COMPAS model, as illustrated by the larger gap between genders in the "COMPAS: Gender" table than in the "Ours: Gender" table.
Ours: Age
Age | Actual False (0) | Actual True (1) |
---|---|---|
25 - 45 | 115 | 24 |
Greater than 45 | 23 | 3 |
Less than 25 | 13 | 2 |
COMPAS: Age
Age | Actual False (0) | Actual True (1) |
---|---|---|
25 - 45 | 70 | 13 |
Greater than 45 | 12 | 3 |
Less than 25 | 94 | 4 |
We see from the age tables that for individuals aged 25-45, the error rates are similar in both cases of disagreement between the RAIs. For those above 45, when our model classifies an individual as violently recidivating while COMPAS does not, we are wrong 88.46% of the time, whereas in the reverse case COMPAS is wrong 80% of the time. For individuals below 25, when COMPAS classifies an individual as violently recidivating and our RAI does not, COMPAS is wrong 95.92% of the time, while in the reverse case our model is wrong 86.67% of the time. This suggests that our model is more biased against older people than the COMPAS model, yet less biased against younger people. Nevertheless, these differences are small.
The evidence suggests that both our RAI and the COMPAS RAI suffer from certain biases; yet the difference in the relevant false positive rates does not appear to be large enough to suggest that one RAI is less systematically biased than the other.
5. Conclusion
Our task was to create an RAI to predict two-year recidivism and two-year violent recidivism in Broward County, Florida. In particular, our RAI needed to outperform the existing standard, COMPAS, to be implemented.
The raw data set contained 7214 observations and 56 columns. After cleaning and filtering the data, we ended up with a data set of 6172 observations and 24 columns to develop our classifier. These columns included features engineered from the data, among them the time an individual spent in jail, the time an individual spent in custody, and a brief description of the crime an individual had committed.
To begin the development of our process, we ran a lasso and a best subset selection in order to get a better idea of what variables will be important predictors of the two outcomes. After this, we fit and described four classification models: logistic regression, LDA, classification tree, and random forest. For each model, we calculated a range of performance metrics and graphics; all the models were either cross-validated or tested on a hold-out test set to ensure we do not overfit the data.
We ended up choosing the random forest model both to predict two-year recidivism and two-year violent recidivism. Our model for violent recidivism performs worse than our model for general recidivism; this is to be expected, however, considering the low prevalence of violent recidivism cases in the data. In both cases, the model slightly outperforms the corresponding COMPAS classifier. We also gathered evidence that suggests that the COMPAS classifier is more biased against particular groups than our classifier. Both these findings support the implementation of our RAI in Broward County.