Increasingly, organizations are turning to data science to extract value from otherwise dormant data. When you think of organizations applying ‘data science’, your mind may jump to Amazon, Facebook, or Netflix, but other organizations have the same potential to transform their operations with rigorous data analysis. While searching for an internship, I looked for an organization that would not only benefit from my skill set but also tackle weighty public or civic problems. Ideally, an internship would satisfy both conditions… enter the American Civil Liberties Union (hereafter ACLU).
A Brief History of the ACLU
Founded in 1920, the ACLU is a non-partisan, non-profit organization with a mission to “defend and preserve the individual rights and liberties guaranteed to all people in this country by the Constitution and laws of the United States.” Simply stated, the organization supports and defends our Constitutional rights.
As a principled, non-partisan organization operating in an era of heightened political polarization, the ACLU draws criticism from across the political spectrum. Conservative groups chastise the ACLU for often taking liberal stances in cases framed as ‘religious freedom cases’, such as Masterpiece Cakeshop (2018) or Obergefell (2015). Conversely, in August 2017, the ACLU, adhering to its mandate to preserve First Amendment freedoms, drew castigation from liberal groups for defending white nationalists’ right to organize a protest (known as the Unite the Right Rally) at Emancipation Park in Charlottesville, Virginia.
Founders of this historic organization include Roger Baldwin, Helen Keller, and Arthur Garfield Hays (no relation). Currently, the ACLU supports its national efforts plus the individual efforts of its 54 affiliates (3 for California, 1 for each other state, Washington D.C., and Puerto Rico). You can read more about the ACLU’s history here and find your local affiliate here. For an unaffiliated/unsponsored Drunk History-style retelling of the ACLU’s history, please watch this unofficial video produced by and starring interns.
Many non-profit organizations view data science as a luxury, a ‘nice-to-have’ that is seldom worth investing in ‘right now’. Fortunately for me, in early 2018, the ACLU formed a Data & Analytics department and listed summer internship opportunities in early April. The potential to work on interesting projects for a historic institution, to apply and refine my quantitative and technical skills, and to learn more about data science/analytics was too much to resist!
Bookending the Summer: Uncovering Themes
Before I review the highlights and details (at least what can be shared) of working for the ACLU Data Science team and living in New York City, two books deserve mention: they became more than train-ride summer reading and established themselves as themes for the summer. The two books were Just Mercy by Bryan Stevenson and The Victory Lab by Sasha Issenberg.
Just Mercy focuses on the racial injustice endured by African Americans, particularly those living in the Deep South. The nonfiction account follows Bryan Stevenson, a lawyer and the founder of the Equal Justice Initiative, as he defends a handful of individuals on death row (some of whom are teens) and faces ever-increasing challenges due to his clients’ race. The book dismantles the notion that racial injustice is a solved issue as of 2018 and serves as a reminder of the work still required to reform our criminal justice system.
The Victory Lab discusses the data revolution transforming the American political system (as well as the billion-dollar industry driving the transformation). The author explores the field of political science, the initial resistance to adopting new practices, and the subsequent data arms race. Additionally, Issenberg explains how shifting from precinct-level approaches to granular, individual-centered approaches has affected voter outreach. As the ACLU invests more effort into advocacy, especially leading into the 2018 midterm elections, the relevance of scientific experimentation and data analysis, as discussed by Issenberg, cannot be overstated.
The ACLU is located in lower Manhattan in the Financial District. Investment banks, financial services companies, and insurance conglomerates occupy the neighboring buildings. If you miss the Fulton or Wall Street stops on the southbound 4 / 5 train, then you’ll likely end up near the national ACLU headquarters. The office features prominent ACLU branding, plenty of conference rooms with whiteboards, 30 Years of Doonesbury on Trump, and sometimes Edward Snowden (via the Snowdenbot as seen here).
As mentioned above, the ACLU Analytics team formed in early 2018. The team’s newness does not limit the quality or quantity of its work, but it does mean the underlying data infrastructure is not fully formed (yet). A recurring task, especially early in the summer, was moving data from a handful of OLTP (online transaction processing) databases or raw data sources into an Amazon Redshift data warehouse. At times, working with the data this way was slow, messy, and dirty, reminding me of the ongoing construction at the Bedford Ave station in Brooklyn. Both the new Bedford Ave station and the new ACLU data infrastructure are investments in the future, built to handle heavy volumes like the cherished Grand Central Terminal on 42nd Street and Park Avenue.
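A minimal sketch of one such extract-transform-load step might look like the following, assuming a relational source and staging through S3. The post does not name the source systems, so `sqlite3` stands in for an OLTP database; the `donations` table, the cleaning rules, the S3 URI, and the IAM role are all hypothetical placeholders. Because Redshift’s COPY command is the usual bulk-load path (row-by-row INSERTs are slow), the sketch writes CSV and builds a matching COPY statement rather than executing it.

```python
"""Sketch of one ETL step: OLTP rows -> cleaned CSV -> Redshift COPY statement.
sqlite3 stands in for the (unnamed) source database; all names are hypothetical."""

import csv
import io
import sqlite3


def extract(conn):
    # Pull raw rows from the operational (OLTP) store.
    return conn.execute("SELECT id, email, amount FROM donations ORDER BY id").fetchall()


def transform(rows):
    # Hypothetical cleaning rules: drop rows with no amount, normalize emails.
    clean = []
    for donation_id, email, amount in rows:
        if amount is None:
            continue
        clean.append((donation_id, email.strip().lower(), round(float(amount), 2)))
    return clean


def to_csv(rows):
    # Serialize to CSV with a header row (skipped by IGNOREHEADER 1 below).
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "email", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


def copy_statement(table, s3_uri, iam_role):
    # Redshift bulk-loads staged files with COPY; bucket and role are placeholders.
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE donations (id INTEGER, email TEXT, amount REAL)")
    src.executemany(
        "INSERT INTO donations VALUES (?, ?, ?)",
        [(1, " A@Example.org ", 25), (2, "b@example.org", None), (3, "c@example.org", 10.5)],
    )
    print(to_csv(transform(extract(src))))
    print(copy_statement("staging.donations", "s3://example-bucket/donations/",
                         "arn:aws:iam::123456789012:role/ExampleRedshiftRole"))
```

In practice the CSV would be uploaded to S3 (for example with boto3) and the COPY statement run over a connection to the cluster; both steps are left out to keep the sketch self-contained.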
For an organization that relies heavily on donors to sustain its efforts defending civil liberties, projecting future donations is critical to guiding the organization’s strategy and operations. To project several years into the future, time series models were built with the ARIMA (autoregressive integrated moving average) technique in the popular statistical computing language R. Projections were strengthened by categorizing donations by amount, type, and frequency, and by tracking how the composition of donations changes over time. Time-based features were also included as regressors, with major national events entering the models (e.g., the 2016 Presidential Election, the 2017 Presidential Inauguration, and the 2017 Charlottesville protests).
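The original models were built in R. As a rough illustration of what ‘ARIMA with event regressors’ means, the Python sketch below fits an ARIMA(1,1,0) model in ARIMAX form, dy[t] = c + phi * dy[t-1] + beta * event[t] + error, where dy is the first difference of the series and event[t] is a dummy equal to 1 in a month with a major national event. The (1,1,0) order, the single event dummy, the synthetic series, and every function name are illustrative assumptions, not the ACLU’s model. It fits by conditional least squares using only the standard library; real work would use a library that also estimates moving-average terms and forecast intervals.

```python
"""Sketch: ARIMA(1,1,0) with an exogenous event dummy (ARIMAX form),
fit by conditional least squares using only the standard library."""

from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_arimax_110(y: Sequence[float], event: Sequence[float]) -> List[float]:
    """Fit dy[t] = c + phi * dy[t-1] + beta * event + error; returns [c, phi, beta]."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # d = 1: first differences
    rows, target = [], []
    for t in range(1, len(dy)):
        # dy[t] is the change into period t + 1, so it pairs with event[t + 1].
        rows.append([1.0, dy[t - 1], event[t + 1]])
        target.append(dy[t])
    # Normal equations (X'X) params = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    return solve(xtx, xty)


def forecast(y: Sequence[float], future_events: Sequence[float],
             params: Sequence[float]) -> List[float]:
    """Recursively forecast levels, one step per entry in future_events."""
    c, phi, beta = params
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for ev in future_events:
        diff = c + phi * diff + beta * ev  # predicted change for this step
        level += diff                      # undo the differencing
        out.append(level)
    return out


if __name__ == "__main__":
    # Synthetic monthly series (not real donation data) with two event spikes.
    n = 36
    event = [0.0] * n
    event[10] = event[22] = 1.0
    y = [100.0, 103.0]
    for t in range(2, n):
        y.append(y[-1] + 2.0 + 0.5 * (y[-1] - y[-2]) + 10.0 * event[t])
    params = fit_arimax_110(y, event)
    print([round(p, 3) for p in params])
    print(forecast(y, [0.0, 0.0, 1.0], params))
```

Because the synthetic series is noiseless, the fit recovers coefficients very close to the generating values (2.0, 0.5, 10.0). In R, a comparable model can be fit with `arima()` from the base stats package by passing event indicators through its `xreg` argument; note that `arima()` treats regressors as a regression with ARIMA errors, a slightly different formulation from the ARIMAX form above.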
Understanding an organization’s finances is important, but so is understanding its constituents and members. Using non-commercial data from a popular national voter file, the ACLU was able to learn more about its membership and its members’ aggregate voting history. Working with over 250M rows of data was a challenge, so any analysis that could be pushed down from R or Python into SQL on the much faster Amazon Redshift cluster was moved there to save time.
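To illustrate the push-down idea, here is a sketch with `sqlite3` standing in for the Redshift cluster; the `voters` table, its `state` and `voted_2016` columns, and the turnout metric are hypothetical, not the actual voter-file schema. Both functions return the same answer, but the SQL version sends only one row per state back to the client instead of every row, which is what matters at hundreds of millions of rows.

```python
"""Sketch: aggregate in SQL rather than pulling every row into Python.
sqlite3 stands in for Redshift; the `voters` table is hypothetical."""

import sqlite3
from collections import defaultdict


def turnout_in_python(conn):
    # Anti-pattern at scale: every row crosses the wire before aggregating.
    totals = defaultdict(lambda: [0, 0])  # state -> [row count, votes]
    for state, voted in conn.execute("SELECT state, voted_2016 FROM voters"):
        totals[state][0] += 1
        totals[state][1] += voted
    return {s: votes / count for s, (count, votes) in sorted(totals.items())}


def turnout_in_sql(conn):
    # Preferred: the database aggregates and returns one row per state.
    # The cast guards against integer averaging on engines that truncate AVG(int).
    query = """
        SELECT state, AVG(CAST(voted_2016 AS REAL)) AS turnout
        FROM voters
        GROUP BY state
        ORDER BY state
    """
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE voters (state TEXT, voted_2016 INTEGER)")
    conn.executemany("INSERT INTO voters VALUES (?, ?)",
                     [("NY", 1), ("NY", 0), ("CA", 1), ("CA", 1), ("CA", 0), ("CA", 1)])
    print(turnout_in_python(conn))
    print(turnout_in_sql(conn))
```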
Spreading Data Literacy
Part of the Analytics team’s responsibility is spreading data literacy throughout the organization. During the summer, I held a Data Science 101 session to introduce the core concepts to colleagues unfamiliar with the topic. The session covered the data science process, common tools, and popular terminology.
Skills & Tools
- Python, R (hurray tidyverse!)
- SQL, Amazon Redshift
- Time-series analysis/forecasting (ARIMA)
The ACLU arranged dozens of events, primarily for interns, that enriched the experience; as a non-legal intern, I found they helped me connect my work to the organization’s traditional legal work. Examples include lunch with the Executive Director and court case debriefs in which ACLU legal staff discussed the outcomes and legal ramifications of recent decisions or news. There was also a lunch-time musical performance!
Spending a summer in New York would not be complete without attending Macy’s New York City 4th of July Fireworks Spectacular. The gatherings of people, firework displays, and other holiday festivities echo the importance of the work performed by the ACLU - whether that’s reducing discrimination, defending our right to protest/right to due process, or protecting vulnerable groups such as immigrants. The ACLU provided an engaging opportunity to learn how new data science and analytics methods are changing a traditional and historic organization - one not typically associated with data science.