How do trends in probation compare across four urban areas in Virginia: Richmond, Virginia Beach, Roanoke, and Charlottesville?
Is there a correlation between the relative socioeconomic status of a zip code and the prevalence of felony cases that involve probation?
Does a person’s race affect how the criminal justice system interacts with them, especially when juries are in charge of sentencing? Could race be a factor affecting a person’s chances of receiving probation? Does this differ across localities?
At the beginning of this project, we were expecting to see drastic disparities in the rates at which people receive probation — specifically, disparities across the variable of race. While the numbers do not line up perfectly, there is not the jarring difference that we were expecting to see. When comparing the charges in felony cases that had probation, we noticed that the largest number of cases actually had to do with probation violations. This might suggest that while getting probation is a good thing initially (it gets people out of jail and back into society, to an extent), the strict rules people on probation must follow put them at high risk for repeated interaction with the justice system or even returning to jail as a result of a violation. Because of this ambiguity around whether probation has a positive or negative impact on a person’s life, we wanted to explore how probation rates and counts differ among four cities and metropolitan areas in Virginia.
Beginning in July 2021, defendants in Virginia can choose, after conviction, whether their penalty comes from a jury or a judge.
Historically, jurors in jury trials decided a defendant’s punishment.
A defense attorney said that this change will encourage defendants to decline plea deals and give them more of a say in the result.
Jurors can decide convictions well, but sentencing is much harder (for example, jurors are not aware of sentencing guidelines and are not given information about probation or alternatives to incarceration).
(https://www.whsv.com/2021/06/30/virginia-see-new-option-jury-sentencing-beginning-july-1/ )
We filtered Virginia circuit court case data for 2020 to include only zip codes within the four metropolitan areas anchored by Richmond, Charlottesville, Virginia Beach, and Roanoke. To do this, we extracted the zip code from the address in each case record and matched zip codes to metro area (CBSA) codes.
We also obtained data from the American Community Survey (ACS) to measure the population of each relevant zip code and to estimate the relative wealth and socioeconomic status of the people living there. For this estimate, we joined the poverty rate and median household income for each zip code from the ACS to our Virginia court case data. We then used the combined data for our analysis.
American Community Survey Data
Median Household Income https://data.census.gov/cedsci/table?q=S1901&tid=ACSST5Y2020.S1901
Poverty Rate https://data.census.gov/cedsci/table?q=S1701
Population https://data.census.gov/cedsci/table?q=S0101
#Loading libraries and data, starting with the case data to explore sentences and probation
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Cases <- read.csv("../data/Cases.csv")
library(tidycensus)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(viridis)
## Loading required package: viridisLite
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(DT)
library(reactable)
library(reactablefmtr)
##
## Attaching package: 'reactablefmtr'
## The following object is masked from 'package:ggplot2':
##
## margin
options(tigris_use_cache = TRUE)
## all zip codes in the four metro areas, add wealth and population data
regionfips <- c("40060", "47260", "40220","16820") # richmond, va beach, roanoke, cville
regiongeo <- core_based_statistical_areas(cb = TRUE)
## Retrieving data for the year 2020
regiongeo <- regiongeo %>%
filter(GEOID %in% regionfips)
# ggplot(regiongeo) + geom_sf() # checking
zcta_va <- zctas(state = 51, year = 2010)
regionzcta <- st_intersection(zcta_va, regiongeo)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
# ggplot(regionzcta) + geom_sf() # checking
# use these zips in regionzcta to pull census ACS data
# Pulled Median Household Income, Poverty Rate, and Total Population
sesvar <- c(medhhinc = "S1901_C01_012",
povrate = "S1701_C03_001",
totalpopulation = "S0101_C01_001")
zip_ses <- get_acs(geography = "zcta",
year = 2019,
survey = "acs5",
variables = sesvar,
output = "wide")
## Getting data from the 2015-2019 5-year ACS
## Using the ACS Subject Tables
zip_ses <- zip_ses %>%
mutate(zcta = str_sub(NAME, -5, -1)) %>%
filter(zcta %in% regionzcta$ZCTA5CE10)
# you can use zip_ses or regionzcta to filter circuit court zip codes
# join to zip_ses to integrate ses data
# join to regionzcta to bring in locality identifier (e.g., is the zip code in Richmond metro, VA Beach metro, etc.)
#Extract zip codes from addresses
Cases <- Cases %>%
mutate(is_zip = ifelse(str_detect(Address, "\\d{5}"), 1, 0),
zip_code = ifelse(is_zip == 1, str_sub(Address, start = -5), NA_character_))
#merging data to be able to filter by metro area
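# rename by column name rather than by position, so the merge key stays correct if the column order changes
Cases <- Cases %>% rename(zcta = zip_code)
merged_data <- merge(Cases, zip_ses, by="zcta")
regionzcta <- regionzcta %>% rename(zcta = ZCTA5CE10)
merged_data <- merge(merged_data, regionzcta, by="zcta")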
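As a minimal sketch of the zip code extraction step, the example below assumes that a zip code, when present, is the final five characters of the address. The addresses are invented for illustration, and the str_extract() pattern is an alternative we are suggesting, not the code used in this analysis. It returns NA whenever the address does not end in five digits, even if five digits appear elsewhere in the string.

```r
library(stringr)

# Hypothetical addresses, invented for illustration only
addresses <- c("RICHMOND, VA 23220",
               "12345 MAIN ST, ROANOKE, VA",  # five digits, but no trailing zip code
               "CHARLOTTESVILLE, VA 22903",
               NA)

# Keep a zip code only when the address ends in five digits
zip_code <- str_extract(addresses, "\\d{5}$")
zip_code
```

In the analysis itself, any value that is not a real zip code falls out when the case data is merged with the ACS data by zcta, so this mainly makes the intermediate column cleaner.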
Sentence and Probation Indicators
Number of felony cases in a zip code and number of cases divided by population of zip code
Number of cases with a sentence in a zip code and the number of cases with a sentence divided by the population
Number of probation cases in a zip code and the number of probation cases divided by the population
Probation case rate by zip code (number of probation cases divided by total number of felony cases in a zip code)
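To make the indicator definitions concrete, here is a small sketch on invented data. The toy_cases table, its column names, and its values are all hypothetical. It computes each indicator in a single summarize() step rather than through the separate filtered tables built below, so it illustrates the arithmetic and is not the exact pipeline we ran.

```r
library(dplyr)

# Hypothetical case-level data: one row per felony case
toy_cases <- tibble(
  zcta = c("A", "A", "A", "A", "B", "B"),
  probation_present = c(1, 0, 0, 0, 1, 0),
  population = c(1000, 1000, 1000, 1000, 500, 500)
)

toy_rates <- toy_cases %>%
  group_by(zcta) %>%
  summarize(total_cases = n(),                         # felony cases in the zip code
            probation_cases = sum(probation_present),  # cases that include probation
            population = first(population),
            .groups = "drop") %>%
  mutate(cases_per_capita = total_cases / population,
         probation_per_capita = probation_cases / population,
         probation_rate = probation_cases / total_cases)

toy_rates
```

One difference worth noting: summarizing this way keeps zip codes with zero probation cases (with a probation rate of 0), whereas filtering to probation cases first drops them.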
#Probation Indicator
merged_data <- merged_data %>%
mutate(ProbationTime = replace_na(ProbationTime , 0)) %>%
mutate(probation_present = ifelse(ProbationTime == 0 , 0 , 1)) %>%
mutate (probation_presentqual = ifelse(probation_present == 0 , "No" , "Yes"))
#Sentence Indicator
merged_data <- merged_data %>%
mutate(SentenceTime = replace_na(SentenceTime , 0)) %>%
mutate(sentence_present = ifelse(SentenceTime == 0 , 0, 1)) %>%
mutate (sentence_presentqual = ifelse(sentence_present == 0 , "No" , "Yes"))
#Number of Felony Cases
merged_data <- merged_data %>%
group_by(zcta) %>%
filter(ChargeType == "Felony") %>%
mutate(totalcases = n())
#Felony Case Rate
merged_data <- merged_data %>%
mutate(totalcases_pop = totalcases / totalpopulationE)
#Number of cases with a sentence
Sentence_data <- merged_data %>%
group_by(zcta) %>%
filter(sentence_present == 1) %>%
mutate(totalsentcases = n())
Sentence_data <- Sentence_data %>%
mutate(sent_cases_pop = totalsentcases / totalpopulationE)
#Number of cases with probation
Probation_data <- merged_data %>%
group_by(zcta) %>%
filter(probation_present == 1) %>%
mutate(totalprobcases = n())
#Probation population rate
Probation_data <- Probation_data %>%
mutate(prob_cases_pop = totalprobcases / totalpopulationE)
#Probation Case Rate with one observation per zip code
Probationrates <- Probation_data %>%
group_by(zcta) %>%
summarize(numprobationcases = n() ,
cases = mean(totalcases) ,
povrate = mean(povrateE),
GEOID.y = first(GEOID.y)) %>%
mutate(probation_rate = numprobationcases / cases)
For the purposes of this analysis, we have coded race in two different ways. The first way preserves the numerous classifications that officers or court officials might select to describe a person involved in a case. The second way simply organizes people into ‘white’ and ‘non-white.’ Choosing to code this variable multiple ways was a conscious choice. In our class, we talked a lot about the role that perception might play in interactions with officers. Since race in this dataset is not self-reported, it is subject to the perception or the bias of the police officer or court official who reports it. By recoding this variable into two larger buckets, we hope to capture the perception element.
In our analysis, we did look at the more specific buckets of race before re-coding the variable. This more specific analysis can be seen in Richmond. In all of our chosen cities/metro areas, there were not significant numbers of people represented in the dataset who were not coded as ‘black’ or ‘white.’ As such, we eliminated this more specific comparison for the other cities, and only included the re-coded variable.
merged_data <- merged_data %>%
mutate(race_organized = fct_collapse(Race,
white = c("White Caucasian (Non-Hispanic)" , "White") ,
black = c("Black (Non-Hispanic)" , "Black") ,
asian = c("Asian Or Pacific Islander") ,
native_american = c("American Indian") ,
latinx = c("Hispanic") ,
other_unknown = c("Other (Includes Not Applicable, Unknown)" , "Unknown" , "")))
merged_data <- merged_data %>%
mutate(race_condensed = fct_collapse(Race,
white = c("White Caucasian (Non-Hispanic)" , "White") ,
non_white = c("Black (Non-Hispanic)" , "Black", "Asian Or Pacific Islander", "American Indian", "Hispanic") ,
other_unknown = c("Other (Includes Not Applicable, Unknown)" , "Unknown" , "")))
The 2021 Census reported Richmond as having a population that is 46% Black or African-American and 45% white. Much like Charlottesville and other cities across America, Richmond has a tumultuous history marked by redlining, which has led to acute disparities in neighborhoods across the city. One way this manifests is through health outcomes: people in neighborhoods that were redlined are more likely to suffer from chronic health conditions and have a shorter life expectancy than those in neighborhoods that were not redlined (Godoy, 2020). Poverty rates are higher, and even the average summer temperature in these neighborhoods is higher due to a lack of old-growth trees (Plumer & Popovich, 2020).
There are lots of organizations doing work on equity issues in Richmond. Two we found in our research are linked here: Richmond Transparency and Accountability Project and Make Better Deeds
https://www.facingwhiteness.incite.columbia.edu/richmond-explore-2
Godoy, M. (2020, November 19). In U.S. Cities, The Health Effects Of Past Housing Discrimination Are Plain To See. NPR. https://www.npr.org/sections/health-shots/2020/11/19/911909187/in-u-s-cities-the-health-effects-of-past-housing-discrimination-are-plain-to-see#:~:text=hence%2C%20%22redlining.%22&text=Digital%20Scholarship%20Lab-
Plumer, B., & Popovich, N. (2020, August 24). How Decades of Racist Housing Policy Left Neighborhoods Sweltering. The New York Times. https://www.nytimes.com/interactive/2020/08/24/climate/racism-redlining-cities-global-warming.html
Cases with a sentence and cases with a probation
merged_data %>%
filter(GEOID.y == 40060) %>%
filter(sentence_presentqual == "Yes") %>%
summarize(numscases = n() ,
meanslength = mean(SentenceTime) ,
medianslength = median(SentenceTime) ,
maxslength = max(SentenceTime) ,
minslength = min(SentenceTime))
## # A tibble: 120 × 6
## zcta numscases meanslength medianslength maxslength minslength
## <chr> <int> <dbl> <dbl> <int> <int>
## 1 22437 1 180 180 180 180
## 2 22454 6 2991. 3650 3650 1520
## 3 22514 10 1633 1732. 3650 365
## 4 22546 100 1636. 1428. 9125 150
## 5 22560 29 2848. 1825 14600 90
## 6 23002 39 1816. 1825 7300 365
## 7 23005 80 1675. 1825 3650 30
## 8 23009 28 1176. 1095 3650 30
## 9 23011 9 1228. 1095 3285 180
## 10 23015 17 1483. 1825 3650 30
## # … with 110 more rows
merged_data %>%
filter(GEOID.y == 40060) %>%
filter(probation_presentqual == "Yes") %>%
summarize(numpcases = n() ,
meanplength = mean(ProbationTime) ,
medianplength = median(ProbationTime) ,
maxplength = max(ProbationTime) ,
minplength = min(ProbationTime))
## # A tibble: 112 × 6
## zcta numpcases meanplength medianplength maxplength minplength
## <chr> <int> <dbl> <dbl> <int> <int>
## 1 22454 3 365 365 365 365
## 2 22514 6 3224. 2738. 7300 1095
## 3 22546 51 2784. 1825 7300 365
## 4 22560 20 3595. 365 36135 365
## 5 23002 10 5292. 1825 36135 365
## 6 23005 20 2354. 1825 7300 365
## 7 23009 6 2798. 3650 3650 1095
## 8 23011 5 2117 1825 3650 365
## 9 23015 4 2464. 2738. 3650 730
## 10 23024 20 1807. 1095 7300 365
## # … with 102 more rows
Associated charges with felonies that included a probation
Richmondtable <- Probation_data %>%
filter(GEOID.y == "40060") %>%
group_by(Charge , zcta) %>%
summarize(total_by_charge = n())
## `summarise()` has grouped output by 'Charge'. You can override using the
## `.groups` argument.
Richmondtable <- Richmondtable %>%
select(zcta, Charge, total_by_charge)
datatable(Richmondtable, caption = "Charge Types in Richmond Probations")
(Metro area compared to Independent City)
merged_data %>%
filter(GEOID.y == "40060") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(fips == "760") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Probation numbers in the city of Richmond are extremely peculiar. When glancing at rates of probation in the metro area, they do not seem too different from rates in the other areas. However, almost no probation is given out for cases in the city itself. A quick search did not reveal any differences in rules between the City of Richmond and other places, so there was not a clear reason for why the city’s rates of probation are so low. This might be something for a future group to look into — does this trend continue across years? Are there any other patterns we missed?
(Relative measures of socioeconomic status of community include median household income and poverty rate)
The following scatter plots show the relationship between variables of interest and the poverty rate or median household income in a given zip code. Each point represents one zip code.
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40060") %>%
ggplot(aes(x= povrateE , y = totalcases_pop )) +
ggtitle("Richmond, poverty rate v per capita number of felony cases") +
geom_point()
## Warning: Removed 4 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40060") %>%
ggplot(aes(x= medhhincE , y = prob_cases_pop )) +
ggtitle("Richmond, med income v per capita probation cases") +
geom_point()
## Warning: Removed 10 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40060") %>%
ggplot(aes(x= povrateE , y = prob_cases_pop )) +
ggtitle("Richmond, pov rate v per capita probation cases") +
geom_point()
## Warning: Removed 4 rows containing missing values (geom_point).
Probationrates %>%
group_by(zcta) %>%
filter(GEOID.y == "40060") %>%
ggplot(aes(x= povrate , y = probation_rate )) +
ggtitle("Richmond, pov rate v probation_rate") +
geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
As seen in the scatterplots, there appears to be something of a relationship between poverty rate and rates of sentencing and probation. However, the relationship is fairly weak. One place where we see a positive relationship is when comparing the poverty rate to the rate at which cases with sentences occur. As the poverty rate rises, so does the sentencing rate (although there are zip codes that break the trend).
There is a small but clear negative correlation between the poverty rate and the probation rate of a zip code.
It is important to note that the poverty rate and median household income come from the American Community Survey (ACS), pulled via the Census API. Both are zip-code-level estimates: the poverty rate is the percentage of a zip code’s residents below the poverty level, and median household income is the median for the zip code. Neither is directly connected to the individuals represented in the cases in that zip code.
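The strength of a relationship like the ones in the scatter plots above could be summarized with a rank correlation. The sketch below uses invented zip-level values (toy_zip is hypothetical, not our data) and Spearman's method, which we chose here only because it does not assume a linear relationship. We did not run this test as part of the analysis.

```r
# Hypothetical zip-level values, invented for illustration
toy_zip <- data.frame(
  povrate = c(5, 8, 12, 15, 20, 25),
  probation_rate = c(0.62, 0.60, 0.55, 0.57, 0.50, 0.48)
)

# Spearman rank correlation: the sign gives the direction, the magnitude the strength
rho <- cor(toy_zip$povrate, toy_zip$probation_rate, method = "spearman")

# cor.test() adds a p-value for the null hypothesis of no association
test_result <- cor.test(toy_zip$povrate, toy_zip$probation_rate, method = "spearman")

rho
test_result$p.value
```

With real data, zip codes with missing poverty rates would need to be dropped first, and a small number of zip codes per metro area limits how much weight the p-value can carry.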
Municipalities in Richmond Metro area divided by zip code
Probation rate (number of probation cases / number of felony cases)
Poverty rate (from ACS data)
va <- counties(state = "51" , cb= TRUE)
## Retrieving data for the year 2020
Richmond <- va %>%
filter(COUNTYFP %in% c("760" , "730" , "670" , "570" , "007" , "036" , "041" , "053" , "075" , "085" , "087" , "101" , "127" , "145" , "149" , "183"))
Richmondzipcountymap <- st_intersection(Richmond, zcta_va)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
names(Richmondzipcountymap)[14] <- 'zcta'
RichmondMapData <- merge(Probationrates, Richmondzipcountymap, by="zcta")
ggplot(data = RichmondMapData) +
geom_sf(aes(fill = probation_rate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
ggplot(data = RichmondMapData) +
geom_sf(aes(fill = povrate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
The negative correlation between probation rates and poverty rates is also shown on this map.
Is race correlated with the likelihood of a person receiving probation or not?
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == "40060") %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
ggtitle("Richmond, race_organized")+
coord_cartesian(xlim = c(0,8000)) +
facet_wrap(~race_organized)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == 40060) %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
ggtitle("Richmond, race_condensed")+
coord_cartesian(xlim = c(0,8000)) +
facet_wrap(~race_condensed) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
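Histograms show the distribution of sentence lengths, but the share of sentenced cases that include probation can also be tabulated directly for each group. The sketch below does this on invented data. The toy_sentenced table and its values are hypothetical, and the column names simply mirror the ones used above.

```r
library(dplyr)

# Hypothetical sentenced felony cases, invented for illustration
toy_sentenced <- tibble(
  race_condensed = c("white", "white", "white",
                     "non_white", "non_white", "non_white", "non_white"),
  probation_present = c(1, 1, 0, 1, 0, 0, 1)
)

# Because probation_present is coded 0/1, its mean is the share of cases with probation
toy_shares <- toy_sentenced %>%
  group_by(race_condensed) %>%
  summarize(cases = n(),
            probation_share = mean(probation_present),
            .groups = "drop")

toy_shares
```

Applied to the real data, the same summary could be run once per metro area so that the shares can be compared side by side.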
Side by Side Comparisons of Virginia Beach, Roanoke, and Charlottesville Metropolitan Areas
Before looking through the information below we suggest taking a look at the following two links to learn more about Charlottesville and its troubling racial history.
(https://jeffschoolheritagecenter.org/collections/mapping-cville/ )
(http://www2.iath.virginia.edu/schwartz/vhill/vhill.history.html )
ThreeMetro <- merged_data %>%
filter(GEOID.y == "47260" | GEOID.y == "40220" | GEOID.y == "16820")
ThreeMetro %>%
group_by(GEOID.y) %>%
filter(sentence_presentqual == "Yes") %>%
summarize(numscases = n() ,
meanslength = mean(SentenceTime) ,
medianslength = median(SentenceTime) ,
maxslength = max(SentenceTime) ,
minslength = min(SentenceTime))
## # A tibble: 3 × 6
## GEOID.y numscases meanslength medianslength maxslength minslength
## <chr> <int> <dbl> <dbl> <int> <int>
## 1 16820 1183 1706. 1460 20075 10
## 2 40220 3417 1137. 730 21900 1
## 3 47260 5440 1552. 1215 36135 2
ThreeMetro %>%
group_by(GEOID.y) %>%
filter(probation_presentqual == "Yes") %>%
summarize(numpcases = n() ,
meanplength = mean(ProbationTime) ,
medianplength = median(ProbationTime) ,
maxplength = max(ProbationTime) ,
minplength = min(ProbationTime))
## # A tibble: 3 × 6
## GEOID.y numpcases meanplength medianplength maxplength minplength
## <chr> <int> <dbl> <dbl> <int> <int>
## 1 16820 834 1431. 730 36135 180
## 2 40220 2725 1279. 730 36135 180
## 3 47260 3226 1832. 1095 36135 60
Investigating what charges tend to be connected to felonies with a probation
VirginiaBeachTable <- ThreeMetro %>%
filter(GEOID.y == "47260") %>%
filter(probation_present == 1) %>% # keep only cases with probation, to match the table caption
group_by(Charge , zcta) %>%
summarize(total_by_charge = n())
## `summarise()` has grouped output by 'Charge'. You can override using the
## `.groups` argument.
VirginiaBeachTable <- VirginiaBeachTable%>%
select(zcta, Charge, total_by_charge)
datatable(VirginiaBeachTable, caption = "Charge Types in Virginia Beach Probations")
RoanokeTable <- ThreeMetro %>%
filter(GEOID.y == "40220") %>%
filter(probation_present == 1) %>% # keep only cases with probation, to match the table caption
group_by(Charge , zcta) %>%
summarize(total_by_charge = n())
## `summarise()` has grouped output by 'Charge'. You can override using the
## `.groups` argument.
RoanokeTable <- RoanokeTable%>%
select(zcta, Charge, total_by_charge)
datatable(RoanokeTable, caption = "Charge Types in Roanoke Probations")
CharlottesvilleTable <- ThreeMetro %>%
filter(GEOID.y == "16820") %>%
filter(probation_present == 1) %>% # keep only cases with probation, to match the table caption
group_by(Charge , zcta) %>%
summarize(total_by_charge = n())
## `summarise()` has grouped output by 'Charge'. You can override using the
## `.groups` argument.
CharlottesvilleTable <- CharlottesvilleTable%>%
select(zcta, Charge, total_by_charge)
datatable(CharlottesvilleTable, caption = "Charge Types in Charlottesville Probations")
(Metro area compared to Independent City)
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == 47260) %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(fips == "810") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(GEOID.y == "40220") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(fips == "770") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,5000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == "16820") %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,8000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(fips == "540") %>%
filter(sentence_present == 1) %>%
filter(ChargeType %in% c("Felony")) %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
coord_cartesian(xlim = c(0,8000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Relative measures of socioeconomic status of community include median household income and poverty rate by zip code
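The scatter plots in this section repeat one pattern for each metro area and pair of variables. As a sketch of how that repetition could be wrapped in a helper, the function below filters to one metro area and plots two named columns. The function name plot_metro_scatter and the toy_zip data are hypothetical. The plots in this report were produced with the individual chunks that follow, not with this helper.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: scatter plot of two columns for a single metro area
plot_metro_scatter <- function(data, metro_id, xvar, yvar, title) {
  data %>%
    filter(GEOID.y == metro_id) %>%
    ggplot(aes(x = .data[[xvar]], y = .data[[yvar]])) +
    ggtitle(title) +
    geom_point()
}

# Invented zip-level data, one row per zip code
toy_zip <- tibble(
  GEOID.y = c("47260", "47260", "40220"),
  povrate = c(10, 20, 15),
  probation_rate = c(0.6, 0.5, 0.55)
)

p <- plot_metro_scatter(toy_zip, "47260", "povrate", "probation_rate",
                        "Virginia Beach, pov rate v probation_rate")
p
```

Passing the column names as strings and looking them up with .data[[ ]] keeps the helper usable with any pair of columns, at the cost of losing tab completion for the names.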
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "47260") %>%
ggplot(aes(x= povrateE , y = totalcases_pop )) +
ggtitle("Virginia Beach, poverty rate v per capita number of felony cases") +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "47260") %>%
ggplot(aes(x= medhhincE , y = prob_cases_pop )) +
ggtitle("Virginia Beach, med income v per capita probation cases") +
geom_point()
## Warning: Removed 7 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "47260") %>%
ggplot(aes(x= povrateE , y = prob_cases_pop )) +
ggtitle("Virginia Beach, pov rate v per capita probation cases") +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
Probationrates %>%
group_by(zcta) %>%
filter(GEOID.y == "47260") %>%
ggplot(aes(x= povrate , y = probation_rate )) +
ggtitle("Virginia Beach, pov rate v probation_rate") +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40220") %>%
ggplot(aes(x= povrateE , y = totalcases_pop )) +
ggtitle("Roanoke, poverty rate v per capita number of felony cases") +
geom_point()
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40220") %>%
ggplot(aes(x= medhhincE , y = prob_cases_pop )) +
ggtitle("Roanoke, med income v per capita probation cases") +
geom_point()
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "40220") %>%
ggplot(aes(x= povrateE , y = prob_cases_pop )) +
ggtitle("Roanoke, pov rate v per capita probation cases") +
geom_point()
Probationrates %>%
group_by(zcta) %>%
filter(GEOID.y == "40220") %>%
ggplot(aes(x= povrate , y = probation_rate )) +
ggtitle("Roanoke, pov rate v probation_rate") +
geom_point()
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "16820") %>%
ggplot(aes(x= povrateE , y = totalcases_pop )) +
ggtitle("Charlottesville, poverty rate v per capita number of felony cases") +
geom_point()
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "16820") %>%
ggplot(aes(x= medhhincE , y = prob_cases_pop )) +
ggtitle("Charlottesville, med income v per capita probation cases") +
geom_point()
## Warning: Removed 9 rows containing missing values (geom_point).
Probation_data %>%
group_by(zcta) %>%
filter(GEOID.y == "16820") %>%
ggplot(aes(x= povrateE , y = prob_cases_pop )) +
ggtitle("Charlottesville, pov rate v per capita probation cases") +
geom_point()
Probationrates %>%
group_by(zcta) %>%
filter(GEOID.y == "16820") %>%
ggplot(aes(x= povrate , y = probation_rate )) +
ggtitle("Charlottesville, pov rate v probation_rate") +
geom_point()
va <- counties(state = "51" , cb= TRUE)
## Retrieving data for the year 2020
VirginiaBeach <- va %>%
filter(COUNTYFP %in% c("550" , "620" , "650" , "700" , "710" , "735" , "740" , "800" , "810" , "830" , "073" , "093" , "095" , "115" , "175" , "199"))
VirginiaBeachzipcountymap <- st_intersection(VirginiaBeach, zcta_va)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
names(VirginiaBeachzipcountymap)[14] <- 'zcta'
VirginiaBeachMapData <- merge(Probationrates, VirginiaBeachzipcountymap, by="zcta")
ggplot(data = VirginiaBeachMapData) +
geom_sf(aes(fill = probation_rate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
ggplot(data = VirginiaBeachMapData) +
geom_sf(aes(fill = povrate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
va <- counties(state = "51" , cb= TRUE)
## Retrieving data for the year 2020
Roanoke <- va %>%
filter(COUNTYFP %in% c("770" , "775" , "023" , "045" , "067" , "161"))
Roanokezipcountymap <- st_intersection(Roanoke, zcta_va)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
names(Roanokezipcountymap)[14] <- 'zcta'
RoanokeMapData <- merge(Probationrates, Roanokezipcountymap, by="zcta")
ggplot(data = RoanokeMapData) +
geom_sf(aes(fill = probation_rate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
ggplot(data = RoanokeMapData) +
geom_sf(aes(fill = povrate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
va <- counties(state = "51" , cb= TRUE)
## Retrieving data for the year 2020
Charlottesville <- va %>%
filter(COUNTYFP %in% c("540" , "125" , "003" , "029" , "065" , "079" , "109"))
Charlottesvillezipcountymap <- st_intersection(Charlottesville, zcta_va)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
names(Charlottesvillezipcountymap)[14] <- 'zcta'
CharlottesvilleMapData <- merge(Probationrates, Charlottesvillezipcountymap, by="zcta")
ggplot(data = CharlottesvilleMapData) +
geom_sf(aes(fill = probation_rate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
ggplot(data = CharlottesvilleMapData) +
geom_sf(aes(fill = povrate , geometry = geometry)) +
scale_fill_viridis_c(option = "inferno" , direction = -1)
Is race correlated with the likelihood of a person receiving probation or not?
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == "47260") %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
ggtitle("Virginia Beach, race_condensed")+
coord_cartesian(xlim = c(0,8000)) +
facet_wrap(~race_condensed) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == "40220") %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
ggtitle("Roanoke, race_condensed")+
coord_cartesian(xlim = c(0,5000)) +
facet_wrap(~race_condensed) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
merged_data %>%
filter(sentence_present == 1) %>%
filter(GEOID.y == "16820") %>%
filter(ChargeType == "Felony") %>%
group_by(probation_presentqual) %>%
ggplot(aes(x= SentenceTime , fill = probation_presentqual)) +
geom_histogram() +
ggtitle("Charlottesville, race_condensed")+
coord_cartesian(xlim = c(0,5000)) +
facet_wrap(~race_condensed) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
While we’ve made substantial progress on this topic throughout the semester, there are a few things that we were unable to incorporate. The following is a short list of topics we’d recommend that any students interested in pursuing the same topic focus on:
Incorporate data from years other than 2020 into this analysis (could still run the same analysis, but with more data/years)
Specifically analyze the potential impacts of the Virginia law change surrounding judge sentencing. Data from the years following the implementation of this rule may reveal interesting insights about the impacts of having more judges delivering sentences as opposed to getting them from juries.
Dig more into the hearing data connected with these cases. We did some preliminary research on hearings, but ultimately decided to go another direction. In the future, another group could focus more on learning about the differences between types of hearings, the implications associated with each, and how this might impact probation and sentencing.