The goal of this project is to get a idea of the nature of traffic accidents in Texas. To help with the goal, an analysis of weekdays vs. weekends will be done to give a better pictures of how accidents occur. Ontop of this, a interactive map will be created to better show the accident pattern in Texas
Our dataset is a compilation of countrywide traffic accidents in 49 US states. The data ranges from Februrary 2016 to December 2019 and has about 3 million records. To make the data more managable, we decided to only look at records from Texas, which was still 300000 records.
There are 49 columns included in the dataset.
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
####### ALL LIBRARIES USED ###########
#declare libraries
options(stringsAsFactors = FALSE)
library(tidyr)
library(dplyr)
library(highcharter)
library(lubridate)
library(padr)
library(tm)
library(DT)
library(knitr)
library(kableExtra)
library(maps)
library(leaflet)
library(htmltools)
library(geojsonio)
#reading in clean data
raw <- read.csv("cleaned-accidents.csv")
info=c("ID", "This is a unique identifier of the accident record.",
"Source", "Indicates source of the accident report (i.e. the API which reported the accident.).",
"TMC", "A traffic accident may have a Traffic Message Channel (TMC) code which provides more detailed description of the event.",
"Severity", "Shows the severity of the accident, a number between 1 and 4, where 1 indicates the least impact on traffic (i.e., short delay as a result of the accident) and 4 indicates a significant impact on traffic (i.e., long delay).",
"Start_Time", "Shows start time of the accident in local time zone.",
"End_Time", "Shows end time of the accident in local time zone.",
"Start_Lat", "Shows latitude in GPS coordinate of the start point.",
"Start_Lng", "Shows longitude in GPS coordinate of the start point.",
"End_Lat", "Shows latitude in GPS coordinate of the end point.",
"End_Lng", "Shows longitude in GPS coordinate of the end point.",
"Distance(mi)", "The length of the road extent affected by the accident.",
"Description", "Shows natural language description of the accident.",
"Number", "Shows the street number in address field.",
"Street", "Shows the street name in address field.",
"Side", "Shows the relative side of the street (Right/Left) in address field.",
"City", "Shows the city in address field.",
"County", "Shows the county in address field.",
"State", "Shows the state in address field.",
"Zipcode", "Shows the zipcode in address field.",
"Country", "Shows the country in address field.",
"Timezone", "Shows timezone based on the location of the accident (eastern, central, etc.).",
"Airport_Code", "Denotes an airport-based weather station which is the closest one to location of the accident.",
"Weather_Timestamp", "Shows the time-stamp of weather observation record (in local time).",
"Temperature(F)", "Shows the temperature (in Fahrenheit).",
"Wind_Chill(F)", "Shows the wind chill (in Fahrenheit).",
"Humidity(%)", "Shows the humidity (in percentage).",
"Pressure(in)", "Shows the air pressure (in inches).",
"Visibility(mi)", "Shows visibility (in miles).",
"Wind_Direction", "Shows wind direction.",
"Wind_Speed(mph)", "Shows wind speed (in miles per hour).",
"Precipitation(in)", "Shows precipitation amount in inches, if there is any.",
"Weather_Condition", "Shows the weather condition (rain, snow, thunderstorm, fog, etc.).",
"Amenity", "A Point-Of-Interest (POI) annotation which indicates presence of amenity in a nearby location.",
"Bump", "A POI annotation which indicates presence of speed bump or hump in a nearby location.",
"Crossing", "A POI annotation which indicates presence of crossing in a nearby location.",
"Give_Way", "A POI annotation which indicates presence of give_way sign in a nearby location.",
"Junction", "A POI annotation which indicates presence of junction in a nearby location.",
"No_Exit", "A POI annotation which indicates presence of no_exit sign in a nearby location.",
"Railway", "A POI annotation which indicates presence of railway in a nearby location.",
"Roundabout", "A POI annotation which indicates presence of roundabout in a nearby location.)",
"Station", "A POI annotation which indicates presence of station (bus, train, etc.) in a nearby location.",
"Stop", "A POI annotation which indicates presence of stop sign in a nearby location.",
"Traffic_Calming", "A POI annotation which indicates presence of traffic_calming means in a nearby location.",
"Traffic_Signal", "A POI annotation which indicates presence of traffic_signal in a nearby location.",
"Turning_Loop", "A POI annotation which indicates presence of turning_loop in a nearby location.",
"Sunrise_Sunset", "Shows the period of day (i.e. day or night) based on sunrise/sunset.",
"Civil_Twilight", "Shows the period of day (i.e. day or night) based on civil twilight.",
"Nautical_Twilight", "Shows the period of day (i.e. day or night) based on nautical twilight.",
"Astronomical_Twilight", "Shows the period of day (i.e. day or night) based on astronomical twilight.)")
infodf <- data.frame(ColumnName=info[seq(from=1,to=97, by=2)], Description=info[seq(from=2, to=98, by=2)], stringsAsFactors = FALSE)
infodf %>% kable() %>% kable_styling(bootstrap_options = "striped")
ColumnName | Description |
---|---|
ID | This is a unique identifier of the accident record. |
Source | Indicates source of the accident report (i.e. the API which reported the accident.). |
TMC | A traffic accident may have a Traffic Message Channel (TMC) code which provides more detailed description of the event. |
Severity | Shows the severity of the accident, a number between 1 and 4, where 1 indicates the least impact on traffic (i.e., short delay as a result of the accident) and 4 indicates a significant impact on traffic (i.e., long delay). |
Start_Time | Shows start time of the accident in local time zone. |
End_Time | Shows end time of the accident in local time zone. |
Start_Lat | Shows latitude in GPS coordinate of the start point. |
Start_Lng | Shows longitude in GPS coordinate of the start point. |
End_Lat | Shows latitude in GPS coordinate of the end point. |
End_Lng | Shows longitude in GPS coordinate of the end point. |
Distance(mi) | The length of the road extent affected by the accident. |
Description | Shows natural language description of the accident. |
Number | Shows the street number in address field. |
Street | Shows the street name in address field. |
Side | Shows the relative side of the street (Right/Left) in address field. |
City | Shows the city in address field. |
County | Shows the county in address field. |
State | Shows the state in address field. |
Zipcode | Shows the zipcode in address field. |
Country | Shows the country in address field. |
Timezone | Shows timezone based on the location of the accident (eastern, central, etc.). |
Airport_Code | Denotes an airport-based weather station which is the closest one to location of the accident. |
Weather_Timestamp | Shows the time-stamp of weather observation record (in local time). |
Temperature(F) | Shows the temperature (in Fahrenheit). |
Wind_Chill(F) | Shows the wind chill (in Fahrenheit). |
Humidity(%) | Shows the humidity (in percentage). |
Pressure(in) | Shows the air pressure (in inches). |
Visibility(mi) | Shows visibility (in miles). |
Wind_Direction | Shows wind direction. |
Wind_Speed(mph) | Shows wind speed (in miles per hour). |
Precipitation(in) | Shows precipitation amount in inches, if there is any. |
Weather_Condition | Shows the weather condition (rain, snow, thunderstorm, fog, etc.). |
Amenity | A Point-Of-Interest (POI) annotation which indicates presence of amenity in a nearby location. |
Bump | A POI annotation which indicates presence of speed bump or hump in a nearby location. |
Crossing | A POI annotation which indicates presence of crossing in a nearby location. |
Give_Way | A POI annotation which indicates presence of give_way sign in a nearby location. |
Junction | A POI annotation which indicates presence of junction in a nearby location. |
No_Exit | A POI annotation which indicates presence of no_exit sign in a nearby location. |
Railway | A POI annotation which indicates presence of railway in a nearby location. |
Roundabout | A POI annotation which indicates presence of roundabout in a nearby location.) |
Station | A POI annotation which indicates presence of station (bus, train, etc.) in a nearby location. |
Stop | A POI annotation which indicates presence of stop sign in a nearby location. |
Traffic_Calming | A POI annotation which indicates presence of traffic_calming means in a nearby location. |
Traffic_Signal | A POI annotation which indicates presence of traffic_signal in a nearby location. |
Turning_Loop | A POI annotation which indicates presence of turning_loop in a nearby location. |
Sunrise_Sunset | Shows the period of day (i.e. day or night) based on sunrise/sunset. |
Civil_Twilight | Shows the period of day (i.e. day or night) based on civil twilight. |
Nautical_Twilight | Shows the period of day (i.e. day or night) based on nautical twilight. |
Astronomical_Twilight | Shows the period of day (i.e. day or night) based on astronomical twilight.) |
Let us take a look at what our data looks like!
## # A tibble: 298,062 x 24
## Source Severity Start_Time End_Time Distance.mi. Description City County
## <chr> <int> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 MapQu~ 2 2016-11-3~ 2016-11~ 0.01 Accident o~ Aust~ Travis
## 2 MapQu~ 2 2016-11-3~ 2016-11~ 0 Accident o~ Aust~ Travis
## 3 MapQu~ 2 2016-11-3~ 2016-11~ 0 Accident o~ Aust~ Travis
## 4 MapQu~ 2 2016-11-3~ 2016-11~ 0.01 #2 lane bl~ Fort~ Tarra~
## 5 MapQu~ 3 2016-11-3~ 2016-11~ 0.01 #2 lane bl~ Fort~ Tarra~
## 6 MapQu~ 2 2016-11-3~ 2016-11~ 0.01 Accident o~ Dall~ Dallas
## 7 MapQu~ 3 2016-11-3~ 2016-11~ 0.01 #2 / #3 la~ Dall~ Dallas
## 8 MapQu~ 2 2016-11-3~ 2016-11~ 0 Accident o~ Dall~ Dallas
## 9 MapQu~ 2 2016-11-3~ 2016-11~ 0 Accident o~ Dall~ Dallas
## 10 MapQu~ 3 2016-11-3~ 2016-11~ 0.01 HOV lane b~ Dall~ Dallas
## # ... with 298,052 more rows, and 16 more variables: Temperature.F. <dbl>,
## # Weather_Condition <chr>, Crossing <chr>, Give_Way <chr>, Junction <chr>,
## # No_Exit <chr>, Railway <chr>, Roundabout <chr>, Station <chr>, Stop <chr>,
## # Sunrise_Sunset <chr>, Traffic_Calming <chr>, Traffic_Signal <chr>,
## # Turning_Loop <chr>, Start_Lat <dbl>, Start_Lng <dbl>
If you would like to learn more about the dataset, please visit: https://www.kaggle.com/sobhanmoosavi/us-accidents
As part of the license for utilizing this dataset, it is required that we cite 2 papers that this dataset was used for as part of research. The papers are as follows if you would like to learn more.
######### READING IN CLEANED DATA ##########
raw <- raw %>%
separate(Start_Time, c("Start_Date", "Start_Time"), sep = " ") %>%
separate(End_Time, c("End_Date", "End_Time"), sep = " ") %>%
mutate(
wday = weekdays(as.Date(Start_Date, format = "%Y-%m-%d"))
)
##### GETTING DALLAS DATA #######
#list for counties
dallas_plano_irving <- c("Collin", "Dallas", "Denton", "Ellis", "Hunt", "Kaufman", "Rockwall")
dfw <- data.frame() #create new variable
for (county in dallas_plano_irving){ #loop to grab all
dfw <- raw %>%
subset(County == county) %>%
bind_rows(dfw)
}
####### GENERAL SUMMARIES #######
#getting counts for all cities
city_count <- raw %>%
group_by(City) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting counts for all counties
county_count <- raw %>%
group_by(County) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting counts for all dates
date_count <- raw %>%
group_by(Start_Date) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting counts for all severities
severity_count <- raw %>%
group_by(Severity) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting counts for all weekdays
weekday <- raw %>%
group_by(wday) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting missing date values
date_count <- date_count %>%
mutate(
Date = as.Date(Start_Date, format="%Y-%m-%d")
)
#first and last day of data
min_d<- min(date_count$Date)
max_d<- max(date_count$Date)
Lets start with taking a look at a general overview of the data.
## Severity Temperature.F. Distance.mi.
## Min. :1.000 Min. :-40.00 Min. : 0.0000
## 1st Qu.:2.000 1st Qu.: 60.80 1st Qu.: 0.0000
## Median :2.000 Median : 73.40 Median : 0.0000
## Mean :2.289 Mean : 71.26 Mean : 0.1152
## 3rd Qu.:3.000 3rd Qu.: 82.90 3rd Qu.: 0.0100
## Max. :4.000 Max. :161.60 Max. :130.4900
## NA's :5951
Here we can see the general statistics for the continuous data.
The average severity in Texas is 2.289
The average Temperature is 71.26 F
The average Distance is 0.1 miles
##### TREEMAP FOR SEVERITY ########
hc <- severity_count %>%
hchart(type="treemap", hcaes(x=paste("Severity:", Severity, sep=" "), value=accidents, color=accidents)) %>%
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by Severity, for Texas",
align = "left") %>%
hc_subtitle(text = "color corrosponds to the number of accidents",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' Severity: ' + this.point.Severity + ' <br> # Accidents: ' + this.point.accidents)}"))
hc
This Treemap can tell us a few things about the Severity Category in the data, for example:
There are 4 Severity levels in the data,
98% of the recorded accidents are categorized as Severity level 2 or 3.
1.1% of the recorded accidents are categorized as level 4, the most severe accident.
While less than 0.04% of the recorded accidents are considered a Severity level of 1.
The total number of recorded accidents in Texas is 298062 accidents.
#### TREEMAP FOR COUNTY COUNT ######
hc <- county_count %>%
slice(1:8) %>%
hchart(type = "bar",
hcaes(x=County, y=accidents, color=County)) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_title(text = "Total Accidents by County, in Texas",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' County: ' + this.point.County + ' <br> # Accidents: ' + this.point.accidents + ' <br> Severity: ' + this.point.mean_sev.toFixed(2))}"))
hc
From the above, we can see the four most accident prone Counties:
The graph shows the disparity of the top three Counties and the preceeding Counties. With the third highest County, Travis County, having 41397 more accidents than the fourth highest County, Bexar County.
#### TREEMAP FOR CITY COUNTS ######
hc <- city_count %>%
slice(1:8) %>%
hchart(type = "treemap",
hcaes(x=City, value=accidents, color=mean_sev)) %>%
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by City, in Texas",
align = "left") %>%
hc_subtitle(text = "color corrosponds to the mean severity level",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' City: ' + this.point.City + ' <br> # Accidents: ' + this.point.accidents + ' <br> Severity: ' + this.point.mean_sev.toFixed(2))}"))
hc
From the graph, The five most accident prone Cities in Texas are:
Again, we are seeing a huge disparity of the top three Cities and the preceeding Cities.
These three cities combined have 70% of the total recorded accidents. meaning the other 710 Cities make up for the remaining 30%.
#get all dates
alldates <- seq(min_d, max_d,1)
#take out dates that occur in data
alldates <- alldates %>%
subset(!(alldates %in% date_count$Date))
#create dataframe of all dates, with 0 accidents
alldates <- data.frame(Date=alldates, accidents=0, mean_sev=0)
#combining missing dates with old data
total <- alldates %>%
bind_rows(date_count) %>%
subset(select = c(Date, accidents, mean_sev)) %>%
arrange(Date) %>%
mutate(
wday = weekdays(Date) #get weekdays
)
total %>%
hchart(type="column", hcaes(x=Date, y=accidents), color="black") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by Date",
align = "left") %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_xAxis(title = "", type = "datetime", dateTimeLabelFormats = list(day = '%Y/%m')) %>%
hc_tooltip(formatter = JS("function(){
return (' Date: ' + this.point.Date)+ ' <br> Weekday: ' + this.point.wday + ' <br> # Accidents: ' + this.point.accidents + ' <br> Mean Severity: ' + this.point.mean_sev.toFixed(2)}"))
This graph shows exactly how much data is in this data set, as well as any patterns in the number of Accidents. For example,
The graph seems to show high variation throughout the individual weeks,
There only seems to be one major dip in the data from May 24, 2017 to June 1, 2017.
There might be a small decline in the number of accidents over time,
Lets see if Severity, Distance of Accidents, and Temperature are correlated! To do this, we need use use pearsons r and pearsons r squared.
First, lets take a look at pearsons r.
raw_cor <- raw %>%
subset(!is.na(Severity)) %>%
subset(!is.na(Distance.mi.)) %>%
subset(!is.na(Temperature.F.)) %>%
subset(select = c(Severity, Distance.mi., Temperature.F.)) %>%
as.matrix() %>%
cor()
raw_cor
## Severity Distance.mi. Temperature.F.
## Severity 1.00000000 0.14062302 -0.02974895
## Distance.mi. 0.14062302 1.00000000 -0.01100098
## Temperature.F. -0.02974895 -0.01100098 1.00000000
Pearsons r cant tell us a lot, however we can gather a few bits of information:
Lets check how well correlated these factors are, we can do this simply by just squaring the above correlation.
#lets take a look at the severity matched with the start time
raw_cor_2 <- raw_cor %>%
'^'(2) %>%
round(5) %>% #round
hchart() %>% #graph
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Pearsons r squared matchup",
align = "left") %>%
hc_subtitle(text = "For TX accidents",
align = "left") %>%
hc_xAxis(categories = list("Severity", "Distance of Accident", "Temperature")) %>%
hc_yAxis(categories = list("Severity", "Distance of Accident", "Temperature"))%>%
hc_legend(align = "left") %>%
hc_plotOptions(
series = list(
boderWidth = 0,
dataLabels = list(enabled = TRUE)))
raw_cor_2
All of the r squared values are very low, with the highest being Severity and Distance of the Accident (0.01977). This tells us that only ~1.9% of the Severity samples can be explained by the Distance of the Accident.
As all of these r2 values are very low, this suggests that none of the factors can be explained by the other.
In this Analysis, I will dive into the differences of the recorded accidents according to the day of the week!
#### CLEARING VARIABLES ###
rm(list = ls(all.names = TRUE))
### GETTING A NEW RAW
# This is to make merging the analysis into the main file easier
raw <- read.csv("cleaned-accidents.csv")
raw <- raw %>%
separate(Start_Time, c("Start_Date", "Start_Time"), sep = " ") %>%
separate(End_Time, c("End_Date", "End_Time"), sep = " ") %>%
mutate(
wday = wday(Start_Date, label=TRUE)
)
#getting counts for all dates
date_count <- raw %>%
group_by(Start_Date) %>%
dplyr::summarise(
accidents = dplyr::n(),
mean_sev = mean(Severity)
) %>%
arrange(-accidents)
#getting missing date values
date_count <- date_count %>%
mutate(
Date = as.Date(Start_Date, format="%Y-%m-%d")
)
#first and last day of data
min_d<- min(date_count$Date)
max_d<- max(date_count$Date)
#### PLOTTING ######
hc <- date_count %>%
subset(select = c(Date, accidents, mean_sev)) %>%
mutate(
wday = wday(Date, label=TRUE) #get weekdays
) %>%
group_by(wday) %>%
dplyr::summarise(
accidents = sum(accidents),
mean_sev = mean(mean_sev)
) %>%
ungroup() %>%
hchart(type= "bar", hcaes(x=wday, y=accidents)) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by Day of Week",
align = "left") %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_xAxis(title = list(text = "")) %>%
hc_tooltip(formatter = JS("function(){
return (' Day of Week: ' + this.point.wday + ' <br> # Accidents: ' + this.point.accidents + ' <br> Mean Severity: ' + this.point.mean_sev.toFixed(2))}"))
#call plot
hc
As shown, there is a great desperity between the Monday - Friday, and Saturday/Sunday.
While this huge drop is apparent on the weekend, one reason why might this occur is; there are less people are working, causing less people to be on the roads.
Now lets take a look into if this has any differences throughout the years.
#### PLOTTING #######
date_count %>%
subset(select = c(Date, accidents, mean_sev)) %>%
mutate(
Year = as.numeric(format(date_count$Date, "%Y"))
) %>%
mutate(
wday = wday(Date, label=TRUE) #get weekdays
) %>%
group_by(wday,Year) %>%
dplyr::summarise(
accidents = sum(accidents),
mean_sev = mean(mean_sev)
) %>%
ungroup() %>% #plot starts below
hchart(type= "bar", hcaes(x=wday, y=accidents, group = Year)) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by Day of Week",
align = "left") %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_xAxis(title = list(text = "")) %>%
hc_tooltip(formatter = JS("function(){
return (' Year: ' + this.point.Year + '<br> Day of Week: ' + this.point.wday + ' <br> # Accidents: ' + this.point.accidents + ' <br> Mean Severity: ' + this.point.mean_sev.toFixed(2))}"))
You can de-select anything in the legend to get a closer look at the years.
Here we can see a significant drop in the number of accidents in 2016, this is due to not having the complete year. Remember, this dataset starts on July 14, 2016.
Despite this, we still see a massive drop in accidents occuring on a Saturday or a Sunday.
Lets dive into the text data to see if there are any indcations to why there is a massive drop on Saturday and Sunday.
####### FUNCTION TO GRAB HIGHWAYS #######
get_hwy <- function(data){
new_data <- data.frame()
#get temp year
temp <- data
#seperate text data into words, by weekday
new_data <- stack(tapply(temp$Description, temp$wday, function(x) scan(text=x, what=''))) %>%
mutate(
values = ifelse(grepl(values, pattern="\\-"), removePunctuation(values, preserve_intra_word_dashes = TRUE), NA)
) %>%
mutate( #remove blank values
values = ifelse(values == "", NA, values)
) %>%
subset(!is.na(values)) %>% #delete NA's
group_by(ind) %>%
mutate( #toupper all letters
values = toupper(values)
) %>%
dplyr::rename( #rename
"words" = values,
"wday" = ind
) %>%
bind_rows(new_data) #bind back
return(new_data)
}
#get text data
text_wday <- raw %>%
subset(select = c(wday, Description, Start_Date)) %>%
mutate(
year = as.numeric(format(as.Date(raw$Start_Date,format = "%Y-%m-%d"), "%Y"))
) %>%
subset(select = -c(Start_Date)) %>%
get_hwy()
#subset into week
wk = c("Mon", "Tue", "Wed", "Thu", "Fri")
week <- data.frame()
for(day in wk){
week <- text_wday %>%
subset(wday == day) %>%
bind_rows(week)
}
week <- week %>%
subset(select = words) %>%
dplyr::count(words) %>%
arrange(-n) %>%
mutate(
"weekday hwy" = words,
"weekday count" = n,
rank = 1:nrow(.)
) %>%
subset(select = c("weekday hwy", "weekday count", "rank"))
#subset into weekend
wkd = c("Sat", "Sun")
weekend <- data.frame()
for(day in wkd){
weekend <- text_wday %>%
subset(wday == day) %>%
bind_rows(weekend)
}
weekend <- weekend %>%
subset(select = words) %>%
dplyr::count(words) %>%
arrange(-n) %>%
mutate(
"weekend hwy" = words,
"weekend count" = n,
rank = 1:nrow(.)
) %>%
subset(select = c("weekend hwy", "weekend count", "rank"))
#combining datasets for a data table
week %>%
slice(1:80) %>%
bind_cols(weekend %>%
slice(1:80)) %>%
subset(select = -c(rank1)) %>%
datatable()
This is a lookup table, you can use this to search through the top 80 ranks!
These are the top 10 accident prone highways for the weekdays and the weekend!
We can see that I-35 is, by far, the worst highway in texas. Besides I-35, the only highway that stays on the same rank is I-45.
Every other highway tends to change ranks, which could suggest that certain highways get proportionally less traffic on the weekdays, as opposed to the weekends, or vice versa!
Some transformation is required to get dates and times in a more managable state as well as adding in a duration column. The years are filtered on just 2019 to get an overview of accidents for the year and to prevent crashes to to volume, coordinate columns are renamed and specific columns are selected to display later in a label on the map.
tx_accidents_2019 <- raw %>%
mutate(start = Start_Date,
end = End_Date,
Start_Time = hms(Start_Time),
End_Time = hms(End_Time),
Duration.minutes = End_Time - Start_Time)%>%
filter(start >= "2019-01-01") %>%
rename(Longitude = Start_Lng, Latitude = Start_Lat) %>%
select(start, Severity, Duration.minutes, Sunrise_Sunset, Weather_Condition,
Temperature.F., Description, Longitude, Latitude, end)
Here is an interactive map of Accidents in 2019 that occur within Texas. Data points are clustered together based off the level of Severity.
Feel free to play around with some of the controls.
You can do the following:
# Set color palette
pal <- colorBin("YlOrRd", domain = tx_accidents_2019$Severity, 1:4)
# Creating groups of individual severity levels for filtering
groups <- as.character(sort(unique(tx_accidents_2019$Severity), decreasing = TRUE))
# Creating html labels for map
labs <- lapply(seq(nrow(tx_accidents_2019)), function(i) {
paste0(
'Date: ', tx_accidents_2019[i, "start"], '<br/>',
'Severity: ', tx_accidents_2019[i, "Severity"], '<br/>',
'Duration: ', tx_accidents_2019[i, "Duration.minutes"], ' minutes.<br/>',
'Time of Day: ', tx_accidents_2019[i, "Sunrise_Sunset"], '<br/>',
'Weather Condition: ', tx_accidents_2019[i, "Weather_Condition"], '<br/>',
'Tempurature: ', tx_accidents_2019[i, "Temperature.F."], ' F<br/>',
tx_accidents_2019[i, "Description"])
})
# Create a df for the labels
labeldf <- data.frame(Severity = tx_accidents_2019$Severity, Labels = unlist(labs), stringsAsFactors = FALSE)
# Create a map from leaflet to build from
map <- leaflet(tx_accidents_2019, options = leafletOptions(minZoom = 5, maxZoom = 18)) %>%
setView(lng = -99.9018, lat = 31.9686, zoom = 5) %>%
setMaxBounds(-109.9018, 37.9686, -89.9018, 24.9686) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLayersControl(overlayGroups = groups,
options = layersControlOptions(collapsed = TRUE))
# Create layers for each severity level to be filtered
for (x in groups) {
indiGroup = tx_accidents_2019[tx_accidents_2019$Severity == x,]
map <- map %>%
addCircleMarkers(data = indiGroup,
clusterOptions = markerClusterOptions(),
~Longitude, ~Latitude,
group = x,
radius = 5,
color = ~pal(Severity),
stroke = FALSE,
fillOpacity = 0.5,
label = lapply(labeldf[labeldf$Severity == x,]$Labels, htmltools::HTML)
)
}
# Display graph
map