Back in 2017, the Tubbs Fire in Santa Rosa destroyed not only my family’s home but also our entire neighborhood. We lost 95% of our belongings, including irreplaceable photographs and mementos, and it was a truly devastating event that traumatized me and my family.
For this report, I am exploring a dataset of the 1.88 million wildfires that occurred in the US from 1992 to 2015. The dataset can be found on Kaggle: https://www.kaggle.com/rtatman/188-million-us-wildfires. I will briefly look at all the data and then dive into large fires only, meaning fires in size class F (1,000 to 5,000 acres burned) and size class G (more than 5,000 acres burned). Some of the largest fires in CA, for example, burned hundreds of thousands of acres. I want to focus on California and Florida, as these are the places where I have lived and live now, and where the data is most relevant to me.
Before: Our home before the Tubbs Fire.
After: Our home a few days after the Tubbs Fire.
After two years of hardship and dealing with the situation, we were finally able to begin rebuilding the house in 2019.
To begin, load the libraries needed for SQLite, ggplot, dplyr, map data, and leaflet.
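A minimal sketch of this loading step. The exact package list is an assumption inferred from the functions used later in this post (for example, `theme_map()` is assumed to come from ggthemes and `coord_map()` needs mapproj); adjust it to match your own setup.

```r
# Packages assumed to be needed for this analysis (an inferred list, not the author's)
pkgs <- c("DBI", "RSQLite", "dplyr", "dbplyr", "purrr",
          "ggplot2", "maps", "mapproj", "ggthemes", "leaflet")

# Check which packages are installed before trying to attach them
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(not_installed) > 0) {
  message("Install these packages first: ", paste(not_installed, collapse = ", "))
} else {
  # Attach every package so its functions can be called without a prefix
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

Checking with `requireNamespace()` first gives one readable message listing everything that is missing, instead of stopping at the first failed `library()` call.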
Connect to the SQLite database and load the data. Then close the connection to free up computer resources such as memory.
# use SQLite to load in database
conn <- dbConnect(SQLite(), 'C:/Users/blong/Desktop/Python/188-million-us-wildfires/FPA_FOD_20170508.sqlite')
# pull the fires table into RAM
fires <- tbl(conn, "Fires") %>% collect()
# disconnect from database to free up resources like memory
dbDisconnect(conn)
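Pulling all 1.88 million rows into RAM works, but it is not the only option. The sketch below shows an alternative in which the filter runs inside SQLite, so only the matching rows reach R. It uses an in-memory database with a tiny made-up `Fires` table so that it runs on its own; the toy rows and the name `large_fires` are assumptions for illustration, not part of the original analysis.

```r
library(DBI)
library(dplyr)  # tbl() on a database connection also needs dbplyr installed

# In-memory database holding a tiny, made-up Fires table (illustration only)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "Fires", data.frame(
  FIRE_YEAR       = c(2004L, 2005L, 2006L, 2007L),
  STATE           = c("CA", "CA", "FL", "TX"),
  FIRE_SIZE       = c(0.1, 1500, 7200, 12),
  FIRE_SIZE_CLASS = c("A", "F", "G", "C"),
  stringsAsFactors = FALSE
))

# filter() and select() are translated to SQL and run in the database;
# collect() then pulls only the resulting rows into RAM
large_fires <- tbl(conn, "Fires") %>%
  filter(FIRE_SIZE_CLASS %in% c("F", "G")) %>%
  select(FIRE_YEAR, STATE, FIRE_SIZE, FIRE_SIZE_CLASS) %>%
  collect()

# Disconnect once the data is in R
dbDisconnect(conn)
```

With the real database, the same pipeline would be pointed at the connection to `FPA_FOD_20170508.sqlite` instead of `":memory:"`.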
Using glimpse gives an overall idea of how the data is structured. For instance, this data contains 39 columns and 1.88 million rows. Useful columns include, but are not limited to, FIRE_YEAR and FIRE_SIZE.
## Observations: 1,880,465
## Variables: 39
## $ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1...
## $ FOD_ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1...
## $ FPA_ID <chr> "FS-1418826", "FS-1418827", "FS-1418835"...
## $ SOURCE_SYSTEM_TYPE <chr> "FED", "FED", "FED", "FED", "FED", "FED"...
## $ SOURCE_SYSTEM <chr> "FS-FIRESTAT", "FS-FIRESTAT", "FS-FIREST...
## $ NWCG_REPORTING_AGENCY <chr> "FS", "FS", "FS", "FS", "FS", "FS", "FS"...
## $ NWCG_REPORTING_UNIT_ID <chr> "USCAPNF", "USCAENF", "USCAENF", "USCAEN...
## $ NWCG_REPORTING_UNIT_NAME <chr> "Plumas National Forest", "Eldorado Nati...
## $ SOURCE_REPORTING_UNIT <chr> "0511", "0503", "0503", "0503", "0503", ...
## $ SOURCE_REPORTING_UNIT_NAME <chr> "Plumas National Forest", "Eldorado Nati...
## $ LOCAL_FIRE_REPORT_ID <chr> "1", "13", "27", "43", "44", "54", "58",...
## $ LOCAL_INCIDENT_ID <chr> "PNF-47", "13", "021", "6", "7", "8", "9...
## $ FIRE_CODE <chr> "BJ8K", "AAC0", "A32W", NA, NA, NA, NA, ...
## $ FIRE_NAME <chr> "FOUNTAIN", "PIGEON", "SLACK", "DEER", "...
## $ ICS_209_INCIDENT_NUMBER <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ICS_209_NAME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ MTBS_ID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ MTBS_FIRE_NAME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ COMPLEX_NAME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ FIRE_YEAR <int> 2005, 2004, 2004, 2004, 2004, 2004, 2004...
## $ DISCOVERY_DATE <dbl> 2453404, 2453138, 2453157, 2453185, 2453...
## $ DISCOVERY_DOY <int> 33, 133, 152, 180, 180, 182, 183, 67, 74...
## $ DISCOVERY_TIME <chr> "1300", "0845", "1921", "1600", "1600", ...
## $ STAT_CAUSE_CODE <dbl> 9, 1, 5, 1, 1, 1, 1, 5, 5, 1, 1, 1, 9, 4...
## $ STAT_CAUSE_DESCR <chr> "Miscellaneous", "Lightning", "Debris Bu...
## $ CONT_DATE <dbl> 2453404, 2453138, 2453157, 2453190, 2453...
## $ CONT_DOY <int> 33, 133, 152, 185, 185, 183, 184, 67, 74...
## $ CONT_TIME <chr> "1730", "1530", "2024", "1400", "1200", ...
## $ FIRE_SIZE <dbl> 0.10, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10...
## $ FIRE_SIZE_CLASS <chr> "A", "A", "A", "A", "A", "A", "A", "B", ...
## $ LATITUDE <dbl> 40.03694, 38.93306, 38.98417, 38.55917, ...
## $ LONGITUDE <dbl> -121.0058, -120.4044, -120.7356, -119.91...
## $ OWNER_CODE <dbl> 5, 5, 13, 5, 5, 5, 5, 13, 13, 5, 5, 5, 5...
## $ OWNER_DESCR <chr> "USFS", "USFS", "STATE OR PRIVATE", "USF...
## $ STATE <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA"...
## $ COUNTY <chr> "63", "61", "17", "3", "3", "5", "17", N...
## $ FIPS_CODE <chr> "063", "061", "017", "003", "003", "005"...
## $ FIPS_NAME <chr> "Plumas", "Placer", "El Dorado", "Alpine...
## $ Shape <blob> blob[60 B], blob[60 B], blob[60 B], blo...
With the data loaded, I am using ggplot to create bar plots that help build an initial understanding of the data. For this example, I’m only revealing a small portion of the code as a sample, to keep things easy to follow.
Note: Feel free to message or email me if you are interested in looking at the R file with the rest of the code in detail.
fires %>%
  group_by(FIRE_YEAR) %>%
  summarize(n_fires = n()) %>%
  ggplot(aes(x = FIRE_YEAR, y = n_fires/1000)) +
  geom_bar(stat = 'identity', fill = '#FCBE00') +
  geom_smooth(method = 'lm', se = FALSE, linetype = 'dashed', size = 0.8, color = '#FF2F00') +
  labs(x = 'Year', y = 'Number of wildfires (thousands)', title = 'United States Wildfires from 1992 to 2015') +
  theme(plot.title = element_text(hjust = 0.5))
This chart illustrates all the wildfires in the United States from 1992 to 2015. The dashed trend line shows a small uptick in the number of wildfires over that period. 2006 had the most wildfires, by a margin of approximately 56,000, while 1997 had the fewest.
This graph presents the 13 different causes of wildfires from 1992 to 2015. According to the data, the four major causes of wildfires are debris burning, miscellaneous, arson, and lightning. Surprisingly, the three least common causes are structures, fireworks, and powerlines.
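The trend line in the chart is a linear fit (`method = 'lm'`), so the size of the uptick can also be read off as a number. The sketch below fits the same kind of model on a small made-up table of yearly counts; the values in `yearly` are invented for illustration and are not the real totals.

```r
# Made-up yearly counts standing in for the per-year summary of fires
yearly <- data.frame(
  FIRE_YEAR = 1992:1996,
  n_fires   = c(10, 12, 14, 16, 18)
)

# Same kind of model that geom_smooth(method = 'lm') fits behind the scenes
trend <- lm(n_fires ~ FIRE_YEAR, data = yearly)

# Slope = average change in the number of fires per year
slope <- unname(coef(trend)["FIRE_YEAR"])

# Years with the most and the fewest fires
busiest_year  <- yearly$FIRE_YEAR[which.max(yearly$n_fires)]
quietest_year <- yearly$FIRE_YEAR[which.min(yearly$n_fires)]
```

A positive `slope` corresponds to an upward-sloping trend line. On the real data, `yearly` would be the result of the `group_by(FIRE_YEAR)` and `summarize(n_fires = n())` steps.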
For this graph, the wildfires are divided into seven size classes. According to the data, the 0.25-9.9 acre class has the most wildfires, approximately 940,000.
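The code for the cause and size-class charts is not shown in this post. As a rough sketch of how such counts and a bar chart could be produced, the example below uses a small made-up data frame in place of `fires`. The toy values and the names `toy_fires`, `cause_counts`, `class_counts`, and `cause_plot` are assumptions for illustration, not the original code.

```r
library(dplyr)
library(ggplot2)

# Tiny made-up stand-in for the fires table
toy_fires <- data.frame(
  STAT_CAUSE_DESCR = c("Lightning", "Arson", "Lightning",
                       "Debris Burning", "Lightning", "Arson"),
  FIRE_SIZE_CLASS  = c("A", "B", "B", "B", "G", "A"),
  stringsAsFactors = FALSE
)

# Number of fires per cause, most common first
cause_counts <- toy_fires %>%
  count(STAT_CAUSE_DESCR, name = "n_fires") %>%
  arrange(desc(n_fires))

# Number of fires per size class
class_counts <- toy_fires %>%
  count(FIRE_SIZE_CLASS, name = "n_fires")

# Horizontal bar chart of causes, ordered by count
cause_plot <- ggplot(cause_counts,
                     aes(x = reorder(STAT_CAUSE_DESCR, n_fires), y = n_fires)) +
  geom_col(fill = '#FCBE00') +
  coord_flip() +
  labs(x = 'Cause', y = 'Number of wildfires')
```

Sorting the bars by count makes the most and least common causes easy to pick out at a glance.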
To start creating maps, load the map data. I also created new variables for region and fire duration.
states <- map_data("state")
counties <- map_data("county")
state.abb <- append(state.abb, c("DC", "PR"))
state.name <- append(state.name, c("District of Columbia", "Puerto Rico"))
# Map the state abbreviations to state names to join with the map data
fires$region <- map_chr(fires$STATE, function(x) { tolower(state.name[grep(x, state.abb)]) })
fires$BURN_TIME <- fires$CONT_DATE - fires$DISCOVERY_DATE
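Two notes on these new variables, shown as a sketch on made-up rows. First, the state lookup can also be done in one vectorized step with `match()`, which avoids calling a function once per row. Second, DISCOVERY_DATE and CONT_DATE look like Julian day numbers, so their difference is a duration in days, and they can be turned into calendar dates by subtracting 2440588, the Julian day number of 1970-01-01. The names `abbs`, `full_names`, and `toy`, and the reading of the date columns as integer Julian day numbers, are assumptions for illustration.

```r
# Lookup vectors extended with DC and Puerto Rico
abbs       <- c(state.abb, "DC", "PR")
full_names <- c(state.name, "District of Columbia", "Puerto Rico")

# Tiny made-up stand-in for the fires table
toy <- data.frame(
  STATE          = c("CA", "FL", "DC"),
  DISCOVERY_DATE = c(2453404, 2453138, 2453157),
  CONT_DATE      = c(2453404, 2453138, 2453160),
  stringsAsFactors = FALSE
)

# Vectorized lookup: position of each abbreviation, then the matching name
toy$region <- tolower(full_names[match(toy$STATE, abbs)])

# Difference of two Julian day numbers = duration in days
toy$BURN_TIME <- toy$CONT_DATE - toy$DISCOVERY_DATE

# Julian day number -> calendar date (2440588 is the day number of 1970-01-01)
toy$DISCOVERY_DAY <- as.Date(toy$DISCOVERY_DATE - 2440588, origin = "1970-01-01")
```

In this toy table the first row converts to 2005-02-02, which is day 33 of 2005 and agrees with the DISCOVERY_DOY value shown for the first record in the glimpse output. Unlike the `grep()` version, `match()` returns `NA` for an unknown abbreviation instead of raising an error.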
Below are two US maps that illustrate the number of wildfires in each state from 1992 to 2015.
fires %>%
  select(region) %>%
  group_by(region) %>%
  summarize(n = n()) %>%
  right_join(states, by = 'region') %>%
  ggplot(aes(x = long, y = lat, group = group, fill = n)) +
  geom_polygon() +
  geom_path(color = 'white') +
  scale_fill_continuous(low = "#FCBE00", high = "#FF2F00", name = 'Number of Wildfires') +
  theme_map() +
  coord_map('albers', lat0=30, lat1=40) +
  ggtitle("United States Wildfires from 1992 to 2015") +
  theme(plot.title = element_text(hjust = 0.5))
The first map displays the total count of wildfires. Based on the data, California, Texas, and, surprisingly, Georgia had the most fires.
The second map displays only large fires of 1,000 acres or more. It shows that large fires occur mostly in the western US. The states with the most large fires are California, Texas, and now Idaho. The gray states (Vermont, Massachusetts, New Hampshire, and Rhode Island) have no large wildfires in the data, likely because these states are small and have generally colder climates.
Next, the focus shifts to size class F and G wildfires (1,000 acres or more) in California and Florida. I chose California and Florida because they are the places I have lived in and am currently living in.
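The code for the second map is not shown, but the gray states follow naturally from how a join like the one used for the first map behaves. In the sketch below, states with no class F or G fires get no count, so `right_join()` leaves their `n` as `NA`, and ggplot2 draws `NA` fill values in gray by default. The data frames `toy_fires` and `toy_states` are made up for illustration and stand in for `fires` and the map data.

```r
library(dplyr)

# Made-up fires: Vermont only has a small (class B) fire
toy_fires <- data.frame(
  region          = c("california", "california", "texas", "vermont"),
  FIRE_SIZE_CLASS = c("F", "G", "G", "B"),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the state map data (one row per state here)
toy_states <- data.frame(
  region = c("california", "texas", "vermont"),
  stringsAsFactors = FALSE
)

# Count only large fires, then keep every state from the map data
large_by_state <- toy_fires %>%
  filter(FIRE_SIZE_CLASS %in% c("F", "G")) %>%
  count(region, name = "n") %>%
  right_join(toy_states, by = "region")
```

Here Vermont ends up with `n` equal to `NA`, which is what would appear as a gray state on the map.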
# Create map data frames for California and Florida from the state and county map data
ca_df <- states %>%
  filter(region == "california")
ca_county <- counties %>%
  filter(region == "california")
fl_df <- states %>%
  filter(region == "florida")
fl_county <- counties %>%
  filter(region == "florida")
# Keep only the class F and G fires in California and Florida
ca_FG_fires <- fires %>%
  filter(region == "california", FIRE_SIZE_CLASS %in% c('F', 'G'))
fl_FG_fires <- fires %>%
  filter(region == "florida", FIRE_SIZE_CLASS %in% c('F', 'G'))
# State outline, county borders, then one point per large fire colored by year
CAmap_year <- ggplot() +
  geom_polygon(data = ca_df, aes(x = long, y = lat, group = group), colour = "black") +
  geom_polygon(data = ca_county, aes(x = long, y = lat, group = group), colour = 'white') +
  geom_point(data = ca_FG_fires, aes(x = LONGITUDE, y = LATITUDE, colour = FIRE_YEAR)) +
  scale_colour_gradient(low = "yellow", high = "red", name = 'Year') +
  labs(title = 'Large Wildfires by Year in California', x = 'Longitude', y = 'Latitude')
CAmap_year + theme_light() + theme(plot.title = element_text(hjust = 0.5))
This map illustrates large wildfires in California between 1992 and 2015, with the color scale marked in 5-year increments. It indicates that 1995-2000 and 2005-2010 had the most large wildfires in California. It also reveals that most of the 1992-1995 wildfires were in Southern California, and that after 2005 large fires shifted toward Northern California.
This graph presents the acreage burned in California. There is a clear upward trend in the total acreage burned each year.
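The acreage chart's code is not shown here. A minimal sketch of the underlying summary, assuming it sums FIRE_SIZE per FIRE_YEAR, is given below on a made-up data frame; `toy_ca` and `acres_by_year` are hypothetical names, and the acreages are invented.

```r
library(dplyr)

# Made-up large fires standing in for the California subset
toy_ca <- data.frame(
  FIRE_YEAR = c(1992, 1992, 1993, 1994, 1994),
  FIRE_SIZE = c(1200, 3000, 5200, 1500, 8000)
)

# Total acres burned per year
acres_by_year <- toy_ca %>%
  group_by(FIRE_YEAR) %>%
  summarize(total_acres = sum(FIRE_SIZE))

# Slope of a linear fit = average change in acres burned per year
acre_trend <- unname(coef(lm(total_acres ~ FIRE_YEAR, data = acres_by_year))["FIRE_YEAR"])
```

A positive `acre_trend` corresponds to an upward trend in the chart. The same summary applied to the Florida subset would give the Florida version.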
This map presents the 13 causes of large wildfires in California. The top three causes are lightning, miscellaneous, and equipment use. In Northern California, a large number of fires are caused by lightning and equipment use, for example the Tubbs Fire that destroyed my home in 2017.
This map illustrates large wildfires in Florida between 1992 and 2015, with the color scale marked in 5-year increments. It indicates that 1995-2005 had the most large wildfires in Florida. It also reveals that most of the wildfires are in Southern Florida, near the Everglades.
This graph presents the acreage burned in Florida. There is a clear but small upward trend in the total acreage burned each year.
This map presents the 9 causes of large wildfires in Florida. The leading cause of large wildfires in Florida is lightning, which accounts for at least four times as many fires as any other cause. This comes as no surprise, since Florida is known as the Thunderstorm Capital of America.
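A comparison like "four times as many as any other cause" can be checked by dividing the count for the top cause by the count for the runner-up. The sketch below does this on a made-up vector of causes; the values are invented and only illustrate the calculation.

```r
# Made-up causes for a handful of large fires
causes <- c(rep("Lightning", 8), rep("Arson", 2), "Miscellaneous")

# Count fires per cause and sort from most to least common
cause_counts <- sort(table(causes), decreasing = TRUE)

# Compare the leading cause with the second most common one
top_cause <- names(cause_counts)[1]
top_two   <- as.integer(cause_counts[1:2])
ratio     <- top_two[1] / top_two[2]
```

On the real Florida subset, `causes` would be the STAT_CAUSE_DESCR column of `fl_FG_fires`.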
Looking at large wildfires, there does seem to be an increase in their number across the U.S. and within California and Florida. There are likely many causes, but climate change appears to be a strong contributor: as temperatures rise and the weather becomes more extreme, the number of wildfires in the U.S. increases. Check out part II to see some interactive leaflet maps.