Many Trees Grow in Brooklyn...

Using Spatial Statistics to Visually Explore New York City's Street Tree Census
Python | QGIS | Illustrator | GitHub
CURIOSITY How does the number of street trees per capita in New York City vary by census tract?

DATA The New York City Street Tree Census is organized by the NYC Department of Parks and Recreation. Collected data includes tree species, diameter, perception of health, and, most importantly for my purposes, tree location coordinates. A street tree is defined as any tree growing within the public right-of-way. Trees within park boundaries are not included in tree counts, but trees lining park perimeters are. The Street Tree Census, representing all five boroughs, documented 683,788 trees in 2015 (15% more than the 2005 census). To calculate trees per capita, I opted to use 2020 U.S. Census population data.
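The trees per capita calculation can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used for this project: it assumes each tree record has already been assigned to a census tract (for example, by a spatial join on its coordinates), and the function name, tract IDs, and numbers below are all hypothetical.

```python
from collections import Counter


def trees_per_capita(tree_tract_ids, population_by_tract):
    """Return {tract_id: street trees per resident}.

    Tracts with no residents are skipped, because the ratio is undefined there.
    """
    tree_counts = Counter(tree_tract_ids)
    ratios = {}
    for tract_id, population in population_by_tract.items():
        if population <= 0:
            continue  # e.g. a tract that is all park, cemetery, or airport
        ratios[tract_id] = tree_counts.get(tract_id, 0) / population
    return ratios


# Hypothetical toy data: one tract ID per tree record (assumed already joined).
tree_tract_ids = ["A", "A", "B", "A", "C", "B"]
population_by_tract = {"A": 1500, "B": 4000, "C": 0, "D": 2500}

ratios = trees_per_capita(tree_tract_ids, population_by_tract)
```

Tract "C" drops out because it has no residents, while tract "D" keeps a ratio of zero because it has residents but no recorded trees. How to treat unpopulated tracts is a judgment call that changes the resulting maps.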

EXPLORATION Data was initially joined, analyzed, calculated, and visualized with Python. I explored spatial autocorrelation using the Local Moran's I statistic. The Local Moran maps were generated with Python. Heat maps and tree count maps were created in QGIS. All maps and visualizations were further refined in Adobe Illustrator.


Spatial Autocorrelation: Local Moran's I

Local spatial autocorrelation, the relationship between a value and its neighboring values in space, allows us to identify clusters and outliers. I explored my data using the Local Moran's I statistic. Local Moran's I identifies four types of geographic areas:

• Hot Spots (clusters)
Areas with high values being near other areas with high values
Census tracts with high trees per capita near other tracts with high trees per capita

• Cold Spots (clusters)
Areas with low values being near other areas with low values
Census tracts with low trees per capita near other tracts with low trees per capita

• Diamonds (outliers)
think of a diamond in the rough
An area with a high value being near areas with low values
A census tract with high trees per capita near tracts with low trees per capita

• Doughnuts (outliers)
think of the empty center of a delicious doughnut
An area with a low value being near areas with high values
A census tract with low trees per capita near tracts with high trees per capita
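The four categories above come from comparing each area's deviation from the mean with the average deviation of its neighbors. The sketch below is a simplified, plain-Python illustration under stated assumptions: it uses row-standardized weights (each neighbor counts equally), the tract names, values, and adjacency are made up, and it omits the significance testing (usually a permutation test) that normally decides which areas are shown on a map at all. It is not the code behind the maps on this page.

```python
def local_morans_i(values, neighbors):
    """Compute Local Moran's I and a cluster/outlier label for each area.

    values:    {area: value}, e.g. trees per capita
    neighbors: {area: [adjacent areas]}
    Returns    {area: (local_i, label)}
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {area: v - mean for area, v in values.items()}  # deviation from mean
    m2 = sum(d * d for d in z.values()) / n             # variance of values
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")

    results = {}
    for area, adjacent in neighbors.items():
        if not adjacent:
            continue  # an area with no neighbors has no spatial lag
        lag = sum(z[b] for b in adjacent) / len(adjacent)  # neighbors' average
        local_i = z[area] * lag / m2
        if z[area] > 0 and lag > 0:
            label = "hot spot"    # high value near high values
        elif z[area] < 0 and lag < 0:
            label = "cold spot"   # low value near low values
        elif z[area] > 0 and lag < 0:
            label = "diamond"     # high value near low values
        elif z[area] < 0 and lag > 0:
            label = "doughnut"    # low value near high values
        else:
            label = "unclassified"
        results[area] = (local_i, label)
    return results


# Hypothetical toy data: six tracts in a row, each touching its neighbors.
values = {"T1": 9, "T2": 8, "T3": 2, "T4": 1, "T5": 7, "T6": 3}
neighbors = {
    "T1": ["T2"],
    "T2": ["T1", "T3"],
    "T3": ["T2", "T4"],
    "T4": ["T3", "T5"],
    "T5": ["T4", "T6"],
    "T6": ["T5"],
}

results = local_morans_i(values, neighbors)
```

In this toy example, T1 and T2 come out as hot spots, T3 and T4 as cold spots, T5 as a diamond, and T6 as a doughnut. Clusters have a positive local statistic and outliers a negative one.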

When working with Moran's I, every value has an impact on the results. Extreme high and low values can skew the overall dataset, so where you opt to draw the line for outliers can impact the final output and generate vastly different maps. In my data exploration, I attempted a variety of slightly varied outlier trims before landing on maps that I felt best represented the data. I first visualized spatial autocorrelation for all of New York City, but since the five boroughs vary greatly (it is hard to find many similarities between Manhattan and Staten Island, for example), I drilled down to look at each borough independently. In addition to the maps of clusters and outliers, I included three heat maps displaying population per square mile, tree count per square mile, and median household income. Although income was not included in my analysis, I found it informative to consider a snapshot of its distribution.
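To illustrate why the outlier cutoff matters, here is one possible way to trim extreme values before running the statistic. The percentile approach, the function name, and the toy values are assumptions made for this sketch. The trims actually used for the maps were chosen by trial and error and are not reproduced here.

```python
import math


def trim_outliers(values, lower_pct=0.0, upper_pct=99.0):
    """Keep only areas whose value lies within the given percentile band.

    Uses the nearest-rank percentile definition on the sorted values.
    """
    ordered = sorted(values.values())
    n = len(ordered)

    def percentile(p):
        rank = max(1, math.ceil(p * n / 100))  # 1-based nearest rank
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return {area: v for area, v in values.items() if low <= v <= high}


# Hypothetical trees-per-capita values; T10 is an extreme high value.
values = {
    "T1": 0.01, "T2": 0.02, "T3": 0.02, "T4": 0.03, "T5": 0.03,
    "T6": 0.03, "T7": 0.04, "T8": 0.04, "T9": 0.05, "T10": 2.50,
}

trimmed = trim_outliers(values, lower_pct=0, upper_pct=90)
mean_before = sum(values.values()) / len(values)
mean_after = sum(trimmed.values()) / len(trimmed)
```

Dropping the single extreme tract pulls the mean down by roughly a factor of nine in this toy data. Because Local Moran's I measures every area against the mean, a shift like that can move many areas from one category to another.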
map of New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
New York City, heat map, census tract, population per square mile
New York City, heat map, census tract, trees per square mile
New York City, heat map, census tract, median household income

map of Manhattan, New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
Manhattan, New York City, heat map, census tract, population per square mile
Manhattan, New York City, heat map, census tract, trees per square mile
Manhattan, New York City, heat map, census tract, median household income

map of Brooklyn, New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
Brooklyn, New York City, heat map, census tract, population per square mile
Brooklyn, New York City, heat map, census tract, trees per square mile
Brooklyn, New York City, heat map, census tract, median household income

map of Queens, New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
Queens, New York City, heat map, census tract, population per square mile
Queens, New York City, heat map, census tract, trees per square mile
Queens, New York City, heat map, census tract, median household income

map of Bronx, New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
Bronx, New York City, heat map, census tract, population per square mile
Bronx, New York City, heat map, census tract, trees per square mile
Bronx, New York City, heat map, census tract, median household income

map of Staten Island, New York City, using Local Moran's I statistic to display hotspots and outliers of Trees per Capita, per 2020 census tract
Staten Island, New York City, heat map, census tract, population per square mile
Staten Island, New York City, heat map, census tract, trees per square mile
Staten Island, New York City, heat map, census tract, median household income

Tree Counts

Spatial autocorrelation makes for interesting maps, but ultimately raises more questions than it answers. My analysis stopped at the visualizations and did not further explore what variables might explain the presence of hot spots or outliers. Additionally, clean maps with no geographic labels are visually appealing, but make it difficult to orient oneself and pinpoint specific areas. It was with these limitations in mind that I opted to explore a more straightforward metric: trees per square mile by neighborhood.

To be consistent with the tree census geography, I worked with the 2010 neighborhood tabulation areas (NTAs). I created heat maps to visualize trees per square mile in New York City and each borough individually. The borough maps are directly labeled with neighborhood names and actual tree counts. Finally, for each geographic breakdown, I cited neighborhoods with the lowest and highest tree counts per square mile.
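The density metric and the lowest/highest rankings described above reduce to a division and a sort. The sketch below assumes tree counts and land areas are already tabulated per NTA. The function names, neighborhood labels, and numbers are hypothetical placeholders, not values from the census.

```python
def trees_per_square_mile(tree_counts, area_sq_mi):
    """Return {nta: trees per square mile}, skipping NTAs without a valid area."""
    return {
        nta: count / area_sq_mi[nta]
        for nta, count in tree_counts.items()
        if area_sq_mi.get(nta, 0) > 0
    }


def rank_extremes(density, k=3):
    """Return (k lowest, k highest) as lists of (nta, density) pairs."""
    ordered = sorted(density.items(), key=lambda item: item[1])
    lowest = ordered[:k]
    highest = ordered[-k:][::-1]  # highest first
    return lowest, highest


# Hypothetical toy data for four neighborhoods.
tree_counts = {"NTA-1": 1200, "NTA-2": 300, "NTA-3": 2000, "NTA-4": 900}
area_sq_mi = {"NTA-1": 0.5, "NTA-2": 1.5, "NTA-3": 1.0, "NTA-4": 0.75}

density = trees_per_square_mile(tree_counts, area_sq_mi)
lowest, highest = rank_extremes(density, k=2)
```

Note that NTA-3 has the most trees in this toy data, but NTA-1 has the highest density. This is why the borough maps show both the density and the actual tree counts.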
New York City, NTAs, neighborhood, heat map, trees per square mile
bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Manhattan, New York City, NTAs, neighborhood, heat map, trees per square mile, tree count
Manhattan, bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
Manhattan, bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Brooklyn, New York City, NTAs, neighborhood, heat map, trees per square mile, tree count
Brooklyn, bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
Brooklyn, bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Queens, New York City, NTAs, neighborhood, heat map, trees per square mile, tree count
Queens, bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
Queens, bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Bronx, New York City, NTAs, neighborhood, heat map, trees per square mile, tree count
Bronx, bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
Bronx, bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Staten Island, New York City, NTAs, neighborhood, heat map, trees per square mile, tree count
Staten Island, bar chart, New York City neighborhoods, NTAs, lowest tree counts per square mile
Staten Island, bar chart, New York City neighborhoods, NTAs, highest tree counts per square mile

Concluding Thoughts

My data exploration considered only one spatial statistics question: "Where?" Where are the clusters and outliers? I have yet to ask why the clusters and outliers occur where they do. How are they influencing one another? What other factors are contributing to these outcomes? I also have not yet considered how this exploration could be put to practical use. For example, could these answers help to determine where the city should plant future trees?

My aim is that by the time the next NYC Street Tree Census is released (2026), I will have expanded my spatial statistics skill set to enable a more robust and informative exploration of the data.