For the next few weeks, I will be analyzing crime in Oakland using data from CrimeWatch. Please see my previous post on how I gathered and cleaned the data. My only rule is that I want everything I investigate to be actionable.
Also, a number of people asked for a copy of the cleaned data. Here it is. Go nuts. I asked folks what I should look for. Some requests were obvious, like the overall trend of crime. Others were much more specific, like robberies around BART stations. I hope to look at all of that soon. But first, I needed to understand the system a little better.
An Update from Oakland Police
An update from last week: I showed that CrimeMapping under-reports recent crime by an average of 50% ("recent" meaning the previous week). I messaged @Crime_Mapping several times with no response. I also sent CrimeWatch an email, but that bounced back. Someone suggested I contact OPD's community relations officers. I had some luck there:
@WelcomeClass meeting arranged to discuss types & definitions of data you discussed. Thanks for your work in research & raising questions.— Lt. Chris Bolton (@OPDChris) May 2, 2014
I'll post further updates here. The original article follows:
How Crime is Mapped
CrimeMapping is an online tool that you can use to visualize crime in your neighborhood. The data for CrimeMapping is from an initiative called CrimeWatch that the city of Oakland undertakes. You can go onto CrimeMapping, look up an address, and see recent crimes committed nearby.
Every morning, CrimeWatch sends this data to CrimeMapping. It also publishes the same data on its own website as an automatically generated tabular file, available here, called 'crimePublicData.csv'.
If you open that file, you can see that it has a rolling four month window for crime reports. For example, if you look at it today (April 26, 2014), the earliest date in crimePublicData.csv is December 26, 2013. If you open the same file tomorrow, the earliest date will be December 27, 2013.
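If you want to check the window yourself, here is a minimal Python sketch. The column name `DateTime`, the date format, and the sample rows are assumptions made up for illustration; adjust them to match the header of the file you actually download.

```python
import csv
import io
from datetime import datetime

def report_window(csv_text, date_column="DateTime", date_format="%m/%d/%Y"):
    """Return (earliest, latest, span_in_days) for the report dates in a CSV export.

    `date_column` and `date_format` are assumptions, not the real schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    dates = [datetime.strptime(row[date_column], date_format).date() for row in reader]
    earliest, latest = min(dates), max(dates)
    return earliest, latest, (latest - earliest).days

# Made-up placeholder rows, NOT real case reports
sample = """DateTime,Description
12/26/2013,EXAMPLE A
02/14/2014,EXAMPLE B
04/25/2014,EXAMPLE C
"""

earliest, latest, span = report_window(sample)
print(earliest, latest, span)
```

Running this on two consecutive mornings and comparing the earliest date is enough to confirm that the window rolls forward one day at a time.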
This is how CrimeMapping stays up to date. And when you go to their site, by default you see the last week in whatever area you're interested in.
The limits of CrimeMapping
Something interesting from CrimeMapping's FAQ:
How can I get historical or additional information for a crime?
CrimeMapping.com includes a rolling one hundred and eighty (180) days’ worth of crime data. If you require more historical data or more detailed information regarding a specific incident you should contact the public information officer at your local law enforcement agency.
Why am I not seeing a crime that I know occurred?
Each incident has to be confirmed and entered as a report by the law enforcement agency before it can be uploaded to our website. For some incidents it may take some time for this process to be completed so it would not immediately appear on the website. If the case is still an open investigation, it will not appear until the case is closed.
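In other words, CrimeMapping cannot show cases that are still open. And because Oakland's file only covers a rolling four-month window (about 120 days), it also cannot show cases that take longer than that to close. This suggests a built-in bias against more recent crimes.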
What you can't see
In addition to the daily report, CrimeWatch also published its full set of 2012 and 2013 case reports. These are year-end roundups, so assuming crime does not change much from year to year, we can use them as a baseline to see how much recent crime is being under-reported:
The y-axis is the number of crime reports. The x-axis is the number of days since January 1st. Each point is the average over the preceding week. I added a smoothed curve with confidence intervals for clarity. The data was pulled on 04/20/2014 (109 days after the new year).
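For anyone who wants to rebuild a curve like this, here is a rough sketch of the two steps: counting reports by days since January 1st, then averaging over a trailing week. It reads "the average of the last week" as a trailing seven-day mean, which is an interpretation rather than a documented method. The function names and sample dates are made up for illustration, and the smoothing step is left out.

```python
from collections import Counter
from datetime import date

def daily_counts(report_dates, year):
    """Count reports per day, keyed by days since January 1st of `year`."""
    start = date(year, 1, 1)
    return Counter((d - start).days for d in report_dates if d.year == year)

def trailing_average(counts, day, window=7):
    """Average daily count over the `window` days ending at `day` (inclusive).

    Days with no reports count as zero; the window is shortened near January 1st.
    """
    days = range(max(0, day - window + 1), day + 1)
    return sum(counts.get(d, 0) for d in days) / len(days)

# Made-up dates: seven reports on January 1st and seven on January 7th
example_dates = [date(2014, 1, 1)] * 7 + [date(2014, 1, 7)] * 7
counts = daily_counts(example_dates, 2014)
week_average = trailing_average(counts, day=6)
print(week_average)
```

Computing the same series separately for 2012, 2013, and 2014 gives three lines that can be plotted against a shared x-axis.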
Clearly, what stands out is the steep drop in 2014.
Let's zoom in.
The data was pulled 109 days after the new year. What you see in the figure are two weeks of case reports CrimeWatch uploaded to CrimeMapping prior to that day (in green). In red and blue, you see 2012 and 2013, respectively.
At day 94, there are about 60 cases in 2014 compared to about 70 for the prior years. By day 102, it's about 50 cases compared to 70. By day 108, it's around 23 compared to 73. The closer we get to the present day, the further we get from where crime should be.
Here's the kicker: By default, CrimeMapping displays the last 7 days of data. If you compare each day of the past week against the historical count for the same day and average the differences, you see that CrimeMapping's default view shows only about 50% of the crime that is actually reported.
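To make the arithmetic concrete, here is a small sketch of one way to compute that average. It uses only the two approximate values quoted above that fall inside the last week (day 102 and day 108), not the full daily series, so it is a rough check and not a reproduction of the figure. The function name is made up for illustration.

```python
def average_shortfall(current, historical):
    """Mean fraction of historically expected reports that are missing, day by day."""
    shortfalls = [1 - c / h for c, h in zip(current, historical)]
    return sum(shortfalls) / len(shortfalls)

# Approximate counts quoted in the text for day 102 and day 108 only
current_2014 = [50, 23]
prior_years = [70, 73]

shortfall = average_shortfall(current_2014, prior_years)
print(f"average shortfall: {shortfall:.0%}")
```

Even with just these two endpoints, the shortfall comes out at about 49%, in line with the roughly 50% figure from the full week.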
To be clear, CrimeMapping does warn its viewers that only closed cases will be shown. But this warning is hidden in the FAQ. And nowhere on the site can a visitor see how much crime is being under-reported. Their backend may be set up to update crime as cases get closed, but we already saw that the counts don't catch up until about two weeks after a given date.
This is important, because CrimeMapping is the City of Oakland's tool to help its residents understand crime in their community. People use this data to help choose neighborhoods to live in, routes to jog, places to visit... The historical data is there. It is not expensive to make things more transparent.
- CrimeMapping should make it explicitly clear to viewers that only closed case reports are displayed.
- To help viewers understand the magnitude of the under-reporting, CrimeMapping should use historical data to show how many crimes were reported a year ago for the same date range.
- Presumably, more serious crimes remain open longer. CrimeMapping should work with agencies to indicate which crimes are still open without jeopardizing the investigations.
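The year-ago comparison in the second suggestion would not take much code. Here is a minimal sketch, assuming the historical report dates are already loaded into a list. The function names and sample dates are hypothetical and say nothing about how CrimeMapping's backend actually works.

```python
from datetime import date

def one_year_earlier(d):
    """Shift a date back one year, mapping February 29th to February 28th."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # February 29th has no counterpart in a non-leap year
        return d.replace(year=d.year - 1, day=28)

def reports_a_year_ago(report_dates, start, end):
    """Count reports that fall in the same date range one year before [start, end]."""
    shifted_start, shifted_end = one_year_earlier(start), one_year_earlier(end)
    return sum(1 for d in report_dates if shifted_start <= d <= shifted_end)

# Made-up report dates for illustration only
history = [date(2013, 4, 14), date(2013, 4, 20), date(2013, 4, 21), date(2014, 4, 15)]
baseline = reports_a_year_ago(history, start=date(2014, 4, 14), end=date(2014, 4, 20))
print(baseline)
```

Showing that baseline next to the current count would let a viewer judge at a glance how incomplete the recent numbers are likely to be.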