I was playing around with Mode's query editor when, lo and behold, I found a tutorial data set I wanted to explore further: a record of every reported criminal incident in San Francisco from the week of October 28, 2013 through the week of January 27, 2014.
It wasn't the biggest data set, but I had been curious about crime data in SF ever since I started living here. I'd heard that certain neighborhoods were to be avoided. Living right in SOMA, I was in the middle of a block where, to the right, Union Square and the Intercontinental Hotel beckoned, while to the left there were fistfights and public masturbation.
I wanted to look to the data instead of stereotypes. Here's what I found.
There were 30,400 recorded incidents during that time.
Here is a map of every San Francisco police district.
Note that Bayview roughly corresponds to Hunter's Point, Potrero Hill, and Dogpatch; Mission to the Mission; Tenderloin to the Tenderloin; Southern to SOMA; Richmond to the Outer and Inner Richmond; Northern to Nob Hill, Pac Heights, and the Marina; Central to the Financial District and Fisherman's Wharf; Park to the Castro and Noe Valley; Taraval to the Sunset; and Ingleside to Bernal Heights.
Here are the districts ranked by the percentage of cases resolved, highest first. Each row lists the district, the total number of incidents, the number of those incidents that went unsolved, and the percentage of cases resolved.
DISTRICT     TOTAL  UNSOLVED  RESOLVED
TENDERLOIN    2560       971       62%
MISSION       4098      2275       44%
BAYVIEW       2839      1650       41%
SOUTHERN      5890      3587       39%
PARK          1690      1038       38%
INGLESIDE     2559      1626       36%
TARAVAL       2019      1408       30%
NORTHERN      3808      2775       27%
CENTRAL       3219      2322       27%
RICHMOND      1718      1275       25%
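If you want to check the arithmetic yourself, here is a minimal Python sketch that recomputes the last column from the first two. This is not the query I ran in Mode; the function names are made up for illustration, and the only data in it is the table above.

```python
# (district, total incidents, unsolved incidents), copied from the table above.
ROWS = [
    ("TENDERLOIN", 2560, 971),
    ("MISSION", 4098, 2275),
    ("BAYVIEW", 2839, 1650),
    ("SOUTHERN", 5890, 3587),
    ("PARK", 1690, 1038),
    ("INGLESIDE", 2559, 1626),
    ("TARAVAL", 2019, 1408),
    ("NORTHERN", 3808, 2775),
    ("CENTRAL", 3219, 2322),
    ("RICHMOND", 1718, 1275),
]


def resolved_pct(total, unsolved):
    """Exact percentage of incidents that were resolved."""
    return 100 * (total - unsolved) / total


def truncated_pct(total, unsolved):
    """Whole-number percentage, rounded down (integer arithmetic only)."""
    return (100 * (total - unsolved)) // total


def ranked(rows):
    """Rows sorted by exact resolved percentage, highest first."""
    return sorted(rows, key=lambda r: resolved_pct(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for district, total, unsolved in ranked(ROWS):
        pct = truncated_pct(total, unsolved)
        print(f"{district:<11}{total:>6}{unsolved:>6}{pct:>4}%")
    print("Total incidents:", sum(total for _, total, _ in ROWS))
```

Two things fall out of this. The district totals add up to the 30,400 incidents mentioned earlier, and the percentages in the table appear to be rounded down to whole numbers. With exact percentages, Central (about 27.9%) edges slightly ahead of Northern (about 27.1%), even though both show as 27% above.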
To illustrate that more starkly, here is a bar graph of the number of incidents per police district.
And here is a bar graph of the share of incidents resolved in each police district, relative to the total number of incidents there. Resolution in this case is defined by the police as either arresting a suspect, locating a stolen object, or declaring a case invalidated on mental health grounds.
A few interesting things come to mind here.
1) Despite its reputation (the Tenderloin was supposedly named because police who worked there could afford better cuts of steak thanks to extra hazard pay), this police district actually saw fewer reported crime incidents than more tourist-y areas such as SOMA and the Mission.
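2) If something is going to happen to you, the police in the Tenderloin seem especially sharp at resolving cases: 62% resolved, the highest of any district.
3) The Richmond and the Castro seem to deserve their safe reputations: the Richmond and Park districts had the fewest reported incidents (1,718 and 1,690).
4) SOMA at the time, and perhaps even now, wasn't exactly the safest place despite housing tons of startup migrants: the Southern district had the most reported incidents of any district (5,890).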
Some caveats with this data:
1) It covers a short time span, and an old one: just a few months, from a few years ago.
2) Reported crime incidents DO NOT EQUAL actual crime rates. The Tenderloin, for example, might have lower reported crime because people are less willing to report crime in the area.
3) These statistics do not adjust for population per police district. It is possible that areas with more incidents are simply more populous, and that some districts would look better or worse on a per capita basis.
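To make the third caveat concrete, here is a small sketch of what a per capita adjustment would look like. I don't have population figures per police district, so the districts "A" and "B" and all of their numbers below are entirely hypothetical; they only show how a ranking can flip once population is taken into account.

```python
def incidents_per_1000(incidents, population):
    """Reported incidents per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * incidents / population


def rank_by_rate(incidents, population):
    """District names sorted by incidents per 1,000 residents, highest first."""
    return sorted(
        incidents,
        key=lambda d: incidents_per_1000(incidents[d], population[d]),
        reverse=True,
    )


# HYPOTHETICAL districts and numbers, made up purely to show the arithmetic.
# They are not San Francisco figures.
example_incidents = {"A": 6000, "B": 2000}
example_population = {"A": 120_000, "B": 25_000}

if __name__ == "__main__":
    for name in rank_by_rate(example_incidents, example_population):
        rate = incidents_per_1000(example_incidents[name], example_population[name])
        print(f"{name}: {rate:.1f} incidents per 1,000 residents")
```

In this made-up example, district A has three times as many incidents as B, but B comes out worse per resident (80 versus 50 per 1,000). Plugging in real population figures for each police district would be the natural next step.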
This was a fun little project that I started based on a question that had been lurking in my mind. It's by no means polished, and I don't think it says much. Feel free to contribute to, add to, or extend this analysis: you'll see everything I did in Mode.
More than anything else, I wanted to start a discussion rather than define it. Please share, and let me know if you have any feedback :)