Last night I went to my first ever ‘Pint of Science’ Festival talk, at Island Bar. I think it was the first time the festival has run in Birmingham and all but the one about robots seem to have sold out. I blame Dr. Who. Hopefully, “they’ll be back” but try not to think of ‘Terminator’. The Twitter tag is #pint15.
See what you missed, here: http://pintofscience.co.uk/events/birmingham/ and you can sign up for the mailing list.
The session I chose was ‘Big Data & Cities’ http://pintofscience.co.uk/event/big-data-cities/
because I’ve been thinking about how to put data on maps recently.
Both talks were by lecturers in Human Geography from University of Birmingham.
Dr. Emmanouil Tranos talked about using mobile phone data to detect motorway traffic.
He explained that traffic behaviour can be modelled by flow (cars/hour), traffic density (cars/km) and speed (km/hour). Each one can be calculated from the other two: Speed = Flow / Density.
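To make that relationship concrete, here is a minimal Python sketch of my own (not from the talk). The function names and the example figures are made up for illustration; only the formula Speed = Flow / Density comes from the talk.

```python
# Illustrative only: function names and example numbers are my own assumptions.

def speed_kmh(flow_cars_per_hour, density_cars_per_km):
    """Speed (km/hour) = Flow (cars/hour) / Density (cars/km)."""
    if density_cars_per_km <= 0:
        raise ValueError("density must be positive")
    return flow_cars_per_hour / density_cars_per_km


def flow_cars_per_hour(speed, density_cars_per_km):
    """Rearranged: Flow = Speed * Density."""
    return speed * density_cars_per_km


def density_cars_per_km(flow, speed):
    """Rearranged: Density = Flow / Speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return flow / speed


if __name__ == "__main__":
    # Hypothetical stretch of motorway: 1800 cars pass per hour,
    # with 20 cars on each kilometre of road.
    print(speed_kmh(1800, 20))
```

Knowing any two of the three values gives the third, which is presumably why estimating flow and density from phone data is enough to have a go at speed as well.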
The data was taken from records of 800 x GSM mobile phones in Amsterdam. Only meta-data was used, so handsets could not have been identified. However, the same data would have been far harder to access in the UK because of greater concern about data privacy. The data was of interest because of the high cost of putting physical data-clocking equipment on the surface of roads or in aggregating information from individual GPS units. The phone companies already collect the data for billing purposes.
The unit of mobile phone usage is the Erlang. One phone used for 10 minutes is billed for the same number of Erlangs as 10 phones used for 1 minute. Data is available for every cell tower and for every handover of a phone as it moves from one tower to another. We saw graphs demonstrating that, even with this fairly sparse data, the model was close to data from other collection methods for flow and density, though there were problems accurately predicting speed in rush hour, as every commuter knows.
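The Erlang equivalence above can be sketched in a few lines of Python. This is my own illustration, assuming usage is measured as total call time divided by a one-hour observation window; the function name and the window length are assumptions, not something from the talk.

```python
# Illustrative only: the function name and the 60-minute window are assumptions.

def traffic_erlangs(phones, minutes_each, window_minutes=60):
    """Total call time across all phones, as a fraction of the window."""
    if window_minutes <= 0:
        raise ValueError("window must be positive")
    return phones * minutes_each / window_minutes


if __name__ == "__main__":
    one_long_call = traffic_erlangs(phones=1, minutes_each=10)
    ten_short_calls = traffic_erlangs(phones=10, minutes_each=1)
    # Both add up to the same ten phone-minutes in the hour.
    print(one_long_call == ten_short_calls)
```

The point is that the measure only sees total usage at a tower, not how many handsets produced it, which fits with only meta-data being needed.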
Dani Arribas-Bel (http://darribas.org/) talked about using Twitter data to map the ethnic communities of an international city by the language of tweets. Conventional census data uses postcodes, which do not necessarily correspond to ‘real’ neighbourhoods. Such neighbourhoods are better as units for research and for political investment. There could have been a very long philosophical discussion on the meaning of neighbourhoods, so we stuck with one from the Oxford English Dictionary. This new Twitter data was described as being “a microscope for Social Science” by Steve Lohr in 2012.
We were shown colourful maps of Amsterdam’s postcode areas, then later of the different areas highlighted by language differences. The data analysis used spatial machine learning techniques to find spatial structure automatically.
Reference:
Re-engineering 1991 census geography: serial and parallel algorithms for unconstrained zone design
http://www.geog.leeds.ac.uk/papers/95-3/
Tech: [ from the talk, and chats afterwards IRL and Twitter @darribas]
The R language and Python were used. The results were displayed as OpenStreetMap tiles, overlaid with coloured polygons. The software is available on GitHub at https://github.com/darribas
PySAL is the Python Spatial Analysis Library.
Birmingham has an R user meeting on Meetup, called BRUM.
For lovers of coincidence:
Yesterday, I had a message from my daughter, who is on a tour of Europe, saying that she was on her way from San Sebastian to Zaragoza. I had to check where it was on a map.
After the talk, I saw from his Twitter profile that @darribas comes from Zaragoza.
I sent a reply and told my daughter of the coincidence. She told me she’d seen a poster for Pint of Science but didn’t think she’d understand it in Spanish. “Small world, Big Data”, I quipped.
I told @darribas, in case he wanted to do his talk at home next year.
He knew because his friend is one of the organisers. “Small World”, he said.
What are the chances*? (See, ‘R Language’.)
* – Statistically, the chance that SOMETHING would happen that I could pattern match was very high. The chance of it being THIS was very low. Those two things are easy to confuse, apparently.