In this blog series, we're proud to shine a light on one of the top Capstone projects from the graduating class of Data Science for All / Women. Capstone projects are a critical component of the DS4A / Women curriculum in which teams get together to work on projects that solve real-world data challenges faced by today’s leading companies and public sector organizations.
“I am a postdoctoral researcher at the Oxford Big Data Institute, where I investigate early detection of dementia using digital tools. Previously, I completed a PhD in Experimental Psychology at the University of Oxford, and an MA in Clinical Neuropsychology and a BSc in Psychobiology at the Hebrew University of Jerusalem. I’m passionate about applying data-driven insights into behavior and cognition towards improving people’s health and wellbeing. I joined the DS4A/Women program to develop my programming and machine-learning skills, and am extremely grateful for all I have learned from the exceptional peers and instructors that make up the DS4A network.”
“Hi, I’m Victoria. I’m originally from the Netherlands, but I came to London for my undergraduate degree in Psychology at UCL, and then stayed for my Master’s degree in Artificial Intelligence at Imperial College. In my free time I climb, play piano, and do some video editing, which came in handy for our project presentation video! My degrees taught me a lot of academic knowledge, so I decided to apply for the DS4A / Women program to gain more applied and business experience. In addition, getting a peek into the data science field through mentors and TAs, as well as the personal development lectures, all sounded very useful. My goal after the program is to get a job as a Data Scientist at a startup or big tech company.”
“Hi, I’m Kristina. I have a PhD in Electrical Engineering from Imperial College London. I chose to join DS4A so that I could get some experience of SQL and Tableau and learn how to adapt the techniques I’d used in my PhD to big data.”
“I’m a final year PhD student in the Silver Lab at University College London, where I work on microscopy development for the Neurosciences. I previously completed an MRes in Photonic and Electronic Systems at the University of Cambridge and an MPhys in Physics with Electronic Engineering at Heriot-Watt University, Edinburgh. I joined the DS4A/Women program to apply my programming skills in a different setting and because I was curious about a career in Data and about Data Science in general. I feel very fortunate to have received the fantastic careers mentoring offered by this program and to have had the opportunity to work on this project with such a talented team. Upon completing my PhD, I plan to transition into industry and I feel that this program has given me the skills and confidence to do so.”
"I am a final year PhD student in bioinformatics, in the Oxford Protein Informatics Group at the University of Oxford. I previously studiedBiochemistry and then worked as a life sciences consultant. I chose to join DS4A because am really interested in a career in data science with the biotechnology industry. I wanted to learn from mentors already in the industry, and expand my statistical and coding knowledge beyond my academic work."
We were interested in doing a health-related project and since we are all based in the UK, we wanted to look at UK data. For these reasons, we chose to investigate the effectiveness of congestion charges on air quality and traffic flow in London. Within the UK, London suffers from the highest levels of air pollution, at levels that exceed the recommended limit set by the World Health Organisation. To tackle this problem, various schemes have been introduced in the last two decades, which charge car users to drive within certain congestion charge zones. However, we questioned whether there may be an increase in air pollution levels and traffic flow directly outside the border of the charge zone, due to car users avoiding entry into the charge zone. We wanted to investigate whether the congestion charge scheme was displacing the problem of air pollution to the borders of the charge zone, which to our knowledge had not been previously explored. To do this, we examined air pollution and traffic flow data, grouping the monitoring sites by whether they were located inside the charge zone, outside it, or near its border. We focused on the impact of the Toxicity Charge introduced in 2017 and the Ultra Low Emission Zone (ULEZ) scheme introduced in 2019. Understanding the impact of these schemes is important for the over 9 million people living in London, as well as for other cities that could benefit from introducing similar schemes.
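The grouping of monitoring sites described above can be illustrated with a minimal sketch. It is not the team's actual method: it assumes, purely for illustration, that the charge zone can be approximated by a circle around a central point, whereas the real zone boundary is an irregular polygon. The function names, the centre coordinates, the zone radius, and the border width are all hypothetical values chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_site(lat, lon,
                  centre=(51.5074, -0.1278),  # assumed zone centre (central London)
                  zone_radius_km=2.5,         # assumed radius of the circular zone
                  border_width_km=0.5):       # assumed width of the "border" band
    """Label a monitoring site as 'inside', 'border', or 'outside'.

    A site counts as 'border' if it lies within border_width_km of the
    (assumed circular) zone boundary, on either side of it.
    """
    distance = haversine_km(lat, lon, centre[0], centre[1])
    if abs(distance - zone_radius_km) <= border_width_km:
        return "border"
    return "inside" if distance < zone_radius_km else "outside"
```

With real data, the circle would be replaced by a point-in-polygon test against the published zone boundary, but the three-way labelling logic would stay the same.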
Click to read the datafolio
Read the report of this capstone project
Victoria made the beautiful visualization map of the changes in air pollution levels across London over time. Seeing the data presented in this map allowed us to solidify our project. The project ideas were initially quite abstract, but looking at the data in this way enabled us to determine which ideas were viable and how much power our statistical tests might have.
We were surprised to see that there was no increase in traffic or Nitrogen Dioxide levels on the border of the charge zones compared to inside the border, and that there was a significant decrease both inside and outside the zone. This disproved our initial hypothesis that drivers would try to avoid the charge zone by increasingly driving along the border.
Another surprising result was that although Nitrogen Dioxide levels have decreased inside the charge zone, the traffic count has decreased by only 10% of that amount. It appears that drivers are not switching to public transport but are instead buying electric or emission-compliant vehicles. It would therefore be interesting to see what effect this has had on the price and distribution of electric vehicles, as well as on Londoners' incomes.
We faced a number of challenges at different stages of the project, all of which were a great learning experience. At the beginning of the project, the most challenging aspect was deciding on a feasible project. We all had a lot of ideas for potential research questions, but tracking down suitable datasets was difficult. To address this, we each did some initial research into one project and presented our findings to the other group members, before voting on the project we thought best to take forward.
After deciding on the topic of our project, we came up against some difficulties regarding the data. The first of these was in determining which factors would best reveal the impact of the congestion charge schemes. The amount of air pollution data available was vast and included different types of air pollutants. We ultimately decided to focus on Nitrogen Dioxide and Particulate Matter, since the former is largely produced by car emissions and the latter is correlated with lesser-known health risks of air pollution, such as diabetes and low birth weight in infants. Another challenge we faced was how to align the two datasets we were using (air pollution data and traffic flow data) in order to make a judgement on the impact of the congestion charge scheme. These datasets were fundamentally different: not only were they issued by different governmental bodies, but they also sampled data at different locations and over different time periods.
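One common way to bring two differently sampled datasets onto a shared footing is to aggregate both to the same granularity and keep only the periods they have in common. The sketch below illustrates that idea and is not a description of what the team actually did: it assumes each record is a hypothetical `(date, site_group, value)` tuple, averages values per site group and calendar month, and then keeps only the group-month pairs present in both datasets. The function names and the record layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(records):
    """Average values per (site_group, 'YYYY-MM') key.

    records: iterable of (datetime.date, site_group, value) tuples.
    """
    buckets = defaultdict(list)
    for day, group, value in records:
        buckets[(group, day.strftime("%Y-%m"))].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def align(pollution_records, traffic_records):
    """Join two datasets on the (site_group, month) keys they share.

    Returns a sorted list of (site_group, month, pollution_mean, traffic_mean).
    Months that appear in only one dataset are dropped.
    """
    pollution = monthly_means(pollution_records)
    traffic = monthly_means(traffic_records)
    shared_keys = sorted(pollution.keys() & traffic.keys())
    return [(group, month, pollution[(group, month)], traffic[(group, month)])
            for group, month in shared_keys]
```

Grouping by site category rather than by individual site sidesteps the problem that the two datasets were measured at different locations, at the cost of hiding variation between sites within a group.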
Time limitations were of course also a challenge, as we had many ideas for further extensions of the project. We would have also liked to investigate the impact of the congestion charge scheme on car sales and on lower-income families, since as part of the ULEZ scheme, older car models are charged an additional fee to enter the charge zone and since the size of the ULEZ border was dramatically increased in October 2021.
Our TA Helen Qu helped us with the planning of the project, as well as with deciding which factors to investigate. We were fortunate to have help from one of the DS4A instructors, Dr. Theresa Gerbert, in deciding on the appropriate statistical method to apply to our datasets.
Our project independently validated a recent report issued by the office of the Mayor of London, which states that the congestion charge scheme has been a success. However, our analysis highlighted how poor air quality still is, in London and throughout the rest of the UK, compared to the WHO guidelines.
With further time, we would like to compare the changes in air quality on main roads and in residential areas, as well as develop a model to predict the impact of the recent ULEZ border expansion. Additionally, we think it would be interesting to investigate what effect the congestion charge scheme has had on public transport. For example, has public transport use increased? Has public transport been developed in the areas inside the charge zone? And what is the cost and convenience of this for residents?
We think that this is a very interesting and exciting topic and we had a lot of fun working together on this project. This project allowed us to put our data science skills into action and to learn about air pollution and the effect of the London congestion charges. Our most important insight is that air quality has improved with the introduction of these schemes and, although there is still work to be done, the congestion charge scheme is a promising solution to the ever-growing problem of air pollution.