In this blog series, we're proud to shine a light on one of the top Capstone projects from the graduating class of Data Science for All / Women. Capstone projects are a critical component of the DS4A / Women curriculum in which teams get together to work on projects that solve real-world data challenges faced by today’s leading companies and public sector organizations.
“I am a Marketing Analytics Lead at Nuveen where I work with a variety of teams to deliver key findings and inform next best actions. Before Nuveen, I earned an M.S. in Data Science & Analytics and a B.S. in Statistics at the University of Missouri. I joined DS4A to build upon my coding skills, identify additional analytics techniques to use in my current role at Nuveen, and build my network.”
“Hi, I'm Lakshmi. I am a full stack developer at TIAA/MyVest, where I currently work as a technical lead in the wealth management space. I pursued my master's in computer science, after which I entered the finance world. Previously, I completed my bachelor's in aircraft engineering, specializing in avionics systems, and worked for a few years in the aircraft maintenance industry. I aspire to move into the field of data analytics, and DS4A gave me the opportunity to learn analytics techniques with a focus on how each one is implemented in real scenarios. It also let me interact with fellow participants, mentors, and TAs with inspiring backgrounds who guided me on this journey.”
“I'm currently a PhD candidate at Princeton University in the department of molecular biology. My research focuses on cancer and immuno-metabolism, and on developing novel technologies to study small molecules with spatial resolution in vivo. Prior to Princeton, I graduated with a B.S. with Honors from Brown University in Applied Math-Biology, and I also hold an M.S. in Biological Sciences from Auburn University. I joined DS4A to strengthen my computational analysis skills and to gain insight into various career trajectories beyond academia. Upon graduation in 2023, I hope to pursue either management consulting or data science as a career, armed with the skills learned from DS4A.”
“I am a Strategy & Analytics Consultant at Deloitte Consulting, where I specialize in delivering data-driven insights through dashboards, analytical reports, and enterprise data management capabilities. My work has spanned topics such as COVID-19 response, health equity, manufacturing intelligence, and foreign relations. Before Deloitte, I completed dual B.S. degrees in Economics and Mathematics/Statistics at the University of Maryland. I joined the DS4A / Women 2021 Fellowship to improve my data science skills and to expand my network of other women in technology.”
"I recently started a new position as the Senior Data Analyst on the ScienceTeam at MeQuilibrium. Prior to accepting my new position, I completed a postdoctoral research fellowship in digital health at Darmouth Geisel School of Medicine and a PhD in Behavioral Psychology with an emphasis in Behavioral Economics at the University of Kansas. I joined DS4A / Women 2021 for the opportunity to enhance and apply my coding and analytical skills to real-world problems and to learn about applications of data science to business concerns."
Citibike is the largest bike share program in the United States with over 25,000 bikes and 1,500 stations serving the greater New York City area. Transportation by bike has increased in popularity since the onset of the COVID-19 pandemic. Demand is high, but availability at popular docking stations is low.
Given the high demand and low availability, we set out to determine whether we could accurately predict if a Citibike station will be undersupplied at a given time on a given day.
Click to read the datafolio
One of the best parts of our project was the opportunity to actually implement the full data science project cycle, from data acquisition and exploratory data analysis to building a model and producing a final, usable product with our predictions. Although we have room to improve on our final model, seeing that its prediction of future peak demand times roughly corresponded to the peak times we observed during EDA was particularly exciting.
Our biggest challenge was the size of our dataset: more than 50 million rows. This led to long run times for our models and meant we had to identify and weigh many trends during our exploratory data analysis. We also encountered challenges with data availability. For example, we could not find historical data on the actual number of bikes at a station at a given time. To address this issue, we developed a measure of undersupply for each station using Citibike trip history data, aggregated into the change at each station across time blocks.
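To make the idea of an undersupply measure more concrete, here is a minimal sketch of one way such a proxy could be built from trip records. It is not our actual implementation, and several details are assumptions made for illustration: the tuple layout of a trip, the three-hour block width (BLOCK_HOURS), the function names, and the threshold of -2 are all hypothetical. The sketch counts each departure as -1 and each arrival as +1 for a station within a time block, then flags blocks where the net change falls at or below the threshold.

```python
from collections import defaultdict
from datetime import datetime

BLOCK_HOURS = 3  # assumed block width; not taken from the project


def time_block(ts):
    """Map a timestamp to a (date, block index) pair."""
    return ts.date(), ts.hour // BLOCK_HOURS


def net_change_by_block(trips):
    """Net bike change per (station, date, block).

    trips: iterable of (start_station, start_time, end_station, end_time).
    A departure removes one bike from the start station's block;
    an arrival adds one bike to the end station's block.
    """
    change = defaultdict(int)
    for start_station, start_time, end_station, end_time in trips:
        change[(start_station, *time_block(start_time))] -= 1
        change[(end_station, *time_block(end_time))] += 1
    return dict(change)


def flag_undersupplied(change, threshold=-2):
    """Flag blocks whose net change is at or below a hypothetical threshold."""
    return {key: delta <= threshold for key, delta in change.items()}


# Made-up trips, purely for illustration.
example_trips = [
    ("A", datetime(2021, 6, 1, 8, 5), "B", datetime(2021, 6, 1, 8, 20)),
    ("A", datetime(2021, 6, 1, 8, 10), "B", datetime(2021, 6, 1, 8, 40)),
    ("A", datetime(2021, 6, 1, 8, 30), "C", datetime(2021, 6, 1, 9, 10)),
    ("B", datetime(2021, 6, 1, 17, 0), "A", datetime(2021, 6, 1, 17, 25)),
]

example_change = net_change_by_block(example_trips)
example_flags = flag_undersupplied(example_change)
```

In this toy data, station A loses three bikes in the morning block and is the only block flagged. A proxy like this only sees rider trips, so it would miss bikes moved by rebalancing crews, which is one reason a measure of this kind is an approximation rather than a true bike count.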
Who is your team's mentor and how did they help?
Our mentor, Yibei McDermott, and our TAs, Paulene Barnes and Savannah Thais, supported the development of the capstone project with thoughtful feedback and insights on potential pitfalls to avoid across the many stages of the project.
This project both provided insight into Citibike station capacity and produced a fully functional trip planning tool for Citibike customers. If we had additional time and resources, we would focus on improving our model. Specifically, we would consider identifying additional relevant features, improving our measurement of some existing features, and attempting other modeling techniques.