DSSG Datathon 2021: Organizing an online Datathon

Vadim Voskresenskii
Published in DSSG Berlin Blog
5 min read · Mar 17, 2022


What is a datathon?

Datathons are a core activity that DSSG Berlin organizes almost every year. A DSSG datathon is a short (usually no longer than one weekend) and very intensive event at which volunteer data analysts and data scientists work in small groups on data-related challenges from German non-profit organizations. The aim is not to deliver a final solution to a problem but rather to outline the main directions in which the project can be developed and to investigate the issue from different perspectives. It is therefore important to have participants with varied backgrounds and skill sets. Another essential part of datathons is our data ambassadors: data analysts who prepare, clean, and structure the relevant data from the non-profit organizations beforehand so that analyses are feasible on the day of the datathon.

Our remote datathon 2021

Before the pandemic began in 2020, we held datathons in person. However, the restrictions related to Covid-19 and people's general desire to avoid large gatherings posed a new challenge for us: how do you organize a datathon in the Covid era? We decided to create an online event. We used Zoom for the project presentations and the final presentations, and participants used Zoom breakout rooms to discuss and organize the data analysis. In Slack channels, volunteers discussed problems and shared initial results and drafts of presentations.

The datathon took place on the weekend of 19–21 November 2021. On Friday, the participants were introduced to the challenges and had a chance to socialize and think about potential solutions. On Saturday, volunteers worked on the projects and conducted various analyses, and on Sunday the final results were presented to the non-profit organizations. A lively discussion about the results and their potential implications showed the value and impact of our volunteers’ work. In addition to the main program, we had presentations from our sponsors, ResearchGate and Kineo.ai, and a talk about participative democracy and AI from Luke Jordan (MIT). To make the event feel more like an in-person one with regular coffee breaks and socializing, we ordered food for all our participants so that we could have a remote dinner together in Gather, a customizable calling space. Furthermore, to keep our volunteers motivated and their energy levels up, we provided goodie bags for everyone with sweets and treats such as energy bars, chocolate, and tea.

Left: Contents of a goodie bag; Right: Remote dinner in Gather

Finally, thanks to the digital format, we were able to create an international event with more than 30 active participants from all over Europe. Our participants came from industry and academia and had varying levels of experience with data analytics, which allowed us to form very diverse project teams. They worked on challenges offered by three non-profit organizations: ADFC (the Allgemeiner Deutscher Fahrrad-Club), DRK (the German Red Cross), and Bezirksamt Friedrichshain-Kreuzberg.

Challenges and results

The ADFC acts as Germany’s cycling lobby and supports more than 200,000 members with a diverse range of services, such as roadside assistance and liability and legal costs insurance. The ADFC promotes the development of better cycling infrastructure in German cities. To support its demands and to convince local governments and politicians, the ADFC requires evidence that people actually switch from motorized vehicles to bicycles and that cycle lanes are a worthwhile investment in addressing the climate crisis. Volunteers who signed up to help the ADFC used data from bike traffic counting stations and combined it with weather and Covid-19 data to analyze and substantiate the ADFC’s arguments for further expansion and improvement of the cycling infrastructure.

Multiple visualizations for various German regions were created and handed over together with conclusions and recommendations drawn from the data analysis.

One of the visualizations made by participants working on the ADFC project
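For readers curious how counting-station data can be combined with weather data, here is a minimal sketch in Python. It is not the code the volunteers wrote: the field names (`cyclists`, `rain_mm`), the station label, the 1 mm rain threshold, and all values are made-up assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records (NOT real ADFC data); real data would be loaded from files.
bike_counts = [
    {"date": "2021-06-01", "station": "station_a", "cyclists": 1200},
    {"date": "2021-06-02", "station": "station_a", "cyclists": 700},
    {"date": "2021-06-03", "station": "station_a", "cyclists": 1100},
]
weather = {
    "2021-06-01": {"rain_mm": 0.0},
    "2021-06-02": {"rain_mm": 6.5},
    "2021-06-03": {"rain_mm": 0.0},
}


def join_counts_with_weather(counts, weather_by_date):
    """Attach the weather record of the same date to each count row.

    Rows whose date has no weather record are skipped.
    """
    joined = []
    for row in counts:
        day = weather_by_date.get(row["date"])
        if day is not None:
            joined.append({**row, **day})
    return joined


def mean_cyclists_by_rain(joined, rain_threshold_mm=1.0):
    """Average the daily counts separately for rainy and dry days.

    The threshold is an assumed cut-off, not a value from the project.
    """
    groups = defaultdict(list)
    for row in joined:
        label = "rainy" if row["rain_mm"] >= rain_threshold_mm else "dry"
        groups[label].append(row["cyclists"])
    return {label: mean(values) for label, values in groups.items()}


joined = join_counts_with_weather(bike_counts, weather)
print(mean_cyclists_by_rain(joined))
```

A comparison like this only hints at how weather relates to cycling volume; a real analysis would also need to account for season, weekday, and Covid-19 restrictions.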

With more than three million members, the DRK is one of the biggest welfare organizations in Germany and runs multiple hospitals and nursing homes. These are affected by the shortage of care work specialists, which was already alarming and has become increasingly evident during the Covid-19 crisis. To support DRK recruiting, volunteers used job portal data to examine how effective job postings are. To be precise, volunteers analyzed different aspects of job descriptions and how they influence the number of applications. To do so, they combined website traffic data with the job descriptions and used different NLP (natural language processing) techniques to investigate the impact of these aspects. One finding, for example, was that the ratio of applications to apprenticeship job postings is 4 to 1. The volunteers thus concluded that young people are interested in working in the social sector but that the number of apprenticeships might not properly accommodate this interest. These very interesting results were also presented after our datathon at the Future of Care Event, an internal DRK meeting focusing on issues related to care work. In this way, the volunteers were able to support and advise DRK recruiting on how job postings could be improved.

Bezirksamt Friedrichshain-Kreuzberg is the administrative authority of the Berlin district Friedrichshain-Kreuzberg and is responsible for, among other things, parking-space management in this area. OpenStreetMap data was already being used to support this management, and volunteers helped the Bezirksamt by investigating the data further to visualize the potential and specific needs of the district.

Volunteers enriched the provided datasets with demographic information about the people living in Friedrichshain-Kreuzberg. Furthermore, they investigated different questions related to the availability of parking spots in different areas. One example was to look at how many parking spots are provided for disabled people. The results indicated that such spots exist but are rather unevenly distributed within the Bezirk. Another example was to investigate the ratio of registered cars to available parking spots. As you can see in the following graphic, this ratio varies considerably among different areas in Friedrichshain-Kreuzberg.

The analysis of the parking situation in Friedrichshain-Kreuzberg
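The ratio of registered cars to available parking spots mentioned above is simple to compute once both numbers are known per area. The following Python sketch shows the idea. The area names and numbers are invented placeholders, not figures from the Bezirksamt data, and the function name `cars_per_parking_spot` is hypothetical.

```python
def cars_per_parking_spot(registered_cars, parking_spots):
    """Return registered cars divided by parking spots for each area.

    Areas without any parking spots get None instead of a ratio,
    because the division would be undefined.
    """
    ratios = {}
    for area, cars in registered_cars.items():
        spots = parking_spots.get(area, 0)
        ratios[area] = cars / spots if spots > 0 else None
    return ratios


# Hypothetical toy numbers (NOT real district data).
registered_cars = {"area_a": 900, "area_b": 450, "area_c": 300}
parking_spots = {"area_a": 600, "area_b": 450, "area_c": 0}

for area, ratio in cars_per_parking_spot(registered_cars, parking_spots).items():
    print(area, ratio)
```

A ratio above 1 would mean that more cars are registered in an area than there are parking spots counted for it.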

We are happy that our first fully virtual datathon was such a success and that so many talented people decided to spend their weekend working on important social issues. We are also very grateful to the non-profit organizations, our sponsors, and the data ambassadors who invested time and energy to make this event possible.

Authors: Lisa Zäuner, Vadim Voskresenskii
