DSSG Datathon 2023: One fun weekend of coding and social good

Vadim Voskresenskii
Published in DSSG Berlin Blog
5 min read · Jul 24, 2023


Datathons are one of the essential activities organised by DSSG Berlin. For these events, we invite data analysts and data scientists to spend a whole weekend helping non-profit organisations with their data-related challenges. This year, we held the datathon online on the weekend of March 24–26, which allowed people from different parts of the world to take part. In the end, 25 data enthusiasts joined us that weekend. While almost half of these volunteers live in Europe, we also had people from Asia, Africa, and North and South America. We had a balanced gender split amongst the participants and a great mix of experience levels.

Distribution of participants by geographic region

One of the most important lessons we learned from the previous editions of the datathon is that datasets provided by NGOs should be properly researched and preprocessed before the actual event. The format of the datathon, which is limited to one weekend, does not give volunteers enough time to design a well-developed analysis and also clean and preprocess the data. Hence, it was very important for us to spend enough time before the event brainstorming how the NGOs' analytics goals could be achieved and getting acquainted with the data, so that the volunteers' work during the datathon would be more efficient and more fruitful for the NGOs. As in the previous datathons, we therefore invited experienced data volunteers from our network who were eager to take on the role of data ambassadors and support the participants during the event.

The first project came from the organisation WTG e.V. | Welttierschutzgesellschaft, which works to improve the living conditions of animals in developing countries. The organisation depends financially on donations from its supporters. The objective for this NGO was to get a detailed analysis of the dynamics of donors' behaviour, and to better understand which projects attract financial support and which parts of Germany contribute the most. To address these challenges, the volunteers used exploratory analysis and predictive modelling. To understand the origins of donations, the participants produced detailed maps showing how contributions vary between the federal states of Germany.

Map showing how the amount of donations varies across the regions of Germany

The participants also found differences in financial support between genders. According to the analysis, women give higher total donation amounts overall, and they are considerably more interested than men in supporting projects connected with climate crisis issues.
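As a rough illustration of the kind of aggregation behind such maps and comparisons, here is a minimal sketch in Python. It is not the volunteers' actual code: the field names (`state`, `gender`, `amount`) and the example rows are made up for illustration and do not come from the NGO's data.

```python
from collections import defaultdict


def total_by(records, key):
    """Sum the 'amount' field of each record, grouped by the value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# Made-up example rows (hypothetical, not datathon data).
donations = [
    {"state": "Berlin", "gender": "f", "amount": 50.0},
    {"state": "Bayern", "gender": "m", "amount": 20.0},
    {"state": "Berlin", "gender": "m", "amount": 30.0},
    {"state": "Bayern", "gender": "f", "amount": 45.0},
]

by_state = total_by(donations, "state")    # totals per federal state
by_gender = total_by(donations, "gender")  # totals per gender
```

The per-state totals are what a map would colour each federal state by; the same helper grouped by project topic would show which themes attract the most support.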

Another project was proposed by Wir für Vielfalt, an organisation that focuses on fostering diversity in learning environments. For the datathon, they wanted to get a better understanding of how their audiences use their website and which pages (or topics) attract the most interest. Based on the data provided by the NGO, the team of volunteers calculated a "bounce rate" metric, which represents the proportion of visitors who leave a website without taking any action. The volunteers also showed how the average bounce rate varies between different categories on the website. Knowing these numbers, the NGO can better understand which areas of its work are more interesting for audiences and which need additional development. Moreover, the volunteers prepared a very detailed presentation for the NGO, showing how much time visitors spend on the website on average, how many pages they open, and which themes interest them most.
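To make the metric concrete, here is a minimal sketch of a per-category bounce rate in Python. It assumes that a "bounce" is a session with exactly one page view, which is one common way to operationalise "leaving without taking any action"; the team's exact definition, the field names, and the category names below are assumptions for illustration.

```python
from collections import defaultdict


def bounce_rate_by_category(sessions):
    """Return {category: bounces / sessions} for each landing category.

    Assumption: a session counts as a bounce if it has exactly one page view.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [bounces, total sessions]
    for session in sessions:
        entry = counts[session["category"]]
        entry[1] += 1
        if session["page_views"] == 1:
            entry[0] += 1
    return {category: b / t for category, (b, t) in counts.items()}


# Made-up sessions (hypothetical categories, not the NGO's data).
sessions = [
    {"category": "workshops", "page_views": 1},
    {"category": "workshops", "page_views": 4},
    {"category": "blog", "page_views": 1},
    {"category": "blog", "page_views": 1},
]

rates = bounce_rate_by_category(sessions)
```

In this toy example, half of the sessions landing on "workshops" bounce, while every session landing on "blog" does.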

An equally interesting and important part of the analysis was to explore which external webpages bring people to the NGO's website. This type of analysis is very helpful for the organisation, as it gives a good overview of its audiences' interests and can make targeted campaigns more effective at attracting subscribers and visitors.
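A referrer analysis of this kind can be sketched as counting visits per referring domain. The snippet below is only an illustration under assumptions: the referrer URLs are invented, and treating an empty referrer as "direct" traffic is a convention chosen here, not something stated by the team.

```python
from collections import Counter
from urllib.parse import urlparse


def count_referrer_domains(referrers):
    """Count visits per referring domain; empty referrers are labelled 'direct'."""
    domains = []
    for url in referrers:
        domain = urlparse(url).netloc if url else ""
        domains.append(domain or "direct")
    return Counter(domains)


# Invented referrer URLs for illustration only.
referrers = [
    "https://www.example.org/news/article",
    "https://search.example.com/?q=diversity",
    "https://www.example.org/links",
    "",
]

referrer_counts = count_referrer_domains(referrers)
top_referrer, top_visits = referrer_counts.most_common(1)[0]
```

Sorting the counts with `most_common()` gives a ranked list of the sites that send the most visitors.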

The third non-profit organisation taking part in the datathon was Harambee Youth Employment Accelerator. The main goal of Harambee is to reduce youth unemployment in South Africa. Harambee works on improving the matching process between young people searching for job opportunities and the positions available on the market. To improve this matching, Harambee asks candidates to record short messages in English so that their level of English proficiency can be estimated. For this project, Harambee had a collection of these short recordings, half of which had already been classified by professional graders on the CEFR scale (A1, A2, etc.). The task for the volunteers was to train a machine learning model that could predict the level of English based on this pre-classified collection. Besides that, Harambee wanted the volunteers to give an overview of the quality of the existing recordings and to propose how to improve the collection of audio data in the future.

The volunteer team tried two approaches to this challenge. In the first one, they transcribed the audio data into text and applied the NLP model RoBERTa for the classification. To support the classification, they enriched the data with information from a publicly available CEFR dataset. This approach gave an accuracy of 0.68, which is quite a good score for a first iteration, taking into account the complexity of the data. In the second approach, the team extracted text-independent acoustic features from the recordings and used signal processing to clean the signals. The next step for the team will be to implement more complex neural networks that predict the scores of recordings from their acoustic features.
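For readers less familiar with the metric, accuracy here means the share of recordings whose predicted CEFR level matches the grader's label, so 0.68 corresponds to roughly two out of three recordings being classified correctly. A minimal sketch, with invented labels rather than Harambee's data:

```python
def accuracy(true_levels, predicted_levels):
    """Fraction of predictions that exactly match the graders' CEFR labels."""
    if len(true_levels) != len(predicted_levels):
        raise ValueError("label lists must have the same length")
    if not true_levels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_levels, predicted_levels))
    return correct / len(true_levels)


# Invented labels for illustration only.
graded = ["A2", "B1", "B1", "C1"]
predicted = ["A2", "B1", "B2", "C1"]

score = accuracy(graded, predicted)  # 3 of 4 predictions match
```

Note that plain accuracy treats every mistake equally: predicting B2 for a B1 speaker counts the same as predicting C2, even though CEFR levels are ordered.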

This was already the sixth datathon that DSSG Berlin has organised, but we are still impressed that so many people are ready to spend their free time volunteering on quite complex projects. And it is always very exciting to see how many valuable outcomes our volunteers can produce in such a short period of time. The long and detailed discussions of the results between volunteers and NGOs during the presentations on the final day of the datathon show the importance of such events for both sides and motivate us to keep organising them. We are very grateful to all the NGOs, volunteers, and data ambassadors who took part in this event, and we are looking forward to the next edition of the datathon!

Vadim Voskresenskii & Lisa Zäuner (DSSG core team)
