Did you know that, by a rough estimate, only 6% of open source contributors are women?! This is awfully low. The scikit-learn team really cares about improving its diversity. Gender being one of our focus areas, we decided to partner with Women in Machine Learning and Data Science Paris (WiMLDS Paris) to help on that front. On Saturday morning, March 12th, we gathered for our sprint at CybelAngel! It had been a long time since we last organized a face-to-face event, let alone a sprint!

What is a scikit-learn sprint, you may ask? It is a hands-on “hackathon” where we work on issues in the scikit-learn GitHub repository and learn to contribute to open source. This sprint included a practical, introductory workshop on contributing to open source software.

Under the guidance of no fewer than five people from the scikit-learn team, the participants set up their environments, learned how to use tools ranging from conda and git to black and pytest, and finally made their first pull requests!

If you don’t feel like solving issues and submitting pull requests, another way to contribute to scikit-learn as a user is simply to open an issue when you run into a problem! By the way, here is a good way of doing so using a minimal reproducer. Note that there are many other ways to contribute to scikit-learn, organizing such an event being one of them. Feel free to contact me if you would like to do so.
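To give an idea of what a minimal reproducer can look like, here is a short sketch. It is only an illustration, not taken from a real issue: the estimator, the synthetic data and the variable names are assumptions chosen for the example, and the code below runs without any error. The point is the shape of the script: small synthetic data with a fixed seed, the fewest lines needed to trigger the behavior, and the version you are running.

```python
# Minimal reproducer sketch: small, synthetic and self-contained.
# Hypothetical example: the estimator and data are placeholders,
# and this particular snippet does not exhibit any bug.
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset with a fixed seed, so anyone can rerun it
# and get exactly the same inputs.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(int)

# The smallest amount of code that triggers the behavior you report.
clf = LogisticRegression().fit(X, y)
predictions = clf.predict(X)

# Report what you observe; in a real issue, also say what you expected.
print("scikit-learn version:", sklearn.__version__)
print("predictions shape:", predictions.shape)
```

In a real issue you would replace the estimator call with the one that misbehaves, paste the full traceback if there is one, and add the output of `sklearn.show_versions()` so that maintainers can see your environment.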

For a full replay of the event, you can check Chloé Azencott’s Twitter page and the #pariswimlds hashtag. She documented every step, from the tee-shirt distribution in the morning to the successful pull requests, not forgetting learning how to use VS Code and, of course… the ☕ and 🍕. There were also fruits 🥥🥝, tea and fruit juice: we were not here to conform to stereotypes.

A couple of numbers from this sprint:

  • 33 pull requests
  • 30 of them (at the time of writing) have already passed all the CI tests and been accepted
  • 17 women participants
  • 100% happiness and pride!

If you want to do this at home, here are a few pointers to guide you as you start contributing to scikit-learn:

  • All the setup and guidelines are explained in a dedicated GitHub repository that you can find here. It was made to be crystal clear and guides you step by step: you can rely on it if you want to get started.
  • During this sprint, a couple of issues, or more exactly meta-issues (issues that list a problem occurring in many different places, each to be fixed individually), were listed on a dedicated board. Although we made some progress, they are still open if you want to have a look!
  • The maintainers and core contributors of scikit-learn label some issues as “good first issue”. This label encourages newcomers to step in with easier issues such as #21350 and #22406. We also worked on #11000, which is labeled as “hard”, but some participants successfully tackled it.
  • If you are interested, scikit-learn also has a webpage explaining how to contribute.

Some participants were interviewed during the sprint…

… And you can find their reactions to this sprint here.

We would like to thank our mentors Olivier Grisel, Adrin Jalali, Maren Westermann, Béa Hernandez, and Gaël Varoquaux. Thank you for coming, sometimes from quite far, for being pedagogical, and for reviewing and accepting the participants’ pull requests so quickly!

The organization of this sprint was made possible by WiMLDS Paris, and especially by Chloé Azencott, who was instrumental in connecting the participants with us, the scikit-learn community team.

Finally, thank you to CybelAngel for hosting us, to Giulia Bianchi for presenting CybelAngel, and to Marie Sacksick for the perfect logistics of this event.