From the discussion during the technical committee meeting, the scikit-learn Consortium at Inria defined the following list of priorities for the coming year:

  • Improve documentation with extra examples and topic-based discussions:
  • Operationalization of models / MLOps
    • Other packages to use
    • Good practices (we are an entry point for the community)
      • Model serialization options (pickle vs skops vs ONNX); a minimal comparison sketch appears after this list
      • Version control for reproducible retraining
      • Automate (CI / CD)
      • Long blocks of code should be in importable Python modules with tests, not in notebooks
    • Add side boxes in our doc / examples on the production logic (it may differ from exploration mode)
    • [Idea to be explored] Declarative construction of pipelines (pros / cons)
  • Annotations regarding models’ hyperparameters: https://github.com/scikit-learn/scikit-learn/pull/17929
    • First step implemented via programmatic hyperparameter declaration.
  • Improving performance and scalability
  • DOC and tools: safer recommendation for the right metrics for a given y_test.
  • Improve support for quantification of uncertainties in predictions and calibration measures
  • Improve the default solver in linear models:
    • Investigate the possibility of automatically using a data-derived preconditioner for the solver, in particular for smooth solvers in LogisticRegression / PoissonRegressor, etc.:
      https://github.com/scikit-learn/scikit-learn/pull/15583
    • Re-evaluate the choice of the default values of max_iter and tol if necessary.
  • More flexible support for alternative input data container types
  • Quantification of fairness issues and potential mitigation
    • Document fairness assessment metrics
  • Developer API: make it easier for third-party developers by separating out non-user-facing API that is not private (tested and covered by backward compatibility)
  • Tutorial / guide on various strategies to assess uncertainty in predictions: impact of the choice of the loss function (e.g. MSE, pinball, Poisson), stability to resampling (e.g. using the Bagging meta-estimators), Gaussian process regression with covariance estimation, and pointers to other external resources (e.g. conformal prediction, explicit Bayesian posterior modeling…); a minimal pinball-loss sketch appears after this list.
  • Programmatically defining good values / starting points for hyperparameter grids
  • Consider whether survival analysis and training models on censored data should be tackled in scikit-learn:
    • Organize a workshop to improve our understanding of the current ecosystem.
    • Acknowledge in the documentation that this problem exists
    • Maybe: an example to educate on the biases introduced by censoring and point to proper tools, such as the lifelines and scikit-survival projects.
    • Consider contributing to the wider ecosystem on survival or make an example in scikit-learn using the “Poisson trick”
  • Programmatic way to specify hyperparameter searches without param-name string mangling with `__` (the current string-mangled form is illustrated after this list)
  • Callbacks and logging (interruption) to monitor and checkpoint the fitting loop. One application would be to better integrate with (internal and external) hyperparameter search strategies that can leverage checkpointing to make model selection more resource-efficient.
    • This can be useful for teaching and general UX (progress bars that work on parallel sub-tasks, even in notebooks)
    • This can be important for MLOps (monitoring, snapshotting model for inspection, etc.)
    • It can help us gain agility during development (write better tests for the impact of convergence criteria, easier convergence debugging)
    • Related PR: https://github.com/scikit-learn/scikit-learn/pull/16925
      New prototype in
  • Consider allowing users to pass custom loss functions, in particular for Histogram Gradient-Boosting (maybe without guarantees on backward compat).
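
On the model serialization item above: a minimal sketch of the trade-offs such a "good practices" page could compare, assuming the optional third-party skops and skl2onnx packages are installed (file names are arbitrary):

    import pickle

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Option 1: pickle -- built into Python, but unpickling executes arbitrary
    # code, so only load files from sources you trust.
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)

    # Option 2: skops (external package) -- loading requires explicitly listing
    # which non-standard types are trusted, which mitigates the pickle risk.
    import skops.io as sio

    sio.dump(model, "model.skops")
    unknown_types = sio.get_untrusted_types(file="model.skops")
    restored = sio.load("model.skops", trusted=unknown_types)  # review unknown_types first

    # Option 3: ONNX via the external skl2onnx package -- a framework-independent
    # inference format; the result is an ONNX graph, not a scikit-learn estimator.
    from skl2onnx import to_onnx

    onnx_model = to_onnx(model, X[:1].astype(np.float32))
    with open("model.onnx", "wb") as f:
        f.write(onnx_model.SerializeToString())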
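
On the uncertainty tutorial item above: as an illustration of the loss-function aspect, a minimal sketch that uses the pinball (quantile) loss to turn a point predictor into a rough prediction interval (the dataset and quantile levels are only illustrative):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, noise=20.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fitting the same model family with the pinball (quantile) loss at several
    # levels gives a crude 90% prediction interval instead of a point estimate.
    preds = {}
    for q in (0.05, 0.5, 0.95):
        est = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0)
        preds[q] = est.fit(X_train, y_train).predict(X_test)

    coverage = np.mean((preds[0.05] <= y_test) & (y_test <= preds[0.95]))
    print(f"empirical coverage of the [0.05, 0.95] interval: {coverage:.2f}")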
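
On the hyperparameter search item above: for reference, the current way of spelling a nested search relies on `__`-mangled parameter name strings; a minimal example of the status quo that the item proposes to improve:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    pipe = Pipeline([("scaler", StandardScaler()),
                     ("clf", LogisticRegression(max_iter=1000))])

    # Today, nested hyperparameters are addressed by concatenating step names
    # with "__" (e.g. "clf__C"); the roadmap item asks for a programmatic
    # alternative to this string mangling.
    param_grid = {"clf__C": [0.1, 1.0, 10.0]}
    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
    print(search.best_params_)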


Longer term: big-picture tasks that require more thinking

  • MLOps: Model auditing and data auditing
  • Improve UX via an HTML repr for models with diagnostics, including recorded fit-time warnings (the existing HTML repr that this would extend is sketched after this list).
  • Model auditing tools that output HTML reprs.
  • Talk to various people to understand their needs and practices
  • Recommendations for statistical tests for distribution drift (a minimal example of one such test appears after this list)
  • Connect with skops for model card generation / documentation / templates: https://github.com/skops-dev/skops
  • Survival analysis tools need to go beyond point-wise predictions; this might be more generally useful in scikit-learn, for instance for uncertainty quantification in predictions.
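
On the HTML repr items above: today's starting point is the plain estimator diagram; a minimal illustration using the existing estimator_html_repr utility (the output file name is arbitrary):

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.utils import estimator_html_repr

    pipe = make_pipeline(StandardScaler(), LogisticRegression())

    # The existing HTML repr only shows the estimator structure and parameters;
    # the items above are about enriching it (or similar reports) with
    # fit-time diagnostics such as recorded convergence warnings.
    with open("pipeline_repr.html", "w", encoding="utf-8") as f:
        f.write(estimator_html_repr(pipe))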
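
On the distribution drift item above: one example of the kind of test such a recommendation could discuss, here a two-sample Kolmogorov-Smirnov test on a single feature (purely synthetic data; whether this test is appropriate depends on the setting):

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=1_000)   # feature at training time
    production = rng.normal(loc=0.3, scale=1.0, size=1_000)  # same feature in production

    # A two-sample Kolmogorov-Smirnov test is one simple per-feature drift check;
    # a documentation page could discuss when it is (in)appropriate.
    result = ks_2samp(reference, production)
    print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.3g}")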

Explore API to simplify data wrangling (outside of scikit-learn)


Community: priorities on the community side

  • Continue regular technical sprints and topic-focused workshops (possibly inviting past sprint contributors to try to foster long-term relationships and hopefully recruit new maintainers).
    • Better preparation for issues
    • Plan further in advance
    • Fewer people in the sprints (to be able to provide better mentoring)
  • Make the consortium meetings more transparent and inclusive:
    • Invite Adrin and other advisory board people to the meetings
    • Make the weekly tasking more visible
  • Renew the organization of beginners’ workshops for prospective contributors, probably before a sprint.
  • Organize a workshop on statistical topics (causal inference and calibration), possibly followed by a 2-day sprint
  • Organize a workshop on our software-engineering practices, some ideas of topics:
    • CI and CD practices, e.g.:
      • optional testing on float32 and round-robin seed setting
      • nightly builds and version-pinning rationales
    • local development practices, e.g.:
      • pre-commit config
    • code review guidelines
    • performance troubleshooting and improvements
      • profiling, benchmarking
  • Conduct a new edition of the 2013 survey among all scikit-learn users.