Sample Space
probabl
Sample Space is a podcast about tools, thoughts, and techniques from machine learning practitioners. We talk to toolmakers and practitioners about interesting problems in the real world.
Pragmatic data science checklists with Peter Bull - cofounder of DrivenData
A lot of things can go (and have gone) wrong when folks try to run data science projects. So how might we prevent that? Maybe we should look at the medical profession and its practice of running through checklists before surgery.
Jul 17, 2024
1 hr 5 min
Model safety, that's a pickle! with Adrin Jalali - scikit-learn maintainer
Historically, you would use a pickle file to store a trained scikit-learn model on disk for deployment. Pickles make sense because they are so flexible, but they do carry a security concern. Adrin has been working on a remedy called skops, which is the main topic of this podcast. To learn more about skops, make sure to check the documentation: https://skops.readthedocs.io/en/stable/
Jun 27, 2024
1 hr 1 min
Moving Towards KDearestNeighbors with Leland McInnes - creator of UMAP
Leland McInnes is known for a lot of packages. There's UMAP, but also PyNNDescent and HDBSCAN. Recently he's also been working on tools to help visualise clusters of data, and he's cooking up something new that's related to nearest neighbor algorithms. This interview touches on all of these topics. If you're interested in learning more about the MoMA exhibition, it was by Refik Anadol: https://refikanadol.com/ and this was the work at MoMA: https://refikanadol.com/works/unsupervised/. The other artist was Kyle McDonald: https://kylemcdonald.net/ and the piece we mentioned was this one: https://www.youtube.com/watch?v=04DqdT0-NtI.
May 30, 2024
57 min
Talk like a DataFrame, run like SQL with Phillip Cloud - core-committer on Ibis
Ibis is a Python library that offers a single dataframe API that can run your queries on many different backends. These include databases like Postgres, but also commercial vendors like BigQuery and Snowflake. This ability to control multiple backends from a single API has a lot of use-cases, as well as maintainer challenges, all of which are discussed in this episode. To learn more about Ibis, check out the docs here: https://ibis-project.org/ If you're attending PyCon US this year, you may be interested in Phillip's talk: https://us.pycon.org/2024/schedule/presentation/55/ During the podcast, Phillip also mentioned a blogpost about DuckDB, here: https://ibis-project.org/posts/why-duckdb/ There was also a dogfooding blogpost, which is this one: https://ibis-project.org/posts/ci-analysis/
May 2, 2024
1 hr 4 min
Enhancing Jupyter with Widgets with Trevor Manz - creator of anywidget
In this (first!) episode of Sample Space we talk to Trevor Manz, the creator of anywidget. It's a (neat!) tool to help you build more interactive notebooks by giving you tools to apply just enough JavaScript to get bidirectional communication working in your favorite notebook environment. That means that Python can talk to widgets, but also that widgets can talk to Python. There's a lot to like about these widgets and we're doing a proper deep dive in this first episode. To learn more about anywidget, check out the docs. In particular you may want to glance at the gallery first; it has loads of nice examples. You can also find the project on GitHub, and if you're eager to talk to folks involved with the project, consider joining the Discord.
Apr 11, 2024
1 hr 11 min
Introducing Sample Space
We're starting a new podcast!
Apr 3, 2024
1 min