Data Versioning for Data Science from Data Science Deployed

51 minutes. Posted Oct 20, 2021 at 12:49 pm.

Show notes

Today we talk about data versioning: why you should do it, what to do about humans in the loop, and how to minimize mistakes.

Tools mentioned:

DVC - https://dvc.org/

Quilt Data Versioning - https://quiltdata.com/

Apache Airflow - https://airflow.apache.org/

Apache Superset - https://superset.apache.org/

OpenProject - https://www.openproject.org/

----------------------------------------

Follow the podcast on Twitter: @dsdeployed

https://twitter.com/dsdeployed

----------------------------------------

Donny Winston

I help researchers do data-intensive science together.

Twitter: @donnywinston https://twitter.com/donnywinston

Website: https://polyneme.xyz/

LinkedIn: https://www.linkedin.com/in/donnywinston/

Ben Cook

I help data science teams deploy their algorithms because a machine learning model is only as good as the system that delivers it.

Twitter: @jbencook https://twitter.com/jbencook

LinkedIn: https://www.linkedin.com/in/jbencook/

Website: https://sparrow.dev/

Jillian Rowe

I help biotech startups deploy scalable high performance compute infrastructure on AWS.

Website: https://www.dabbleofdevops.com

Twitter: @jillianerowe https://www.twitter.com/jillianerowe

LinkedIn: https://www.linkedin.com/in/jillian-rowe-9410437a/