dataengineeringpodcast.com
Data Lineage For Your Pipelines
An interview about how the open source Pachyderm platform makes it easy to build flexible data pipelines with first-class support for data lineage.
Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there's Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data provenance easier to track, how the platform is architected and used today, and examples of how the underlying principles manifest in the workflows of data engineers and data scientists as they collaborate on data projects. He also shares his thoughts on the company's recent round of fundraising and where the future will take them. If you are looking for a set of tools for building your data science workflows, then Pachyderm is a solid choice, featuring data versioning, first-class tracking of data lineage, and language-agnostic data pipelines.