Julien Le Dem

Software Architect, Entrepreneur, Open Source leader

Bio

I am a software architect, open source leader, and entrepreneur who loves collaborating with others on open source projects. I started the Parquet project in collaboration with the Impala team at Cloudera back when I was at Twitter. I chaired the project for many years at the Apache Software Foundation, and Parquet is now the de facto standard for data lakes. I later contributed to the creation of the Arrow project as a founding engineer at Dremio. I got my start in open source earlier, on the Apache Pig project, where I went from contributor to committer to PMC member and eventually chaired the project in 2013. More recently, I started the OpenLineage project while I was CTO and co-founder of Datakin, which was later acquired by Astronomer. OpenLineage grew out of Marquez, the project we co-created on the data platform team at WeWork.

I blog at Sympathetic.Ink

Projects

  • Apache Parquet: co-creator, PMC member, PMC chair 2015-2021
  • Apache Arrow: co-creator and PMC member
  • Apache Pig: PMC member, PMC chair 2013
  • Apache Iceberg: PMC member
  • OpenLineage: creator and project lead at the LF AI & Data Foundation
  • Marquez: co-creator, former project lead at the LF AI & Data Foundation
  • Brenus: a Java bytecode generation library

Podcasts

It Depends

  • Celebrating 10 years of Apache Parquet with Julien Le Dem and Nong Li; Ep. 34, April 2023

DC_THURS

  • Data Lineage w/ Julien Le Dem (Datakin)

Data Driven NYC

  • Data Observability and Pipelines: OpenLineage and Marquez

Data Engineering Podcast

The Analytics Engineering Roundup

Software Engineering Daily

Presentations

Over the years I have given a number of talks. You'll find them in chronological order on the presentations page, along with a playlist of talk recordings on YouTube.

Nurturing Open Source communities

  • Data Council 2024: Ten+ years of building open source standards
  • SBTB 2023: Ten years of building open source standards
  • Data Council 2023: Ten years of building open source standards: From Parquet to Arrow to OpenLineage
  • Airflow Summit 2023: Nurturing an Open Source Community is Like Tending a Garden
  • Subsurface 2023: Ten years of building open source standards: From Parquet to Arrow to OpenLineage

Open Data Lineage: OpenLineage, Marquez

  • Data + AI Summit 2023: Cross-Platform Data Lineage with OpenLineage
  • Berlin Buzzwords 2022: Cross-Platform Data Lineage with OpenLineage

Data Architecture

Columnar formats: Parquet, Arrow

  • DataWorks Summit 2018: The columnar roadmap, Apache Parquet and Apache Arrow
  • NABD Conference 2017: The future of column-oriented data processing with Arrow and Parquet

Embedding Pig in scripting languages