Using Marquez as a lineage tool for Celery

Our martech-automation engineer, Marin, recently explored integrating Celery with Marquez using the OpenLineage Python package. This integration aims to track task relationships and enhance data lineage visualization. As Celery workflows resemble data pipelines, the team identified a need for a tool like Marquez to display task relationships and data flow.

So a thought occurred: could they use Marquez to display the execution of Celery tasks?

Why this integration matters

Marquez, an open-source tool for metadata tracking, provides visibility into how data moves across systems. By integrating it with Celery, Marin explored how to send execution data at specific points of task execution.

Current status and insights

While the integration is still in testing, Marin has outlined key insights, such as an initial setup that sends task status updates at defined points of execution. Find more information and examples in his blog series:

A fun experiment: using Marquez as a lineage tool for Celery

This first article introduces the concept of data lineage using Marquez with Celery workflows.

It explains how to track tasks through events like start, success, and failure. The article also includes setup guidance using Docker Compose, a custom task class, and tips for visualizing tasks in Marquez’s UI.
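As a rough illustration of what tracking start, success, and failure can look like, here is a minimal sketch that builds OpenLineage-style run events as plain dictionaries. It is not the article's code: the function name build_run_event, the namespace, the producer URL, and the task name are all hypothetical, and the sketch assumes that success maps to the COMPLETE event type and failure to FAIL.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed values for illustration only; not taken from the article.
PRODUCER = "https://example.com/celery-lineage-sketch"
NAMESPACE = "celery-demo"

# Task outcome -> OpenLineage event type.
EVENT_TYPES = {"start": "START", "success": "COMPLETE", "failure": "FAIL"}


def build_run_event(state, task_name, run_id, namespace=NAMESPACE):
    """Return an OpenLineage-style run event as a plain dict.

    A real event also carries a schemaURL field; it is omitted here.
    """
    return {
        "eventType": EVENT_TYPES[state],
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": task_name},
        "inputs": [],
        "outputs": [],
        "producer": PRODUCER,
    }


# One run id ties the start and completion events of a task together.
run_id = str(uuid.uuid4())
events = [
    build_run_event(state, "tasks.send_campaign", run_id)
    for state in ("start", "success")
]
print(json.dumps(events[0], indent=2))
```

In a custom Celery task class, a builder like this could plausibly be called from the before_start, on_success, and on_failure handlers, with the resulting event then sent to Marquez.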

Using Marquez as a lineage tool for Celery — adding the parent-run facet

The second article focuses on capturing parent-child task relationships using the ParentRunFacet.

It covers sending job details, attaching additional metadata through facets, handling race conditions when accessing the Marquez API, and caching task information in Redis to avoid delays. It also provides code examples for these features, including methods for storing and retrieving parent job details, so that lineage tracking stays accurate in complex Celery workflows.
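The idea of storing and retrieving parent job details can be sketched as follows. This is an assumption-laden illustration rather than the article's implementation: ParentCache is a hypothetical in-memory stand-in for Redis so the example runs without a server, the key format and job names are made up, and the facet is built as a plain dict with the parent's run id and job, leaving out the producer and schema fields a real facet carries.

```python
import json
import uuid


class ParentCache:
    """In-memory stand-in for Redis (hypothetical helper, not the article's code)."""

    def __init__(self):
        self._store = {}

    def store_parent(self, task_id, namespace, job_name):
        # With a real Redis client this would be a SET, ideally with an expiry.
        key = f"lineage:parent:{task_id}"
        self._store[key] = json.dumps({"namespace": namespace, "name": job_name})

    def fetch_parent(self, task_id):
        # Returns the cached job details, or None if the parent is unknown.
        raw = self._store.get(f"lineage:parent:{task_id}")
        return json.loads(raw) if raw is not None else None


def parent_run_facet(cache, parent_task_id):
    """Build the run facets for a child task, or {} if the parent is not cached."""
    job = cache.fetch_parent(parent_task_id)
    if job is None:
        return {}
    return {"parent": {"run": {"runId": parent_task_id}, "job": job}}


# The parent records its details when it starts; the child reads them later
# from the cache instead of querying the Marquez API.
cache = ParentCache()
parent_id = str(uuid.uuid4())
cache.store_parent(parent_id, "celery-demo", "tasks.build_audience")

child_run = {
    "runId": str(uuid.uuid4()),
    "facets": parent_run_facet(cache, parent_id),
}
print(json.dumps(child_run, indent=2))
```

Reading the parent's details from a cache means the child does not depend on Marquez having already processed the parent's event, which is one way to sidestep the race condition mentioned above.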

Next steps

Our teams are actively testing this integration, and we’ll share more insights once the testing phase concludes.

Stay tuned! 😎
