Engineering

Using Marquez as a lineage tool for Celery

To track task relationships and enhance data lineage visualization, our martech-automation engineer, Marin, recently explored integrating Celery with Marquez using the OpenLineage Python package.

Marin Aglić Čuvić 3 months ago

This integration aims to track task relationships and enhance data lineage visualization. Because Celery workflows resemble data pipelines, the team saw a need for a tool like Marquez to display task relationships and data flow.


So a thought occurred: could they use Marquez to display the execution of Celery tasks?


Why this integration matters

Marquez, an open-source tool for metadata tracking, provides visibility into data movement across various systems. By integrating it with Celery, Marin explored how to send execution data at specific points of task execution.


Current status and insights

While still in testing, Marin has outlined key insights, such as the initial setup to send updates on task statuses at defined times. Find more information and examples in his blog series:


A fun experiment: using Marquez as a lineage tool for Celery

This first article introduces the concept of data lineage using Marquez with Celery workflows.

It explains how to track tasks through events like start, success, and failure. The article also includes setup guidance using Docker Compose, a custom task class, and tips for visualizing tasks in Marquez’s UI. 
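The start, success, and failure tracking described above can be sketched without Celery or the OpenLineage client installed. The snippet below is a minimal illustration, not the code from the article: it builds OpenLineage-style run events as plain dictionaries and emits them around a function call. The namespace, producer URL, schema version, and the helper names `build_run_event` and `run_with_lineage` are all assumptions made for this example.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical values for illustration; none of these come from the articles.
NAMESPACE = "celery-demo"
PRODUCER = "https://example.com/celery-lineage-sketch"
# The schema version in this URL is an assumption.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"

# Task start, success, and failure map onto these OpenLineage event types.
ALLOWED_EVENT_TYPES = {"START", "COMPLETE", "FAIL"}


def build_run_event(event_type, run_id, job_name):
    """Return an OpenLineage-style run event as a plain dict."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def run_with_lineage(job_name, func, emit):
    """Call func(), emitting START first and then COMPLETE or FAIL.

    `emit` is any callable that accepts one event dict, for example a
    function that posts the event to a lineage backend.
    """
    run_id = str(uuid.uuid4())  # one run id shared by all events of this run
    emit(build_run_event("START", run_id, job_name))
    try:
        result = func()
    except Exception:
        emit(build_run_event("FAIL", run_id, job_name))
        raise
    emit(build_run_event("COMPLETE", run_id, job_name))
    return result
```

In a Celery setup, the same three emit points would more likely sit in a custom task class, for example in hooks such as `before_start`, `on_success`, and `on_failure`, with `emit` sending each event to Marquez. How the article wires this up may differ from this sketch.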


Using Marquez as a lineage tool for Celery — adding the parent-run facet

The second article focuses on capturing parent-child task relationships using the ParentRunFacet.


It covers sending job details, attaching additional metadata through facets, handling race conditions when accessing the Marquez API, and caching task information in Redis to avoid delays. It also provides code examples for these features, including methods for storing and retrieving parent job details to ensure accurate lineage tracking in complex Celery workflows.
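As a rough illustration of the parent-child idea, the sketch below stores parent job details in a cache and attaches them to a child run as a `parent` facet. It is not the article's implementation: an in-memory dictionary stands in for Redis, and the class `ParentCache`, the helper functions, the producer URL, and the facet schema version are assumptions made for this example.

```python
# Hypothetical values for illustration; none of these come from the articles.
PRODUCER = "https://example.com/celery-lineage-sketch"
# The facet schema version in this URL is an assumption.
PARENT_FACET_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json"
    "#/$defs/ParentRunFacet"
)


class ParentCache:
    """In-memory stand-in for the Redis cache mentioned in the article."""

    def __init__(self):
        self._store = {}

    def remember(self, task_id, run_id, namespace, job_name):
        """Store the lineage details of a task so its children can find them."""
        self._store[task_id] = {
            "run_id": run_id,
            "namespace": namespace,
            "job_name": job_name,
        }

    def lookup(self, task_id):
        """Return the stored details, or None if the task is unknown."""
        return self._store.get(task_id)


def build_parent_facet(parent):
    """Build the run facets dict that holds the parent run facet."""
    return {
        "parent": {
            "_producer": PRODUCER,
            "_schemaURL": PARENT_FACET_SCHEMA,
            "run": {"runId": parent["run_id"]},
            "job": {
                "namespace": parent["namespace"],
                "name": parent["job_name"],
            },
        }
    }


def run_section_for_child(cache, child_run_id, parent_task_id):
    """Return the `run` section of an event, with a parent facet if known."""
    run = {"runId": child_run_id}
    parent = cache.lookup(parent_task_id) if parent_task_id else None
    if parent is not None:
        run["facets"] = build_parent_facet(parent)
    return run
```

Reading parent details from a local cache means a child task does not have to query the Marquez API for a parent that may not have been registered yet, which is one way to sidestep the race condition the article mentions.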

Next steps

Our teams are actively testing this integration, and we’ll share more insights once the testing phase concludes.

Stay tuned! 😎
