Our martech-automation engineer, Marin, recently explored integrating Celery with Marquez using the OpenLineage Python package. This integration aims to track task relationships and enhance data lineage visualization. As Celery workflows resemble data pipelines, the team identified a need for a tool like Marquez to display task relationships and data flow.
That raised a question: could Marquez be used to display the execution of Celery tasks?
Why this integration matters
Marquez, an open-source tool for metadata tracking, provides visibility into data movement across various systems. By integrating it with Celery, Marin explored how to send execution data to Marquez at specific points in a task's execution.
Current status and insights
The integration is still in testing, but Marin has already outlined key insights, starting with the initial setup for sending task status updates at defined points in a task's lifecycle. Find more information and examples in his blog series:
A fun experiment: using Marquez as a lineage tool for Celery
This first article introduces the concept of data lineage using Marquez with Celery workflows.
It explains how to track tasks through events like start, success, and failure. The article also includes setup guidance using Docker Compose, a custom task class, and tips for visualizing tasks in Marquez’s UI.
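
To make the start, success, and failure events more concrete, here is a minimal sketch of the kind of payload each lifecycle hook might emit. It is not the code from the article: it uses plain Python dictionaries shaped like OpenLineage run events instead of the real Celery and OpenLineage classes, and the names `LineageHooks`, `build_run_event`, `NAMESPACE`, and `PRODUCER` are hypothetical. The OpenLineage spec also requires a `schemaURL` field on each event, which is left out here for brevity.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical values, chosen only for this illustration.
NAMESPACE = "celery-demo"
PRODUCER = "https://example.com/celery-lineage-sketch"


def build_run_event(event_type, run_id, job_name):
    """Return a dict shaped like an OpenLineage run event (schemaURL omitted)."""
    return {
        "eventType": event_type,  # "START", "COMPLETE" or "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": PRODUCER,
    }


class LineageHooks:
    """Stand-in for a custom task class: one event per lifecycle hook.

    `emit` is any callable that accepts the event dict. In a real setup it
    would send the event to Marquez; here it is injected so the sketch
    runs without a server.
    """

    def __init__(self, job_name, emit):
        self.job_name = job_name
        self.emit = emit

    def before_start(self, task_id):
        self.emit(build_run_event("START", task_id, self.job_name))

    def on_success(self, task_id):
        self.emit(build_run_event("COMPLETE", task_id, self.job_name))

    def on_failure(self, task_id):
        self.emit(build_run_event("FAIL", task_id, self.job_name))


# Usage: collect events in a list instead of sending them anywhere.
sent = []
hooks = LineageHooks("send_campaign_email", sent.append)
task_id = str(uuid.uuid4())  # the run ID has to be a UUID
hooks.before_start(task_id)
hooks.on_success(task_id)
```

Reusing the same task ID as the run ID for every event is what lets a lineage backend stitch the start and completion of one execution together.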
Using Marquez as a lineage tool for Celery — adding the parent-run facet
The second article focuses on capturing parent-child task relationships using the ParentRunFacet.
It covers sending job details, attaching additional metadata through facets, handling race conditions when accessing the Marquez API, and caching task information in Redis to avoid delays. It also provides code examples for these features, including methods for storing and retrieving parent job details to ensure accurate lineage tracking in complex Celery workflows.
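
The store-and-retrieve idea can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: `InMemoryCache` is a hypothetical dictionary-backed stand-in for Redis so the example runs on its own, and `store_job_details`, `build_parent_facet`, and the `lineage:` key prefix are made-up names. The facet follows the general shape of the OpenLineage parent-run facet (a `parent` entry holding the parent's run ID and job), with the `_producer` and `_schemaURL` fields that a real facet carries left out.

```python
import json
import uuid


class InMemoryCache:
    """Hypothetical stand-in for a Redis client, exposing only get/set."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def store_job_details(cache, task_id, namespace, job_name):
    """Cache a task's job details so its children can find them later."""
    details = {"namespace": namespace, "name": job_name}
    cache.set(f"lineage:{task_id}", json.dumps(details))


def build_parent_facet(cache, parent_task_id):
    """Build a parent-run facet from cached details, or None if unknown."""
    raw = cache.get(f"lineage:{parent_task_id}")
    if raw is None:
        # The parent has not been recorded (yet); the caller decides
        # whether to retry or to send the event without the facet.
        return None
    return {
        "parent": {
            "run": {"runId": parent_task_id},
            "job": json.loads(raw),
        }
    }


# Usage: the parent stores its details, the child reads them back.
cache = InMemoryCache()
parent_id = str(uuid.uuid4())
store_job_details(cache, parent_id, "celery-demo", "build_audience")

facet = build_parent_facet(cache, parent_id)
missing = build_parent_facet(cache, str(uuid.uuid4()))
```

Reading from a local cache instead of asking the Marquez API means the child task does not have to wait for the parent's event to be processed before it can describe its own lineage.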
Next steps
Our teams are actively testing this integration, and we’ll share more insights once the testing phase concludes.
Stay tuned! 😎