Published Images and Packages

Within Cal-ITP, we publish several Python packages and Docker images that underpin other work. Changes to these packages and images are deployed via CI/CD processes that run automatically when new code is merged to the relevant Cal-ITP repository. CI/CD processes for Python packages run a test upload to PyPI when a pull request is opened or modified. CI/CD processes for images build the relevant image when a pull request is opened or modified, but do not push a new image tag to GHCR until the changes are merged into main.
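The trigger rules above can be summarized as a small decision function. This is a minimal sketch for illustration only: the enum names and the action labels (`test_upload_to_pypi`, `deploy_package`, `build_image`, `push_tag_to_ghcr`) are hypothetical and do not correspond to actual workflow or job names in the Cal-ITP repositories.

```python
from enum import Enum


class Artifact(Enum):
    PYTHON_PACKAGE = "python_package"
    DOCKER_IMAGE = "docker_image"


class Event(Enum):
    PULL_REQUEST = "pull_request"    # a PR is opened or modified
    MERGE_TO_MAIN = "merge_to_main"  # changes are merged into main


def ci_actions(artifact: Artifact, event: Event) -> list[str]:
    """Return the CI/CD steps described above for an artifact and triggering event.

    Action names are hypothetical labels, not real workflow job names.
    """
    if artifact is Artifact.PYTHON_PACKAGE:
        if event is Event.PULL_REQUEST:
            # PRs only exercise a test upload.
            return ["test_upload_to_pypi"]
        # Merged code is deployed automatically.
        return ["deploy_package"]
    # Docker images are built on every PR, but a new tag is only pushed
    # to GHCR once changes land on main.
    if event is Event.PULL_REQUEST:
        return ["build_image"]
    return ["build_image", "push_tag_to_ghcr"]
```

The key asymmetry is that a pull request never publishes a new image tag; only a merge to main does.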

Images and packages manage dependencies via pyproject.toml files. The jupyter-singleuser image uses uv for dependency management; analysts’ data science dependencies are managed in the data-analyses repo as a uv workspace and installed at runtime via uv sync.

READMEs describing the testing and publication process for each image and package are linked in the table below. A detailed guide to updating the calitp-data-analysis package, written for an analyst audience, is available here.

| Name | Function | Source Code | README | Publication URL | Type |
|---|---|---|---|---|---|
| calitp-data-analysis | Shared tools to ease common data analysis tasks within the Cal-ITP ecosystem. Now lives in data-analyses as a uv workspace member. | cal-itp/data-analyses | cal-itp/data-analyses | https://pypi.org/project/calitp-data-analysis (frozen) | Python Package |
| calitp-data-infra | Shared imports and tools used by data infrastructure and data pipelines within the Cal-ITP ecosystem | cal-itp/data-infra | cal-itp/data-infra | https://test.pypi.org/project/calitp-data-infra | Python Package |
| dask | Parallelization infrastructure used by JupyterHub | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/dask | Docker Image |
| gtfs-rt-archiver-v3 | Underpins our GTFS-RT archiver service, allowing us to save fast-moving GTFS-RT data | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-rt-archiver-v3 | Docker Image |
| gtfs-schedule-validator | Wrapper for the MobilityData GTFS Schedule validator, so we can choose the validator version appropriate to the age of a given data import | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-schedule-validator | Docker Image |
| jupyter-singleuser | Shared, consistent tooling for individual local Jupyter notebook users and JupyterHub | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/jupyter-singleuser | Docker Image |
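The publication URLs in the table are browsing links, not the exact strings passed to tools. The sketch below converts them into usable forms, assuming the standard conventions: a `docker pull` image reference omits the `https://` scheme and carries a `:tag` suffix, and installing from Test PyPI uses its simple index at `https://test.pypi.org/simple/`. The helper names and any tag value you pass in are hypothetical; check the relevant README for the tags actually published.

```python
from urllib.parse import urlparse


def image_reference(publication_url: str, tag: str) -> str:
    """Turn a GHCR publication URL into a `docker pull` reference.

    `tag` must be supplied by the caller; no default tag is assumed.
    """
    parsed = urlparse(publication_url)
    if parsed.netloc != "ghcr.io":
        raise ValueError(f"not a GHCR URL: {publication_url}")
    # Image references drop the scheme: ghcr.io/<org>/<repo>/<image>:<tag>
    return f"{parsed.netloc}/{parsed.path.strip('/')}:{tag}"


def pip_install_command(package: str, test_pypi: bool = False) -> list[str]:
    """Build a pip command line, optionally pointing at Test PyPI."""
    cmd = ["pip", "install"]
    if test_pypi:
        # Resolve the package from Test PyPI, falling back to the real PyPI
        # for its dependencies, which are usually not mirrored on Test PyPI.
        cmd += [
            "--index-url", "https://test.pypi.org/simple/",
            "--extra-index-url", "https://pypi.org/simple/",
        ]
    return cmd + [package]
```

For example, `image_reference("https://ghcr.io/cal-itp/data-infra/dask", tag)` yields `ghcr.io/cal-itp/data-infra/dask:<tag>`, which can be passed directly to `docker pull`.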