Published Images and Packages

Within Cal-ITP, we publish several Python packages and Docker images that underpin other work. Changes to these packages and images are deployed by CI/CD processes that run automatically when new code is merged to the relevant Cal-ITP repository. For Python packages, CI/CD runs a test upload to PyPI when a pull request is opened or modified. For images, CI/CD builds the relevant image when a pull request is opened or modified, but does not push a new image tag to GHCR until the changes are merged into main.
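The rules in the paragraph above can be summarized as a small decision table. The following is only an illustrative sketch in Python, not the actual workflow configuration: the `Artifact` and `Event` enums, the `ci_actions` function, and the step labels are hypothetical names introduced here to restate the behavior described above.

```python
from enum import Enum


class Artifact(Enum):
    PYTHON_PACKAGE = "python_package"
    DOCKER_IMAGE = "docker_image"


class Event(Enum):
    PULL_REQUEST = "pull_request"    # a pull request was opened or modified
    MERGE_TO_MAIN = "merge_to_main"  # changes were merged into main


def ci_actions(artifact: Artifact, event: Event) -> list[str]:
    """Return the CI/CD steps described above (labels are illustrative only)."""
    if artifact is Artifact.PYTHON_PACKAGE:
        if event is Event.PULL_REQUEST:
            # Pull requests only exercise the upload path.
            return ["test upload to PyPI"]
        return ["publish package"]

    # Docker images are built on every pull request, but a new tag is
    # pushed to GHCR only once the changes are merged into main.
    if event is Event.PULL_REQUEST:
        return ["build image"]
    return ["build image", "push new image tag to GHCR"]
```

For example, `ci_actions(Artifact.DOCKER_IMAGE, Event.PULL_REQUEST)` returns only the build step, which mirrors the point that opening a pull request never changes what is published to GHCR.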

Some images and packages manage dependencies via traditional requirements.txt files, and some manage dependencies via Poetry pyproject.toml files. Please refer to Poetry documentation for successful management of pyproject dependencies.
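When working in an unfamiliar image or package directory, it can help to check which of the two dependency styles it uses before making changes. The sketch below is one possible way to do that, under two assumptions: that Poetry-managed projects carry a `[tool.poetry]` table in `pyproject.toml`, and that the dependency file sits at the top level of the directory. The `dependency_manager` function is a hypothetical helper, not part of any Cal-ITP package.

```python
from pathlib import Path


def dependency_manager(project_dir: str) -> str:
    """Guess how a package or image directory declares its dependencies."""
    root = Path(project_dir)

    # Assumption: Poetry projects declare a [tool.poetry...] table.
    pyproject = root / "pyproject.toml"
    if pyproject.is_file() and "[tool.poetry" in pyproject.read_text(encoding="utf-8"):
        return "poetry"

    # Otherwise fall back to a traditional requirements file.
    if (root / "requirements.txt").is_file():
        return "requirements.txt"

    return "unknown"
```

Calling `dependency_manager(".")` from inside a package or image directory returns `"poetry"`, `"requirements.txt"`, or `"unknown"`. Poetry is checked first so that a directory containing both files is reported as Poetry-managed.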

READMEs describing the individual testing and publication process for each image and package are linked in the table below. A detailed guide to updating the calitp-data-analysis package, written for an analyst audience, is available here.

| Name | Function | Source Code | README | Publication URL | Type |
| --- | --- | --- | --- | --- | --- |
| calitp-data-analysis | Shared tools to ease common data analysis tasks within the Cal-ITP ecosystem | cal-itp/data-infra | cal-itp/data-infra | https://test.pypi.org/project/calitp-data-analysis | Python Package |
| calitp-data-infra | Shared imports and tools used by data infrastructure and data pipelines within the Cal-ITP ecosystem | cal-itp/data-infra | cal-itp/data-infra | https://test.pypi.org/project/calitp-data-infra | Python Package |
| dask | Parallelization infrastructure used by JupyterHub | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/dask | Docker Image |
| gtfs-aggregator-scraper | A tool to scrape various GTFS feed aggregators and save the present URLs in GCS | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-aggregator-scraper | Docker Image |
| gtfs-rt-archiver-v3 | Underpins our GTFS-RT archiver service, allowing us to save fast-moving GTFS-RT data | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-rt-archiver-v3 | Docker Image |
| gtfs-rt-parser-v2 | Code to parse GTFS-RT protobufs into newline-delimited JSON for querying via BigQuery external tables | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-rt-parser-v2 | Docker Image |
| gtfs-schedule-validator | Wrapper for the MobilityData GTFS Schedule validator (so we can choose the correct version for the age of a given data import) | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/gtfs-schedule-validator | Docker Image |
| jupyter-singleuser | Shared, consistent tooling for individual local Jupyter notebook users and JupyterHub | cal-itp/data-infra | cal-itp/data-infra | https://ghcr.io/cal-itp/data-infra/jupyter-singleuser | Docker Image |