Introduction to JupyterHub

JupyterHub is a web application that lets users analyze and create reports on warehouse data (or a number of other data sources).

Analyses on JupyterHub are accomplished using notebooks, which allow users to mix narrative with analysis code.

You can access JupyterHub using this link: notebooks.calitp.org.

Using JupyterHub

For Python users, we have deployed a cloud-based instance of JupyterHub to make creating, using, and sharing notebooks easy.

This avoids the need to set up a local environment, provides dedicated storage, and allows you to push to GitHub.

Logging in to JupyterHub

JupyterHub currently lives at notebooks.calitp.org.

Note: you will need to have been added to the Cal-ITP organization on GitHub to obtain access. If you have yet to be added to the organization and need to be, ask in the #services-team channel in Slack.

Connecting to the Warehouse

Connecting to the warehouse requires a bit of setup after logging in to JupyterHub, but it allows users to query data in the warehouse directly. To do this, you will need to download and install the gcloud command-line tool from the app.

See the screencast below for a full walkthrough.

The commands required:

# init will both authenticate and do basic configuration
# You do not have to set a default compute region/zone
gcloud init

# Optionally, you can auth and set the project separately
gcloud auth login
gcloud config set project cal-itp-data-infra

# Regardless, set up application default credentials
gcloud auth application-default login

If you are still not able to connect, make sure your account has the same set of permissions as other analysts.
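
Before asking about permissions, it can help to confirm that the last command actually wrote a credentials file. The following is a minimal sketch, not part of the Cal-ITP tooling: the helper name find_adc_file is hypothetical, and it assumes the default gcloud locations on Linux/macOS (the GOOGLE_APPLICATION_CREDENTIALS variable, or application_default_credentials.json inside the gcloud config directory, which CLOUDSDK_CONFIG can relocate).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the application default credentials file if one exists, else None.

    Hypothetical helper for illustration; it only checks that a file is
    present, not that the credentials inside it are valid.
    """
    env = os.environ if env is None else env

    # An explicitly configured credentials file takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # Otherwise look in the gcloud config directory
    # (assumed default on Linux/macOS: ~/.config/gcloud).
    config_dir = Path(env.get("CLOUDSDK_CONFIG") or Path.home() / ".config" / "gcloud")
    candidate = config_dir / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"Credentials file found at {found}")
    else:
        print("No credentials file found; run: gcloud auth application-default login")
```

If the file is present and queries still fail, the problem is more likely to be missing permissions than missing credentials.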

Increasing the Query Limit

By default, queries run from a Jupyter notebook are subject to a limit on the number of bytes billed. Most queries should fall within that limit, so running into the error "DatabaseError: 500 Query exceeded limit for bytes billed" should be a red flag to investigate whether such a large query is really needed for the analysis. To increase the query limit, add and execute the following in your notebook (the value is in bytes, so this example sets the limit to 20 GB):

from calitp_data_analysis.tables import tbls

import os
os.environ["CALITP_BQ_MAX_BYTES"] = str(20_000_000_000)
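
Because the limit is given in bytes, it is easy to miscount zeros. As a minimal sketch, assuming only that CALITP_BQ_MAX_BYTES holds a byte count stored as a string (as in the snippet above), a small helper can take the limit in gigabytes instead. The name set_query_limit_gb is hypothetical and not part of calitp_data_analysis.

```python
import os

# Decimal gigabytes, so that 20 GB matches the 20_000_000_000 used above.
BYTES_PER_GB = 1_000_000_000


def set_query_limit_gb(gigabytes: float) -> int:
    """Set CALITP_BQ_MAX_BYTES from a size in gigabytes and return the byte count."""
    if gigabytes <= 0:
        raise ValueError("The query limit must be positive")
    max_bytes = int(gigabytes * BYTES_PER_GB)
    # Environment variables are always strings, so store the count as text.
    os.environ["CALITP_BQ_MAX_BYTES"] = str(max_bytes)
    return max_bytes


# Equivalent to os.environ["CALITP_BQ_MAX_BYTES"] = str(20_000_000_000)
set_query_limit_gb(20)
```

Raise the limit only as far as the analysis needs, since the default exists to catch unexpectedly large queries.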


Querying with SQL in JupyterHub

JupyterHub makes it easy to run SQL queries in notebooks.

To run SQL, import the following at the top of your notebook:

import calitp_data_analysis.magics

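Then add the following as the first line of any cell in which you would like to run SQL:

%%sql

For example, after running the import in an earlier cell (the column selection here is illustrative):

import calitp_data_analysis.magics

%%sql
SELECT *
FROM `mart_gtfs.dim_schedule_feeds`
WHERE
    key = "db58891de4281f965b4e7745675415ab"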

Saving Code to GitHub

See the Saving Code section of the docs to learn how to commit code to GitHub from the Jupyter terminal, and complete the setup instructions given there.

Environment Variables

When data access is expensive, or when the data is sensitive, accessing it may require credentials (such as passwords or tokens).

There is a fundamental tension between data access restrictions and analysis reproducibility. If credentials are required, then an analysis is not reproducible out-of-the-box. However, including these credentials in scripts and notebooks is a security risk.

Most projects should store the authentication credentials in environment variables, which can then be read by scripts and notebooks. The environment variables that are required for an analysis to work should be clearly documented.

Analysts should store their credentials in a _env file, a slight variation of the typical .env file, since the .env won’t show up in the JupyterHub filesystem.

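Credentials stored in the _env file may include a GitHub API key, a Census API key, an Airtable API key, and so on. Store them one per line in KEY=value format, for example (the names and values here are placeholders):

GITHUB_API_KEY=your-github-key
CENSUS_API_KEY=your-census-key
AIRTABLE_API_KEY=your-airtable-key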

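To load these credentials in a Jupyter notebook:

import os

import dotenv

# Load the variables defined in the _env file into the environment
dotenv.load_dotenv("_env")

# Import the credential (without exposing the password!)
# GITHUB_API_KEY is an example name; use the name defined in your _env file
GITHUB_API_KEY = os.environ["GITHUB_API_KEY"]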
Jupyter Notebook Best Practices

External resources:

Developing warehouse models in JupyterHub

See the warehouse README for warehouse project setup instructions.