Getting Notebooks Ready for the Portfolio

Getting Notebooks Ready for the Portfolio#

Headers#

Parameterized Titles#

If you’re parameterizing the notebook, the first Markdown cell must include parameters to inject.
- Ex: If district is one of the parameters in your sites/my_report.yml, a header Markdown cell could be # District {district} Analysis.
- Note: The site URL is constructed from the original notebook name and the parameter in the JupyterBook build: 0_notebook_name__district_x_analysis.html

Consecutive Headers#

Headers must move consecutively in Markdown cells. No skipping!

# Notebook Title
## First Section
## Second Section
### Another subheading

To get around consecutive headers, you can use display(HTML()).

display(HTML(<h1>First Header</h1>) display(HTML(<h3>Next Header</h3>))

Capturing Parameters#

If you’re using a heading, you can either use HTML or capture the parameter and inject.

HTML - this option works when you run your notebook locally.

from IPython.display import HTML

display(HTML(f"<h3>Header with {variable}</h3>"))

Capture parameters - this option won’t display locally in your notebook (it will still show {district_number}), but will be injected with the value when the JupyterBook is built.

In a code cell:
```
%%capture_parameters

district_number = f"{df.caltrans_district.iloc[0].split('-')[0].strip()}"
```
In a Markdown cell:
```
## District {district_number}
```

Suppress Warnings#

Suppress warnings from displaying in the portfolio site (shared_utils).

# Include this in the cell where packages are imported

%%capture
import warnings
warnings.filterwarnings('ignore')

Narrative#

Narrative content can be done in Markdown cells or code cells.
- Markdown cells should be used when there are no variables to inject.
- Code cells should be used to write narrative whenever variables constructed from f-strings are used.
For papermill, add a parameters tag to the code cell Note: Our portfolio uses a custom papermill engine and we can skip this step.
Markdown cells can inject f-strings if it’s plain Markdown (not a heading) using display(Markdown()) in a code cell.

from IPython.display import Markdown

display(Markdown(f"The value of {variable} is {value}."))

Use f-strings to fill in variables and values instead of hard-coding them
- Turn anything that runs in a loop or relies on a function into a variable.
- Use functions to grab those values for a specific entity (operator, district), rather than hard-coding the values into the narrative.

n_routes = (df[df.calitp_itp_id == itp_id]
            .route_id.nunique()
            )


n_parallel = (df[
            (df.calitp_itp_id == itp_id) &
            (df.parallel==1)]
            .route_id.nunique()
            )

display(
    Markdown(
        f"**Bus routes in service: {n_routes}**"
        "<br>**Parallel routes** to State Highway Network (SHN): "
        f"**{n_parallel} routes**"
        )
)

Stay away from loops if you need to use headers.
- You will need to create Markdown cells for headers or else JupyterBook will not build correctly. For parameterized notebooks, this is an acceptable trade-off.
- For unparameterized notebooks, you may want use display(HTML()).
- Caveat: Using display(HTML()) means you’ll lose the table of contents navigation in the top right corner in the JupyterBook build.

Writing Guide#

These are a set of principles to adhere to when writing the narrative content in a Jupyter Notebook. Use your best judgment to decide when there are exceptions to these principles.

Decimals less than 1, always prefix with a 0, for readability.
- 0.05, not .05
Integers when referencing dates, times, etc
- 2020 for year, not 2020.0 (coerce to int64 or Int64 in pandas; Int64 are nullable integers, which allow for NaNs to appear alongside integers)
- 1 hr 20 min, not 1.33 hr (use best judgment to decide what’s easier for readers to interpret)
Round at the end of the analysis. Use best judgment to decide on significant digits.
- Too many decimal places give an air of precision that may not be present.
- Too few decimal places may not give enough detail to distinguish between categories or ranges.
- A good rule of thumb is to start with 1 extra decimal place than what is present in the other columns when deriving statistics (averages, percentiles), and decide from there if you want to round up.
  - An average of $100,000.0 can simply be rounded to $100,000.
  - An average of 5.2 mi might be left as is.
- National Institutes of Health Rounding Rules (full article)
Additional references: American Psychological Association (APA) style, and Purdue

Standard Names#

GTFS data in our warehouse stores information on operators, routes, and stops.
Analysts should reference the operator name, route name, and Caltrans district the same way across analyses.
- ITP ID: 182 is Metro (not LA Metro, Los Angeles County Metropolitan Transportation Authority, though those are all correct names for the operator)
- Caltrans District: 7 is 07 - Los Angeles
- Between route_short_name, route_long_name, route_desc, which one should be used to describe route_id? Use shared_utils.portfolio_utils, which relies on regular expressions, to select the most human-readable route name.

Before deploying your portfolio, make sure the operator name you’re using is what’s used in other analyses in the portfolio.

Use shared_utils.portfolio_utils to help you grab the right names to use.

from shared_utils import portfolio_utils

route_names = portfolio_utils.add_route_name()

# Merge in the selected route name using route_id
df = pd.merge(df,
            route_names,
            on = ["calitp_itp_id", "route_id"]
)


agency_names = portfolio_utils.add_agency_name()

# Merge in the operator's name using calitp_itp_id
df = pd.merge(df,
            agency_names,
            on = "calitp_itp_id"
)