Helpful Links
These are resources that data analysts have collected and referenced; we hope they help with your work. Have something to share? Create a new markdown file, add it to the example report folder, and message Amanda.
Data Analysis
Python
Pandas
Summarizing
Merging
When the “merge on” column is a string, the DataFrames can be difficult to join because the same entity may be written differently in each one. For example, df1 lists County of Sonoma, Human Services Department, Adult and Aging Division, but df2 refers to the same department as County of Sonoma (Human Services Department).
Potential Solution #1: add a column to one DataFrame and fill it in wherever a value partially matches a string in the other DataFrame, then merge on that column.
Potential Solution #2: use the fuzzymatcher package. Review the results carefully for bad matches.
Potential Solution #3: if there are only a few distinct values, map them with a dictionary.
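Solutions #1 and #3 can be sketched with plain pandas. This is a minimal illustration, not the approach from the linked resources: the sample rows, the column names (`agency`, `agency_clean`, `requested`, `awarded`), and the cleaned key are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for illustration.
df1 = pd.DataFrame({
    "agency": ["County of Sonoma, Human Services Department, Adult and Aging Division"],
    "requested": [100],
})
df2 = pd.DataFrame({
    "agency": ["County of Sonoma (Human Services Department)"],
    "awarded": [80],
})

# Solution #1: find rows by partial string match, then fill in a shared key.
is_sonoma_hsd = (
    df1["agency"].str.contains("Sonoma", case=False)
    & df1["agency"].str.contains("Human Services", case=False)
)
df1.loc[is_sonoma_hsd, "agency_clean"] = "County of Sonoma Human Services"

# Solution #3: map each known spelling to the same key with a dictionary.
crosswalk = {
    "County of Sonoma (Human Services Department)": "County of Sonoma Human Services",
}
df2["agency_clean"] = df2["agency"].map(crosswalk)

# Merge on the cleaned key; indicator=True adds a _merge column for checking matches.
merged = df1.merge(
    df2, on="agency_clean", how="left", suffixes=("_df1", "_df2"), indicator=True
)
```

Rows where `_merge` is `left_only` did not find a partner and are worth inspecting by hand.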
Dates
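# Make sure your columns are datetime objects before running either line.
# Number of days between each row's date and the previous row's date
df['n_days_between'] = (df['prepared_date'] - df['prepared_date'].shift(1)).dt.days
# Financial year starting in April: January through March belong to the previous year
df['financial_year'] = df['base_date'].map(lambda x: x.year if x.month > 3 else x.year - 1)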
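Both date snippets above only work on datetime columns. The sketch below shows the conversion step with `pd.to_datetime` on hypothetical sample rows; the column names come from the snippets, and the sort before `shift` is an assumption about how the rows should be ordered.

```python
import pandas as pd

# Hypothetical sample data using the column names from the snippets above.
df = pd.DataFrame({
    "prepared_date": ["2021-01-01", "2021-01-15", "2021-03-01"],
    "base_date": ["2021-03-31", "2021-04-01", "2022-01-10"],
})

# Convert strings to datetime so that .dt, .year, and .month are available.
df["prepared_date"] = pd.to_datetime(df["prepared_date"])
df["base_date"] = pd.to_datetime(df["base_date"])

# shift(1) compares each row with the one before it, so sort first.
df = df.sort_values("prepared_date").reset_index(drop=True)
df["n_days_between"] = (df["prepared_date"] - df["prepared_date"].shift(1)).dt.days

# Dates in January through March fall in the financial year that began the previous April.
df["financial_year"] = df["base_date"].map(
    lambda x: x.year if x.month > 3 else x.year - 1
)
```

The first row has no previous date, so its `n_days_between` is missing (NaN).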
Monetary Values
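# Goes inside chart.encode(...); requires `import altair as alt`.
# The "$.2s" format shows currency with SI prefixes (for example, $1.2M).
x=alt.X("Funding Amount", axis=alt.Axis(format="$.2s", title="Obligated Funding ($2021)"))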
Adjust for inflation.
# Must install and import the cpi package for the function to work.
import cpi

def adjust_prices(df, base_year=2021):
    cols = ["total_requested",
            "fed_requested",
            "ac_requested"]

    def inflation_table(base_year):
        # Refresh the package's local copy of the CPI data
        cpi.update()
        series_df = cpi.series.get(area="U.S. city average").to_dataframe()
        # Average the index values within each year
        inflation_df = (series_df[series_df.year >= 2008]
                        .pivot_table(index='year', values='value', aggfunc='mean')
                        .reset_index()
                       )
        denominator = inflation_df.value.loc[inflation_df.year==base_year].iloc[0]
        inflation_df = inflation_df.assign(
            inflation = inflation_df.value.divide(denominator)
        )
        return inflation_df

    # Get the CPI table. The variable must not be named cpi, or it would
    # shadow the cpi module used inside inflation_table.
    cpi_table = inflation_table(base_year)
    cpi_dict = dict(zip(cpi_table['year'], cpi_table['value']))

    # CPI value for each row's year
    multiplier = df["prepared_y"].map(cpi_dict)
    # CPI value for the base year (approximately 270.97 for 2021 dollars)
    base_value = cpi_dict[base_year]
    for col in cols:
        df[f"adjusted_{col}"] = (df[col] * base_value) / multiplier
    return df
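The arithmetic behind the adjustment is: adjusted amount = nominal amount × (CPI in the base year ÷ CPI in the year the amount was recorded). The sketch below runs that calculation without the cpi package. The index values are round placeholders chosen for easy arithmetic, not real CPI figures, and the single `total_requested` column stands in for the full column list.

```python
import pandas as pd

# Placeholder index values, NOT real CPI figures; in practice, build this
# dictionary from the table returned by the cpi package.
cpi_dict = {2019: 200.0, 2021: 250.0}
base_year = 2021

df = pd.DataFrame({
    "prepared_y": [2019, 2021],
    "total_requested": [1000.0, 1000.0],
})

# Look up the index value for each row's year.
index_in_year = df["prepared_y"].map(cpi_dict)

# Scale each amount by base-year index / row-year index.
df["adjusted_total_requested"] = (
    df["total_requested"] * cpi_dict[base_year] / index_in_year
)
```

With these placeholder values, the 2019 amount is scaled by 250 / 200 and the 2021 amount is unchanged. A year missing from the dictionary yields a missing adjusted value, which is a useful signal that the table does not cover the data.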
Visualization
Charts
Altair
Manually concatenate a bar chart and line chart to create a dual axis graph.
Resolve the error “TypeError: Object of type ‘Timestamp’ is not JSON serializable”.
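This error generally means a pandas Timestamp object reached a JSON encoder, which cannot serialize it. One possible fix, assumed here and not necessarily the one in the linked resource, is to convert the date column to strings before the data is serialized. The sketch reproduces the error with the standard `json` module and hypothetical data.

```python
import json
import pandas as pd

# Hypothetical sample data with a datetime column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-06-30"]),
    "project": ["A", "B"],
})

# Each record holds a pandas Timestamp, which json cannot encode.
records = df.to_dict(orient="records")
try:
    json.dumps(records)
    raised = False
except TypeError:
    raised = True

# Fix: turn the dates into ISO-formatted strings first.
df_fixed = df.assign(date=df["date"].dt.strftime("%Y-%m-%d"))
serialized = json.dumps(df_fixed.to_dict(orient="records"))
```

If the chart still needs a time axis, the string column can be declared as temporal in the encoding (for example `"date:T"` in Altair).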
Add tooltip to chart functions.
def add_tooltip(chart, tooltip1, tooltip2):
    # tooltip1 and tooltip2 are the fields shown when hovering over a mark
    chart = chart.encode(tooltip=[tooltip1, tooltip2])
    return chart
Maps
DataFrames
ipywidgets
Tabs
Create tabs to switch between different views.
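A minimal sketch of a tabbed layout with `ipywidgets.Tab` follows. The two HTML widgets are placeholders; in a notebook each tab would more likely hold an `Output` widget containing a table or chart. The tab titles are hypothetical.

```python
import ipywidgets as widgets

# Placeholder content for each view.
table_view = widgets.HTML(value="<b>Summary table goes here</b>")
chart_view = widgets.HTML(value="<b>Chart goes here</b>")

# Each child becomes one tab, in order.
tab = widgets.Tab(children=[table_view, chart_view])
tab.set_title(0, "Table")
tab.set_title(1, "Chart")

# In a notebook, make `tab` the last expression in a cell
# (or call display(tab)) to render it.
```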