Some patterns

9/12/23

I’m starting to settle into a clean pattern when working with Jupyter notebooks:

import seaborn as sns
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

sns.set(rc={'figure.figsize': (7, 7)}, font_scale=2)

def show_data(_df):
    # Print the first few rows, then hand the frame back so this can sit inside a .pipe() chain
    print(_df.head())
    return _df

# this is a matplotlib axes object
ax = (
    df
    .pivot(index='time', columns='who', values='sentiment')
    .reset_index()
    # 'a' and 'v' are placeholders: swap in real column names from the pivoted frame
    .pipe(lambda x: sns.scatterplot(x='a', y='v', data=x))
)

(
    ax
    .set(
        title='This is the title!',
        xlabel="X label",
        ylabel="Y label",
        # labels are applied to whatever ticks exist; also pass xticks=/yticks= to pin them
        xticklabels=['a', 'b', 'c'],
        yticklabels=[1, 2, 3],
        xlim=(-1, 1),
        ylim=(-1, 1)
    )
)

# Add some flavor
ax.axvline(x=0, color='black');
ax.axhline(y=0, color='black');
plt.text(.07, .1, 'Here is some text!', fontsize=12);  # position is in data coordinates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b, %y'))  # only makes sense when the x-axis holds dates

# use one or the other: relabel the legend...
ax.legend(title='Smoker', loc='upper left', labels=['Yes', 'No']);
# ...or hide it entirely
plt.legend([], [], frameon=False);
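
To make the `.pivot` step concrete, here is a minimal sketch with a tiny, made-up long-format frame (the `alice`/`bob` names and the sentiment values are hypothetical). It also uses the `show_data` helper as a "peek" step: because it prints and then returns the frame, it can sit anywhere in a `.pipe` chain without breaking it.

```python
import pandas as pd

def show_data(_df):
    """Print the first rows, then return the frame so the chain can continue."""
    print(_df.head())
    return _df

# Hypothetical long-format data: one row per (time, who) observation
df = pd.DataFrame({
    'time': [1, 1, 2, 2],
    'who': ['alice', 'bob', 'alice', 'bob'],
    'sentiment': [0.5, -0.2, 0.1, 0.3],
})

wide = (
    df
    .pivot(index='time', columns='who', values='sentiment')  # one column per value of 'who'
    .reset_index()                                            # turn the 'time' index back into a column
    .pipe(show_data)                                          # peek at the result mid-chain
)
```

After `reset_index()`, the columns are `time` plus one column per value of `who`, so those are the names the `x=`/`y=` arguments of the scatterplot call would refer to.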

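A standalone sketch of the axes tweaks above, using plain matplotlib with made-up points instead of a seaborn plot. It pins the tick positions with `xticks=` before setting `xticklabels=` (recent matplotlib versions warn when labels are set without fixed ticks), uses `ax.text` with `transform=ax.transAxes` so the text position is a fraction of the axes rather than data coordinates, and drops the legend with `ax.get_legend().remove()` as an alternative to the empty `plt.legend` call.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this also runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([-0.5, 0.0, 0.5], [0.2, -0.4, 0.6], label='points')  # made-up data

ax.set(
    title='This is the title!',
    xlim=(-1, 1),
    ylim=(-1, 1),
    xticks=[-1, 0, 1],            # fix the tick positions first...
    xticklabels=['a', 'b', 'c'],  # ...so each label maps to a known tick
)

ax.axvline(x=0, color='black')
ax.axhline(y=0, color='black')
# With transform=ax.transAxes, (0, 0) is the lower-left corner of the axes and (1, 1) the upper-right
ax.text(0.07, 0.1, 'Here is some text!', fontsize=12, transform=ax.transAxes)

ax.legend()
ax.get_legend().remove()  # remove the legend entirely
```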
I try hard not to clutter my notebook environment with lots of temporary variables, AKA I hate doing things like this:

mydf = pd.read_csv(...)

avg_by_group = mydf.groupby('age')['score'].mean().reset_index()
sns.barplot(..., data=avg_by_group)

I’d much prefer to use .pipe and write something like:

mydf = pd.read_csv(...)

(
  mydf
  .groupby('age')['score'].mean()
  .reset_index()
  .pipe(lambda x: sns.barplot(..., data=x))
)
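Here is a runnable version of that chain with a tiny made-up `mydf` in place of `pd.read_csv(...)`, and a hypothetical `top_group` function standing in for `sns.barplot` so nothing gets drawn. `.pipe(f)` simply calls `f` on the intermediate DataFrame, so whatever the last function returns is what the whole expression evaluates to, and no intermediate variable is left behind.

```python
import pandas as pd

# Hypothetical data standing in for pd.read_csv(...)
mydf = pd.DataFrame({
    'age': [20, 20, 30, 30, 40],
    'score': [1.0, 3.0, 2.0, 4.0, 5.0],
})

def top_group(data):
    """Hypothetical stand-in for a plotting call: return the age with the highest mean score."""
    return data.loc[data['score'].idxmax(), 'age']

best_age = (
    mydf
    .groupby('age')['score'].mean()  # Series of mean scores, indexed by age
    .reset_index()                   # back to a DataFrame with 'age' and 'score' columns
    .pipe(top_group)                 # same as top_group(<the DataFrame above>)
)
```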

If you haven’t used .pipe before, you should!! The documentation does a better job explaining it than I can: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html
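One detail from those docs that is handy with plotting libraries: `.pipe` passes the DataFrame as the first positional argument, but if a function expects it under a keyword, you can pass a `(callable, 'keyword')` tuple instead. In seaborn 0.12 and later, `data` is the first parameter of functions like `sns.barplot`, so `.pipe(sns.barplot, x=..., y=...)` should work directly; older versions made every argument keyword-only, which is where the tuple form helps. The sketch below uses a hypothetical `label_rows` helper whose data argument is not first.

```python
import pandas as pd

def label_rows(prefix, data):
    """Hypothetical helper: the DataFrame is the second parameter, named `data`."""
    return [f"{prefix}{i}" for i in data.index]

df = pd.DataFrame({'score': [1, 2, 3]})

# The tuple makes .pipe call label_rows('row-', data=df) instead of label_rows(df, 'row-')
labels = df.pipe((label_rows, 'data'), 'row-')
```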