Some patterns
9/12/23
I’m starting to settle into a clean pattern when working with Jupyter notebooks:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
sns.set(rc={'figure.figsize': (7, 7)}, font_scale=2)
def show_data(_df):
    # print a preview and return the frame unchanged
    print(_df.head())
    return _df
# this is a matplotlib axes object
ax = (
    df
    .pivot(index='time', columns='who', values='sentiment')
    .reset_index()
    .pipe(lambda x: sns.scatterplot(x='a', y='v', data=x))
)
(
    ax
    .set(
        title='This is the title!',
        xlabel="X label",
        ylabel="Y label",
        xticklabels=['a', 'b', 'c'],
        yticklabels=[1, 2, 3],
        xlim=(-1, 1),
        ylim=(-1, 1)
    )
)
# Add some flavor
ax.axvline(x=0, color='black');
ax.axhline(y=0, color='black');
plt.text(.07, .1, 'Here is some text!', fontsize = 12);
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b, %y'))
# either relabel the legend...
ax.legend(title='Smoker', loc='upper left', labels=['Yes', 'No']);
# ...or hide it entirely
plt.legend([], [], frameon=False)
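The show_data helper is defined above but never called. As a rough sketch of how it might slot into a chain, here is a version with toy data. The values are made up for illustration; only the column names come from the pivot call above, and the name wide is just a handle for checking the result.

```python
import pandas as pd


def show_data(_df):
    """Print the first rows, then hand the frame back unchanged."""
    print(_df.head())
    return _df


# Toy long-format data (hypothetical values, for illustration only)
df = pd.DataFrame({
    'time': [1, 1, 2, 2],
    'who': ['a', 'b', 'a', 'b'],
    'sentiment': [0.1, -0.2, 0.4, 0.3],
})

wide = (
    df
    .pivot(index='time', columns='who', values='sentiment')
    .pipe(show_data)  # peek at the intermediate result without breaking the chain
    .reset_index()
)
```

Because show_data returns the frame it was given, it can be dropped in or removed at any step without changing what the chain produces.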
I try hard not to clutter my notebook environment with lots of temporary variables. In other words, I hate doing things like this:
mydf = pd.read_csv(...)
avg_by_group = mydf.groupby('age')['score'].mean().reset_index()
sns.barplot(..., data=avg_by_group)
I’d much prefer to use .pipe and write something like:
mydf = pd.read_csv(...)
(
    mydf
    .groupby('age')['score'].mean()
    .reset_index()
    .pipe(lambda x: sns.barplot(..., data=x))
)
If you haven’t used .pipe before, you should! The documentation explains it better than I can: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html
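For a quick taste, here is a minimal sketch of two ways to call .pipe without a lambda. The helpers add_offset and describe, and the scores data, are hypothetical names made up for this example. The (callable, keyword) tuple form is described on the documentation page linked above; it is handy when the function expects the frame somewhere other than the first argument, as with a data= keyword.

```python
import pandas as pd


def add_offset(frame, column, offset=0):
    """Hypothetical helper: return a copy with `offset` added to one column."""
    out = frame.copy()
    out[column] = out[column] + offset
    return out


def describe(title, data):
    """Hypothetical helper that takes the frame as the `data` keyword."""
    return f"{title}: {len(data)} rows"


# Made-up data for illustration
scores = pd.DataFrame({'age': [20, 20, 30], 'score': [1.0, 3.0, 5.0]})

result = (
    scores
    .groupby('age')['score'].mean()
    .reset_index()
    # extra positional and keyword arguments are forwarded to the function
    .pipe(add_offset, 'score', offset=10)
    # tuple form: the frame is passed as the `data` keyword argument
    .pipe((describe, 'data'), 'mean score by age')
)
```

The tuple form should work the same way for plotting functions that accept a data= keyword, though that is not exercised in this sketch.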