Custom logging in Python

9/1/19

  When working with data, there are a lot of things I need to keep track of in my head.

  To keep track of them while working in a Jupyter notebook, I end up with tons of cells that look like this:

df.head()

or

df.shape

  By using decorators and the .pipe method, I can build an analysis pipeline that gives me customized output and automates this tedious cycle of .head() and .shape. Let’s take a look.

import pandas as pd
import numpy as np
import functools

np.random.seed(5)
df = pd.DataFrame({
    'group': np.random.choice(['a', 'b', 'c'], 10),
    'x': np.random.randint(0, 10, 10),
    'y': np.random.normal(0, 10, 10)
})
df.head()

group x y
0 c 0 9.118736
1 b 7 -14.438416
2 c 1 18.244402
3 c 5 14.576251
4 a 7 -9.102582

Now I’ll define some processing functions.

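  Each function takes a dataframe as its argument and returns a dataframe, which is what lets them be chained with .pipe. The pDoc decorator wraps each one so that its name, docstring, and output shape are printed every time it runs.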

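def pDoc(func):
    """Print a function's name, docstring, and output shape on each call."""
    @functools.wraps(func)  # keep the wrapped function's __name__ and __doc__
    def wrapper(*args, **kwargs):
        rv = func(*args, **kwargs)
        # rv is expected to be a dataframe, so it has a .shape
        print("{}(): \n\t{} -> {}".format(func.__name__, func.__doc__, rv.shape))
        return rv
    return wrapper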

@pDoc
def startPipe(df):
    """Begin pipeline"""
    return df

@pDoc
def filterGroups(df):
    """Remove group b from the analysis."""
    return df.query('group != "b"')

@pDoc
def capVal(df):
    """Cap the value of y at 10."""
    dat = df.copy()
    dat['y'] = dat['y'].clip(upper=10)
    return dat

@pDoc
def getMean(df):
    """Add column as mean value of x by group."""
    dat = df.copy()
    dat['g_mean'] = dat.groupby('group')['x'].transform('mean')
    return dat
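
Each of these steps takes a dataframe and returns one, which is the only contract .pipe needs: df.pipe(f) is the same as f(df), and any extra arguments are forwarded to f. A minimal sketch on a toy dataframe, where drop_group is a hypothetical helper and not one of the functions above:

```python
import pandas as pd

def drop_group(df, group):
    """Hypothetical helper: drop all rows belonging to one group."""
    return df[df['group'] != group]

# Toy data, made up for illustration only
toy = pd.DataFrame({'group': ['a', 'b', 'c', 'b'], 'x': [1, 2, 3, 4]})

piped = toy.pipe(drop_group, 'b')   # extra positional args are forwarded
direct = drop_group(toy, 'b')       # the equivalent plain call

print(piped.equals(direct))
```

Both calls produce the same dataframe; .pipe just lets the steps read top to bottom instead of inside out.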

  Now I’ll tie all these functions together using .pipe.

(df
    .pipe(startPipe)
    .pipe(filterGroups)
    .pipe(getMean)
    .pipe(capVal)).head()
startPipe(): 
	Begin pipeline -> (10, 3)
filterGroups(): 
	Remove group b from the analysis. -> (8, 3)
getMean(): 
	Add column as mean value of x by group. -> (8, 4)
capVal(): 
	Cap the value of y at 10. -> (8, 4)

group x y g_mean
0 c 0 9.118736 3.0
2 c 1 10.000000 3.0
3 c 5 10.000000 3.0
4 a 7 -9.102582 3.5
6 a 1 -8.175481 3.5

  As you can see, I get a log that shows each function’s name, docstring, and output shape. I like this solution because it automates the tedious process of asking myself “how many records did I just throw out?” Because the logging lives in the decorator, every decorated function reports the shape of its output each time it runs.
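
If the question is specifically how many records a step threw out, the decorator can compare the input and output instead of printing only the output shape. This is a rough sketch, not the pDoc used above: log_rows and filter_groups are hypothetical names, and it assumes the dataframe is the first positional argument, which is how .pipe passes it.

```python
import functools
import pandas as pd

def log_rows(func):
    """Hypothetical variant of pDoc: report rows in, rows out, rows dropped."""
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        rv = func(df, *args, **kwargs)
        n_in, n_out = len(df), len(rv)
        print("{}(): {} -> {} rows ({} dropped)".format(
            func.__name__, n_in, n_out, n_in - n_out))
        return rv
    return wrapper

@log_rows
def filter_groups(df):
    """Remove group b from the analysis."""
    return df.query('group != "b"')

# Toy data, made up for illustration only
toy = pd.DataFrame({'group': ['a', 'b', 'c', 'b'], 'x': [1, 2, 3, 4]})
out = toy.pipe(filter_groups)
```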

  This solution is also easy to extend or modify. Don’t like what my pDoc decorator is doing? It’s easy to change and customize. You’re really only limited by your imagination (and Python).
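
As one example of that kind of change, the print call could be swapped for Python’s standard logging module, with the log level passed in as a parameter. This is only a sketch under my own assumptions: log_step, cap_val, and the "pipeline" logger name are hypothetical and do not appear in the code above.

```python
import functools
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("pipeline")  # hypothetical logger name

def log_step(level=logging.INFO):
    """Hypothetical decorator factory: log name, docstring, and output shape."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            rv = func(*args, **kwargs)
            logger.log(level, "%s: %s -> %s",
                       func.__name__, func.__doc__, rv.shape)
            return rv
        return wrapper
    return decorator

@log_step(level=logging.INFO)
def cap_val(df, cap=10):
    """Cap the value of y."""
    # assign returns a new dataframe, so the input is left untouched
    return df.assign(y=df['y'].clip(upper=cap))

# Toy data, made up for illustration only
toy = pd.DataFrame({'y': [5.0, 12.0, 20.0]})
capped = toy.pipe(cap_val, cap=10)
```

Going through logging means the messages can later be silenced, redirected to a file, or given timestamps without touching the processing functions themselves.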