outlier_to_nan

hhpy.ds.outlier_to_nan(df: pandas.core.frame.DataFrame, col: str, groupby: Union[list, str] = None, std_cutoff: numpy.number = 3, reps: int = 1, do_print: bool = False) → pandas.core.frame.DataFrame

Sets to NaN every point whose DELTA (the average difference to the previous and the next point) lies outside the n-standard-deviation range, where n is std_cutoff.

Parameters:
  • df – pandas DataFrame
  • col – column to be filtered
  • groupby – column name or list of column names; if provided, the std filter is applied separately within each group
  • std_cutoff – the number of standard deviations outside of which values are set to NaN
  • reps – how many times to repeat the algorithm
  • do_print – whether to print steps to console
Returns:
  pandas DataFrame with outliers in col set to NaN
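
The description above can be illustrated with a minimal sketch. This is not hhpy's actual implementation: the function name delta_outlier_to_nan_sketch is hypothetical, and the exact rule (comparing each DELTA against the mean and standard deviation of all DELTA values in the group) is an assumption based only on the one-line description.

```python
import numpy as np
import pandas as pd


def delta_outlier_to_nan_sketch(df, col, groupby=None, std_cutoff=3.0, reps=1):
    """Hypothetical re-implementation for illustration; not hhpy's code."""
    out = df.copy()
    out[col] = out[col].astype(float)  # NaN requires a float column

    def _filter(s):
        s = s.copy()
        for _ in range(reps):
            # DELTA: average of the differences to the previous and next point
            delta = (s.diff() + s.diff(-1)) / 2
            # Assumed rule: flag points whose DELTA is more than std_cutoff
            # standard deviations away from the mean DELTA
            z = (delta - delta.mean()) / delta.std()
            s[z.abs() > std_cutoff] = np.nan
        return s

    if groupby is None:
        out[col] = _filter(out[col])
    else:
        # Apply the filter separately within each group
        out[col] = out.groupby(groupby)[col].transform(_filter)
    return out


# Usage: a linear series of 20 points with a single spike at position 5
values = list(range(1, 21))
values[5] = 100
df = pd.DataFrame({"y": values})
cleaned = delta_outlier_to_nan_sketch(df, "y", std_cutoff=3)
print(cleaned["y"].isna().sum())
```

In this sketch the first and last point of each group have no DELTA (one neighbour is missing), so they are never flagged. Whether hhpy treats the end points the same way is not stated in the description.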