k_split

hhpy.ds.k_split(df: pandas.core.frame.DataFrame, k: int = 5, groupby: Union[Sequence[T_co], str] = None, sortby: Union[Sequence[T_co], str] = None, random_state: int = None, do_print: bool = True, return_type: Union[str, int] = 1) → Union[pandas.core.series.Series, tuple]

Splits a DataFrame into k (equal-sized) parts that can be used for train/test splitting or k-fold cross-validation splitting.

Parameters:
  • df – pandas DataFrame to be split
  • k – how many (equal-sized) parts to split the DataFrame into [optional]
  • groupby – passed to pandas.DataFrame.groupby before splitting, ensures that each group will be represented equally in each split part [optional]
  • sortby – if given, the DataFrame is ordered by these column(s) and then sliced into parts from the top; if not given, the DataFrame is sorted randomly before slicing [optional]
  • random_state – random_state to be used in random sorting, ignored if sortby is given [optional]
  • do_print – whether to print steps to console [optional]
  • return_type – if one of [‘Series’, ‘s’], returns a pandas Series containing the k indices range(k); if a positive integer < k, returns a tuple of the form (df_train, df_test), where the return_type’th part is used as df_test and the other parts together make up df_train
Returns:

depending on return_type, either a pandas Series or a tuple
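
The behaviour described above can be illustrated with a minimal sketch in plain pandas. This is not hhpy's implementation: the function name k_split_sketch is hypothetical, it only covers the Series-style output, and it assumes that df has a unique index and that "sliced into parts from the top" means contiguous blocks of (roughly) equal size.

```python
import numpy as np
import pandas as pd


def k_split_sketch(df, k=5, groupby=None, sortby=None, random_state=None):
    """Hypothetical stand-in (not hhpy code): label each row with a part index in range(k)."""
    if sortby is not None:
        # Ordered split: sort by the given column(s); random_state is not used.
        ordered = df.sort_values(sortby)
    else:
        # Random split: shuffle the rows reproducibly.
        ordered = df.sample(frac=1, random_state=random_state)

    if groupby is None:
        # Position of each row in the ordered frame and the total row count.
        pos = pd.Series(np.arange(len(ordered)), index=ordered.index)
        size = len(ordered)
    else:
        # Position of each row within its group and the size of that group,
        # so that every group is spread evenly over the k parts.
        grouped = ordered.groupby(groupby)
        pos = grouped.cumcount()
        size = pos + grouped.cumcount(ascending=False) + 1

    # Slice from the top: the first 1/k of the rows become part 0, and so on.
    part = (pos * k // size).astype(int)
    # Return the labels in the original row order (assumes a unique index).
    return part.reindex(df.index)


df = pd.DataFrame({"g": ["a"] * 10 + ["b"] * 10, "x": range(20)})
part = k_split_sketch(df, k=5, groupby="g", random_state=42)

# Tuple-style result: one part is held out as test data, the rest is training data.
df_test = df[part == 1]
df_train = df[part != 1]
```

Looping over range(k) and holding out a different part each time gives the folds for cross-validation; holding out a single part, as shown, gives a plain train/test split.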