k_split
hhpy.ds.k_split(df: pandas.core.frame.DataFrame, k: int = 5, groupby: Union[Sequence[T_co], str] = None, sortby: Union[Sequence[T_co], str] = None, random_state: int = None, do_print: bool = True, return_type: Union[str, int] = 1) → Union[pandas.core.series.Series, tuple]
Splits a DataFrame into k (equal-sized) parts that can be used for train/test splitting or k-fold cross-validation.
Parameters:
- df – pandas DataFrame to be split
- k – number of (equal-sized) parts to split the DataFrame into; defaults to 5 [optional]
- groupby – passed to pandas.DataFrame.groupby before splitting, ensures that each group will be represented equally in each split part [optional]
- sortby – if given, the DataFrame is ordered by these column(s) and then sliced into parts from the top; if not given, the DataFrame is shuffled randomly before slicing [optional]
- random_state – random_state used for the random shuffle; ignored if sortby is given [optional]
- do_print – whether to print steps to console [optional]
- return_type – if one of [‘Series’, ‘s’], returns a pandas Series containing the part indices (values in range(k)); if a positive integer < k, returns a tuple (df_train, df_test) in which the return_type’th part is df_test and the remaining parts together form df_train [optional]
Returns: depending on return_type, either a pandas Series or a tuple (df_train, df_test)
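The behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not hhpy's actual implementation: the helper names `k_split_sketch` and `train_test_from_parts` are hypothetical, part indices are assumed to be 0-based, and the DataFrame index is assumed to be unique.

```python
import numpy as np
import pandas as pd


def k_split_sketch(df, k=5, groupby=None, sortby=None, random_state=None):
    """Hypothetical helper: map each row of df to a part index in range(k)."""
    if sortby is not None:
        # Order by the given column(s), then slice from the top.
        ordered = df.sort_values(sortby)
    else:
        # Shuffle all rows; random_state only matters on this branch.
        ordered = df.sample(frac=1, random_state=random_state)

    if groupby is not None:
        grouped = ordered.groupby(groupby)
        # Position of each row within its group, counted from the top.
        position = grouped.cumcount()
        # Group size = rows before + rows after + the row itself.
        size = position + grouped.cumcount(ascending=False) + 1
    else:
        position = pd.Series(np.arange(len(ordered)), index=ordered.index)
        size = len(ordered)

    # Contiguous slices: the first 1/k of the rows get 0, the next 1/k get 1, ...
    parts = (position * k) // size
    # Restore the original row order (assumes a unique index).
    return parts.reindex(df.index).rename("k_index")


def train_test_from_parts(df, parts, test_part=1):
    """Hypothetical helper: use one part as the test set, the rest as training."""
    mask = parts == test_part
    return df[~mask], df[mask]


df = pd.DataFrame({"group": ["a"] * 10 + ["b"] * 10, "value": range(20)})
parts = k_split_sketch(df, k=5, groupby="group", random_state=42)
df_train, df_test = train_test_from_parts(df, parts, test_part=1)

print(parts.value_counts().sort_index())
print(len(df_train), len(df_test))
```

Looping `test_part` over `range(k)` yields the k train/test pairs needed for cross-validation. Because the part index is computed within each group, every group contributes roughly the same number of rows to every part.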