Model

class hhpy.modelling.Model(model: Any = None, name: str = 'pred', X_ref: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y_ref: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None)

Bases: hhpy.main.BaseClass

A unified modeling class built around the sklearn interface. It accepts any model that implements .fit and .predict.

Parameters:
  • model – Any model object that implements .fit and .predict
  • name – Name of the model, used for naming columns [optional]
  • X_ref – List of features (predictors) used for training the model
  • y_ref – List of labels (targets) to be predicted
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
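The requirement that a model "implements .fit and .predict" is a duck-typing contract. The sketch below illustrates that contract with the standard library only. `SupportsFitPredict` and `MeanRegressor` are hypothetical names introduced for illustration and are not part of hhpy; how Model itself validates the object it receives is not documented here.

```python
from typing import Any, List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class SupportsFitPredict(Protocol):
    """Hypothetical protocol describing the duck-typed interface (not an hhpy class)."""

    def fit(self, X: Any, y: Any) -> Any: ...

    def predict(self, X: Any) -> Any: ...


class MeanRegressor:
    """Toy model (illustration only): always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanRegressor":
        # The features are ignored; only the average target is stored.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # One prediction per input row.
        return [self.mean_ for _ in X]


model = MeanRegressor()

# A runtime-checkable protocol only verifies that the methods exist,
# not that their signatures match.
assert isinstance(model, SupportsFitPredict)

model.fit([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0])
predictions = model.predict([[4.0], [5.0]])
```

Any object that passes this kind of check, such as a typical sklearn estimator, should be usable as the `model` argument.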

Methods Summary

fit([X, y, df, dropna, X_test, y_test, …]) Generalized fit method that extends model.fit
predict([X, y, df, return_type, k_index, …]) Generalized predict method based on model.predict

Methods Documentation

fit(X: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, df: pandas.core.frame.DataFrame = None, dropna: bool = True, X_test: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y_test: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, df_test: pandas.core.frame.DataFrame = None, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, k: int = 0) → None

Generalized fit method that extends model.fit

Parameters:
  • X – The feature (predictor) data used for training as DataFrame, np.array or column names
  • y – The label (target) data used for training as DataFrame, np.array or column names
  • df – Pandas DataFrame containing the training data, optional if array-like data is passed for X/y
  • dropna – Whether to drop rows containing NA in the training data [optional]
  • X_test – The feature (predictor) data used for testing as DataFrame, np.array or column names
  • y_test – The label (target) data used for testing as DataFrame, np.array or column names
  • df_test – Pandas DataFrame containing the testing data, optional if array-like data is passed for X_test/y_test
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • k – Index of the model to fit
Returns:

None
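As an illustration of what dropna=True implies for the training data, the following standard-library sketch removes every row that contains a NaN before fitting. `drop_na_rows` is a hypothetical helper, and treating a NaN in either the features or the target as grounds for dropping the row is an assumption; hhpy's actual implementation may differ.

```python
import math
from typing import List, Sequence, Tuple


def drop_na_rows(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: keep only rows where no feature and no target is NaN."""
    X_clean: List[List[float]] = []
    y_clean: List[float] = []
    for features, target in zip(X, y):
        if any(math.isnan(v) for v in features) or math.isnan(target):
            continue  # skip incomplete rows
        X_clean.append(list(features))
        y_clean.append(target)
    return X_clean, y_clean


X_train = [[1.0, 2.0], [math.nan, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = [1.0, 2.0, math.nan, 4.0]

# Row 1 has a NaN feature and row 2 has a NaN target, so both are dropped.
X_clean, y_clean = drop_na_rows(X_train, y_train)
```

Only the complete rows would then be passed on to the wrapped model's .fit.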

predict(X: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, df: pandas.core.frame.DataFrame = None, return_type: str = 'y', k_index: pandas.core.series.Series = None, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, handle_na: bool = True, multi: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Generalized predict method based on model.predict

Parameters:
  • X – The feature (predictor) data used for prediction as DataFrame, np.array or column names
  • y – The label (target) data used for training as DataFrame, np.array or column names
  • df – Pandas DataFrame containing the training and testing data. Can be saved to the Model object or supplied as needed.
  • return_type – One of ['y', 'df', 'DataFrame']. If 'y', returns a pandas Series / DataFrame with only the predictions; if 'df' or 'DataFrame', returns the full DataFrame with the predictions added
  • k_index – If specified and the model is k_cross split, return only the predictions for each test subset
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • handle_na – Whether to handle NaN values (prediction will be NaN) [optional]
  • multi – Postfixes to use for multi output models [optional]
Returns:

A pandas Series or DataFrame, depending on return_type
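The handle_na behavior described above (rows with missing values receive a NaN prediction) can be sketched with the standard library as follows. `predict_with_na` and `row_sum` are hypothetical names used for illustration only; this is one plausible way to get the documented result, not hhpy's implementation.

```python
import math
from typing import Callable, List, Sequence


def predict_with_na(
    predict: Callable[[List[List[float]]], List[float]],
    X: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical helper: predict complete rows only, return NaN for the rest."""
    # Positions of the rows that contain no NaN.
    complete = [i for i, row in enumerate(X) if not any(math.isnan(v) for v in row)]
    out = [math.nan] * len(X)
    if complete:
        preds = predict([list(X[i]) for i in complete])
        # Write each prediction back to the position of its original row.
        for i, p in zip(complete, preds):
            out[i] = p
    return out


def row_sum(rows: List[List[float]]) -> List[float]:
    """Stand-in for model.predict: 'predicts' the sum of each row."""
    return [sum(r) for r in rows]


result = predict_with_na(row_sum, [[1.0, 2.0], [math.nan, 1.0], [3.0, 4.0]])
```

The output keeps the length and order of the input, so it can be aligned with the original rows.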