Models

class hhpy.modelling.Models(*args, df: pandas.core.frame.DataFrame = None, X_ref: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y_ref: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, scaler_X: Any = None, scaler_y: Any = None, printf: Callable = <function tprint>)[source]

Bases: hhpy.main.BaseClass

Collection of Model objects that allows fitting and predicting with multiple Models at once, comparing accuracy and creating Ensembles

Parameters:
  • args – multiple Model objects that will form a Models Collection
  • name – name of the collection
  • df – Pandas DataFrame containing the training and testing data. Can be saved to the Model object or supplied on an as needed basis.
  • X_ref – List of features (predictors) used for training the model
  • y_ref – List of labels (targets) to be predicted
  • scaler_X – Scaler object that implements .transform and .inverse_transform, applied to the features (predictors) before training and inversely after predicting [optional]
  • scaler_y – Scaler object that implements .transform and .inverse_transform, applied to the labels (targets) before training and inversely after predicting [optional]
  • printf – print function to use for logging [optional]
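
The scaler contract described above (an object exposing .transform and .inverse_transform) can be illustrated with a minimal sketch. MinMaxScalerSketch is a hypothetical class written for this example and is not part of hhpy; whether hhpy also calls a .fit method on the scaler is not stated here, so the sketch takes its bounds directly.

```python
class MinMaxScalerSketch:
    """Hypothetical scaler exposing the transform / inverse_transform pair."""

    def __init__(self, low, high):
        if high <= low:
            raise ValueError("high must be greater than low")
        self.low = low
        self.high = high

    def transform(self, values):
        # Map the interval [low, high] onto [0, 1].
        span = self.high - self.low
        return [(v - self.low) / span for v in values]

    def inverse_transform(self, values):
        # Map scaled values back to the original interval.
        span = self.high - self.low
        return [v * span + self.low for v in values]


# Labels are scaled before training and restored after predicting.
scaler_y = MinMaxScalerSketch(low=0.0, high=200.0)
scaled = scaler_y.transform([50.0, 100.0, 200.0])
restored = scaler_y.inverse_transform(scaled)
```

Any object with the same two methods, for example a fitted scikit-learn scaler, should fit the same role, assuming it accepts the data layout that is passed to it.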

Methods Summary

fit(fit_type, k_test, groupby, …) fit all Model objects in collection
k_split(**kwargs) apply hhpy.ds.k_split to self to create train-test or k-cross ready data
model_by_name(name) extract a list of Models from the collection by their names
predict(X, y, df, …) predict with all models in collection
score(return_type, pivot, groupby, …) calculate score of the Model predictions
scoreplot([x, y, hue, hue_order, row, …]) plot the score(s) using sns.barplot
train(df, k, groupby, sortby, …) wrapper method that combines k_split, fit, predict and score

Methods Documentation

fit(fit_type: str = 'train_test', k_test: Optional[int] = 0, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, do_print: bool = True, **kwargs)[source]

fit all Model objects in collection

Parameters:
  • fit_type – one of [‘train_test’, ‘k_cross’, ‘final’]
  • k_test – which k_index to use as test data
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • do_print – Whether to print the steps to console [optional]
  • kwargs – Other keyword arguments passed to fit()
Returns:

None
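
The interplay of fit_type and k_test can be sketched as follows. This is an illustration under assumptions, not hhpy's implementation: it assumes every row carries a fold label (k_index), that 'train_test' holds out the fold named by k_test, that 'k_cross' holds out each fold in turn, and that 'final' fits on all rows. The function name select_rows is hypothetical.

```python
def select_rows(k_index, fit_type="train_test", k_test=0):
    """Return a list of (train_rows, test_rows) pairs of row positions."""
    rows = range(len(k_index))
    folds = sorted(set(k_index))
    if fit_type == "final":
        # Assumed: fit on every row, nothing is held out.
        return [(list(rows), [])]
    if fit_type == "train_test":
        test_folds = [k_test]
    elif fit_type == "k_cross":
        test_folds = folds
    else:
        raise ValueError(f"unknown fit_type: {fit_type!r}")
    return [
        (
            [i for i in rows if k_index[i] != fold],  # training rows
            [i for i in rows if k_index[i] == fold],  # held-out rows
        )
        for fold in test_folds
    ]


k_index = [0, 1, 2, 0, 1, 2]
train_test = select_rows(k_index, "train_test", k_test=0)
k_cross = select_rows(k_index, "k_cross")
final = select_rows(k_index, "final")
```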

k_split(**kwargs)[source]

apply hhpy.ds.k_split to self to create train-test or k-cross ready data

Parameters:
  • kwargs – keyword arguments passed to k_split()
Returns:

None

model_by_name(name: Union[list, str]) → Union[hhpy.modelling.Model, list][source]

extract a list of Models from the collection by their names

Parameters:
  • name – name of the Model
Returns:

list of Models

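
A name lookup of this kind can be sketched with a plain dictionary. NamedModel and model_by_name_sketch are hypothetical stand-ins; the sketch assumes that a single string yields a single object and a list of names yields a list, which matches the Union[Model, list] return annotation but is not confirmed by the description above.

```python
from dataclasses import dataclass


@dataclass
class NamedModel:
    """Hypothetical stand-in for a model that only carries a name."""
    name: str


def model_by_name_sketch(models, name):
    by_name = {m.name: m for m in models}
    if isinstance(name, str):
        # Assumed: a single name returns a single object.
        return by_name[name]
    # A list of names returns the matches in the requested order.
    return [by_name[n] for n in name]


collection = [NamedModel("linear"), NamedModel("forest"), NamedModel("boost")]
single = model_by_name_sketch(collection, "forest")
several = model_by_name_sketch(collection, ["boost", "linear"])
```
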
predict(X: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, y: Union[pandas.core.frame.DataFrame, numpy.ndarray, Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, df: pandas.core.frame.DataFrame = None, return_type: str = 'self', ensemble: bool = False, k_predict_type: str = 'test', groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, multi: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, do_print: bool = True) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame, None][source]

predict with all models in collection

Parameters:
  • X – The feature (predictor) data used for predicting as DataFrame, np.array or column names
  • y – The label (target) data used for predicting as DataFrame, np.array or column names. Specifying y is only necessary for convolutional or time-series type models [optional]
  • df – Pandas DataFrame containing the predict data, optional if array-like data is passed for X
  • return_type – one of [‘y’, ‘df’, ‘DataFrame’, ‘self’]
  • ensemble – if True, also predict with Ensemble-like combinations of models. If True or ‘mean’, calculate the mean of the individual predictions. If ‘median’, calculate the median of the individual predictions. [optional]
  • k_predict_type – ‘test’ or ‘all’
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • multi – postfixes to use for multi output [optional]
  • do_print – Whether to print the steps to console [optional]
Returns:

if return_type is ‘self’: None, else see Model.predict
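
The ensemble option combines the individual predictions row by row. A minimal sketch of that idea, using only the standard library, is shown below; ensemble_predictions is a hypothetical helper and the model names are made up for the example.

```python
from statistics import mean, median


def ensemble_predictions(predictions, how="mean"):
    """Combine per-model prediction lists into one ensemble prediction."""
    if how is True:
        # True is treated like 'mean', as described for the ensemble parameter.
        how = "mean"
    combine = {"mean": mean, "median": median}
    if how not in combine:
        raise ValueError(f"unknown ensemble type: {how!r}")
    columns = list(predictions.values())
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all models must predict the same number of rows")
    # zip(*columns) yields one tuple of model predictions per row.
    return [combine[how](row) for row in zip(*columns)]


predictions = {
    "linear": [1.0, 2.0, 3.0],
    "forest": [2.0, 2.0, 5.0],
    "boost": [6.0, 5.0, 4.0],
}
ensemble_mean = ensemble_predictions(predictions, how=True)
ensemble_median = ensemble_predictions(predictions, how="median")
```

The median variant is less sensitive to a single model that is far off, which is visible in the first row of the example.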

score(return_type: str = 'self', pivot: bool = False, groupby: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, do_print: bool = True, display_score: bool = True, display_format: str = ',.3f', **kwargs) → Optional[pandas.core.frame.DataFrame][source]

calculate score of the Model predictions

Parameters:
  • return_type – one of [‘self’, ‘df’, ‘DataFrame’]
  • pivot – whether to pivot the DataFrame for easier readability [optional]
  • do_print – Whether to print the steps to console [optional]
  • display_score – Whether to display the score DataFrame [optional]
  • display_format – Format to use when displaying the score DataFrame [optional]
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • kwargs – other keyword arguments passed to df_score()
Returns:

if return_type is ‘self’: None, else: pandas DataFrame containing the scores
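
The effect of pivot and display_format can be illustrated without pandas. The sketch below assumes a long score table with the columns model, score and value (the same names scoreplot uses as defaults) and turns it into one entry per model; pivot_scores is a hypothetical helper, and the numbers are made up.

```python
def pivot_scores(long_rows, display_format=",.3f"):
    """Turn long (model, score, value) rows into {model: {score: text}}."""
    wide = {}
    for row in long_rows:
        # format() applies the same format spec that display_format describes.
        text = format(row["value"], display_format)
        wide.setdefault(row["model"], {})[row["score"]] = text
    return wide


long_rows = [
    {"model": "linear", "score": "rmse", "value": 1234.56789},
    {"model": "linear", "score": "r2", "value": 0.91234},
    {"model": "forest", "score": "rmse", "value": 987.6543},
    {"model": "forest", "score": "r2", "value": 0.95},
]
wide = pivot_scores(long_rows)
```

With the default ',.3f' every value is shown with a thousands separator and three decimals, so 1234.56789 is displayed as 1,234.568.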

scoreplot(x='y_ref', y='value', hue='model', hue_order=None, row='score', row_order=None, palette=None, width=16, height=4.5, scale=None, query=None, return_fig_ax=False, **kwargs) → Optional[tuple][source]

plot the score(s) using sns.barplot

Parameters:
  • x – Name of the x variable in data or vector data
  • y – Name of the y variable in data or vector data
  • hue – Further split the plot by the levels of this variable [optional]
  • hue_order – Either a string describing how the (hue) levels are to be ordered or an explicit list of levels to be used for plotting. Accepted strings are:
      • sorted: following Python standard sorting conventions (alphabetical for string, ascending for value)
      • inv: following Python standard sorting conventions but in inverse order
      • count: sorted by value counts
      • mean, mean_ascending, mean_descending: sorted by mean value, defaults to descending
      • median, median_ascending, median_descending: sorted by median value, defaults to descending
  • row – the variable to wrap around the rows [optional]
  • row_order – Either a string describing how the (row) levels are to be ordered or an explicit list of levels to be used for plotting. Accepted strings are:
      • sorted: following Python standard sorting conventions (alphabetical for string, ascending for value)
      • inv: following Python standard sorting conventions but in inverse order
      • count: sorted by value counts
      • mean, mean_ascending, mean_descending: sorted by mean value, defaults to descending
      • median, median_ascending, median_descending: sorted by median value, defaults to descending
  • palette – Collection of colors to be used for plotting. Can be a dictionary with names for each level, a list of colors or an individual color name. Must be valid colors known to pyplot [optional]
  • width – Width of each individual subplot [optional]
  • height – Height of each individual subplot [optional]
  • scale – scale the values [optional]
  • query – query to be passed to pd.DataFrame.query before plotting [optional]
  • return_fig_ax – Whether to return the figure and axes objects as a tuple, to be captured as fig, ax = …. If False, pyplot.show() is called and the method returns None [optional]
  • kwargs – other keyword arguments passed to sns.barplot
Returns:

see return_fig_ax
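
The ordering strings accepted by hue_order and row_order can be sketched as below. order_levels is a hypothetical helper, not the hhpy implementation; in particular, sorting 'count' from most to least frequent is an assumption, since the description does not state a direction.

```python
from collections import defaultdict
from statistics import mean, median


def order_levels(levels, values, order="sorted"):
    """Return the distinct levels in the order described by `order`."""
    if isinstance(order, (list, tuple)):
        # An explicit list of levels is used as given.
        return list(order)
    grouped = defaultdict(list)
    for level, value in zip(levels, values):
        grouped[level].append(value)
    if order == "sorted":
        return sorted(grouped)
    if order == "inv":
        return sorted(grouped, reverse=True)
    if order == "count":
        # Assumed direction: most frequent level first.
        return sorted(grouped, key=lambda k: len(grouped[k]), reverse=True)
    stat_name, _, direction = order.partition("_")
    stats = {"mean": mean, "median": median}
    if stat_name not in stats or direction not in ("", "ascending", "descending"):
        raise ValueError(f"unknown order: {order!r}")
    # Without a suffix the statistic is sorted in descending order.
    return sorted(
        grouped,
        key=lambda k: stats[stat_name](grouped[k]),
        reverse=direction != "ascending",
    )


levels = ["linear", "forest", "boost", "forest", "linear", "forest"]
values = [3.0, 1.0, 2.0, 1.5, 5.0, 0.5]
by_name = order_levels(levels, values, "sorted")
by_count = order_levels(levels, values, "count")
by_mean = order_levels(levels, values, "mean")
by_mean_ascending = order_levels(levels, values, "mean_ascending")
```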

train(df: pandas.core.frame.DataFrame = None, k: int = 5, groupby: Union[Sequence[T_co], str] = None, sortby: Union[Sequence[T_co], str] = None, random_state: int = None, fit_type: str = 'train_test', k_test: Optional[int] = 0, ensemble: bool = False, scores: Union[Sequence[T_co], str, Callable] = None, multi: Union[Sequence[T_co], int, float, str, bytes, None, AbstractSet[T_co]] = None, scale: float = None, do_predict: bool = True, do_score: bool = True, do_split: bool = True, do_fit: bool = True, do_print: bool = True, display_score: bool = True) → None[source]

wrapper method that combines k_split, fit, predict and score

Parameters:
  • df – Pandas DataFrame containing the training and testing data. Can be saved to the Model object or supplied on an as needed basis.
  • k – see hhpy.ds.k_split
  • groupby – The columns used for grouping, passed to pandas.DataFrame.groupby [optional]
  • sortby – see hhpy.ds.k_split
  • random_state – see hhpy.ds.k_split
  • fit_type – see .fit
  • k_test – see .fit
  • ensemble – if True, also predict with Ensemble-like combinations of models. If True or ‘mean’, calculate the mean of the individual predictions. If ‘median’, calculate the median of the individual predictions. [optional]
  • scores – see .score [optional]
  • multi – postfixes to use for multi output [optional]
  • scale – see .score
  • do_print – Whether to print the steps to console [optional]
  • display_score – Whether to display the score DataFrame [optional]
  • do_split – whether to apply k_split [optional]
  • do_fit – whether to fit the Models [optional]
  • do_predict – whether to add predictions to DataFrame [optional]
  • do_score – whether to create self.df_score [optional]
Returns:

None
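
How the do_split, do_fit, do_predict and do_score switches gate the individual steps can be sketched with a toy class. PipelineSketch is hypothetical and only records which steps ran; the step order (split, fit, predict, score) is inferred from the description of the wrapper and is an assumption about the real method.

```python
class PipelineSketch:
    """Hypothetical wrapper that records the steps it would run."""

    def __init__(self):
        self.steps = []

    def k_split(self):
        self.steps.append("k_split")

    def fit(self):
        self.steps.append("fit")

    def predict(self):
        self.steps.append("predict")

    def score(self):
        self.steps.append("score")

    def train(self, do_split=True, do_fit=True, do_predict=True, do_score=True):
        # Each flag switches one step on or off; the order is fixed.
        if do_split:
            self.k_split()
        if do_fit:
            self.fit()
        if do_predict:
            self.predict()
        if do_score:
            self.score()


full_run = PipelineSketch()
full_run.train()

# Re-score an already split and fitted collection.
rescore = PipelineSketch()
rescore.train(do_split=False, do_fit=False)
```

Switching off do_split and do_fit is the kind of call one would make to refresh predictions and scores without refitting, assuming the data were split and the models fitted earlier.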