distplot

hhpy.plotting.distplot(x: Union[Sequence[T_co], str], data: pandas.core.frame.DataFrame = None, hue: str = None, hue_order: Union[Sequence[T_co], str] = 'sorted', palette: Union[Mapping[KT, VT_co], Sequence[T_co], str] = None, linecolor: str = 'black', edgecolor: str = 'black', alpha: float = None, bins: Union[Sequence[T_co], int] = 40, perc: bool = None, top_nr: int = None, other_name: str = 'other', title: bool = True, title_prefix: str = '', std_cutoff: float = None, hist: bool = None, distfit: Union[str, bool, None] = 'kde', fill: bool = True, legend: bool = True, legend_loc: str = None, legend_space: float = 0.1, legend_ncol: int = 1, agg_func: str = 'mean', number_format: str = '.2f', kde_steps: int = 1000, max_n: int = 100000, random_state: int = None, sample_warn: bool = True, xlim: Sequence[T_co] = None, linestyle: str = None, label_style: str = 'mu_sigma', x_offset_perc: float = 0.025, ax: matplotlib.axes._axes.Axes = None, **kwargs) → matplotlib.axes._axes.Axes

Similar to seaborn.distplot, but with support for hue levels and additional options such as a standard deviation cutoff and random subsampling of large inputs. Plots a combination of a histogram and a kernel density estimate.
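
To make the histogram plus kernel density combination concrete, here is a minimal standard-library sketch of the two quantities such a plot draws. It is not hhpy's implementation: the helper names `histogram` and `gaussian_kde` are hypothetical, and the normal-reference bandwidth rule is an assumption chosen only for illustration. The defaults of 40 bins and 1000 grid steps mirror the `bins` and `kde_steps` defaults in the signature.

```python
import math
import random
import statistics

random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(500)]  # synthetic sample data


def histogram(data, bins=40):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def gaussian_kde(data, steps=1000):
    """Evaluate a Gaussian kernel density estimate on an evenly spaced grid."""
    n = len(data)
    # Normal-reference rule of thumb for the bandwidth (an assumption here).
    h = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    norm = n * h * math.sqrt(2 * math.pi)
    density = [
        sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in data) / norm
        for x in grid
    ]
    return grid, density


counts = histogram(values)
grid, density = gaussian_kde(values)
```

The histogram shows counts per bin, while the density curve integrates to roughly one, which is why a percentage or normalized y-axis is useful when both are drawn together.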

Parameters:
  • x – the name of the variable(s) in data, or vector data. If data is provided and x is a list of columns, the DataFrame is automatically melted and the newly generated column is used as hue, so that the distributions of multiple columns are plotted on the same axis
  • data – Pandas DataFrame containing named data, optional if vector data is used
  • hue – Further split the plot by the levels of this variable [optional]
  • hue_order – Either a string describing how the (hue) levels are to be ordered or an explicit list of levels to be used for plotting. Accepted strings are:

    • sorted: following Python standard sorting conventions (alphabetical for strings, ascending for numeric values)
    • inv: following python standard sorting conventions but in inverse order
    • count: sorted by value counts
    • mean, mean_ascending, mean_descending: sorted by mean value, defaults to descending
    • median, median_ascending, median_descending: sorted by median value, defaults to descending
  • palette – Collection of colors to be used for plotting. Can be a dictionary with names for each level, a list of colors, or an individual color name. Must be valid colors known to pyplot [optional]
  • linecolor – Color of the kde fit line, overwritten with palette by hue level if hue is specified [optional]
  • edgecolor – Color of the histogram edges [optional]
  • alpha – Alpha transparency level [optional]
  • bins – Number of bins of the histogram [optional]
  • perc – Whether to display the y-axis as a percentage; if False, the count is displayed. Defaults to True if hue is specified, else False [optional]
  • top_nr – limit hue to top_nr levels using hhpy.ds.top_n, the rest will be cast to other [optional]
  • other_name – name of the other group created by hhpy.ds.top_n [optional]
  • title – whether to set the plot title equal to x’s name [optional]
  • title_prefix – prefix to be used in plot title [optional]
  • std_cutoff – automatically cut off data outside the range of std_cutoff standard deviations. This is off by default; a recommended value for a good visual experience without outliers is 3 [optional]
  • hist – whether to show the histogram. Defaults to False if hue is specified, else True [optional]
  • distfit – one of [‘kde’, ‘gauss’, False, None]. If ‘kde’, fits a kernel density estimate to the data. If ‘gauss’, fits a Gaussian distribution with the observed mean and std to the data. [optional]
  • fill – whether to fill the area under the distfit curve, ignored if hist is True [optional]
  • legend – Whether to show a legend [optional]
  • legend_loc – Location of the legend, one of [bottom, right] or an accepted value of pyplot.legend. If in [bottom, right], legend_outside is used, else pyplot.legend [optional]
  • legend_space – Only valid if legend_loc is bottom. The space between the main plot and the legend [optional]
  • legend_ncol – Number of columns to use in legend [optional]
  • agg_func – one of [‘mean’, ‘median’]. The agg function used to find the center of the distribution [optional]
  • number_format – The format string used for annotations [optional]
  • kde_steps – Number of steps the range is split into for kde fitting [optional]
  • max_n – Maximum number of samples to be used for plotting. If this number is exceeded, max_n samples are drawn at random from the data, which triggers a warning unless sample_warn is set to False. Set to False or None to use all samples for plotting. [optional]
  • random_state – Random state (seed) used for drawing the random samples [optional]
  • sample_warn – Whether to trigger a warning if the data has more samples than max_n [optional]
  • xlim – X limits for the axis as tuple, passed to ax.set_xlim() [optional]
  • linestyle – Linestyle used, must be a valid linestyle recognized by pyplot.plot [optional]
  • label_style – one of [‘mu_sigma’, ‘plain’]. If mu_sigma then the mean (or median) and std value are displayed inside the label [optional]
  • x_offset_perc – the amount of whitespace to display next to x_min and x_max in percent of x_range [optional]
  • ax – The matplotlib.pyplot.Axes object to plot on, defaults to current axis [optional]
  • kwargs – additional keyword arguments passed to pyplot.plot
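
The string options for hue_order can be read as the following orderings. This is a rough standard-library sketch, not the library's code: `order_levels` and the sample `rows` are hypothetical, and the assumption that count puts the most frequent level first is not stated in the documentation above.

```python
from statistics import mean, median

# Hypothetical (level, value) pairs standing in for a hue column and x.
rows = [("b", 5.0), ("a", 1.0), ("c", 9.0), ("a", 3.0), ("c", 7.0), ("c", 8.0)]


def order_levels(rows, hue_order="sorted"):
    """Return the hue levels ordered according to a hue_order-style string."""
    groups = {}
    for level, value in rows:
        groups.setdefault(level, []).append(value)
    if hue_order == "sorted":
        return sorted(groups)
    if hue_order == "inv":
        return sorted(groups, reverse=True)
    if hue_order == "count":
        # Assumption: the level with the most observations comes first.
        return sorted(groups, key=lambda lvl: len(groups[lvl]), reverse=True)
    # Remaining options: mean / median with an optional direction suffix.
    stat, _, direction = hue_order.partition("_")
    func = {"mean": mean, "median": median}[stat]
    # Descending is the documented default when no suffix is given.
    return sorted(
        groups, key=lambda lvl: func(groups[lvl]), reverse=direction != "ascending"
    )
```

Passing an explicit list of levels instead of a string would bypass this kind of logic entirely and use the list as given.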
Returns:

The matplotlib.pyplot.Axes object with the plot on it
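
As an illustration of what std_cutoff, max_n, random_state and sample_warn describe, the sketch below applies a standard deviation cutoff followed by a seeded random subsample. It uses only the standard library; the helper name `prepare` and the order of the two steps are assumptions, not taken from hhpy.

```python
import random
import statistics
import warnings


def prepare(values, std_cutoff=None, max_n=100000, random_state=None, sample_warn=True):
    """Apply an optional std cutoff, then an optional random subsample."""
    values = list(values)
    if std_cutoff is not None:
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)
        # Keep only values within std_cutoff standard deviations of the mean.
        values = [v for v in values if abs(v - mu) <= std_cutoff * sigma]
    # A falsy max_n (None or False) means: use all samples.
    if max_n and len(values) > max_n:
        if sample_warn:
            warnings.warn(f"Drawing {max_n} of {len(values)} samples at random")
        # A seeded generator makes the subsample reproducible.
        values = random.Random(random_state).sample(values, max_n)
    return values


data = [float(i) for i in range(100)] + [10000.0]  # one extreme outlier
trimmed = prepare(data, std_cutoff=3)
sampled = prepare(data, max_n=10, random_state=1, sample_warn=False)
```

With a cutoff of 3 the single outlier is dropped, and fixing random_state means repeated calls draw the same subsample.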