Multi-label Data Sets

This sub-module provides functions for loading multi-label data sets and for downsampling the label space.

skml.datasets.load_dataset(name)

Loads a multi-label classification dataset.

Parameters:
name : string

Name of the dataset. Currently only ‘yeast’ is available.

skml.datasets.sample_down_label_space(y, k, method='most-frequent')

Downsamples the label space. The returned labels keep the order they had in the original label space, but labels that do not meet the selection criterion (see method) are removed.

Parameters:
y : (sparse) array-like, shape = [n_samples, ] or [n_samples, n_classes]

Multi-label targets

k : number

Number of labels to return. Must be smaller than the number of distinct labels in y.

method : string, default = ‘most-frequent’

Method used to downsample the label space. Currently the only supported method keeps the top k most frequent labels.
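
The 'most-frequent' behaviour described above can be illustrated with a small NumPy sketch. This is not skml's implementation. The helper name keep_k_most_frequent_labels is hypothetical, and the sketch assumes a dense 2-D binary indicator matrix of shape [n_samples, n_classes], with frequency ties broken in favour of the lower column index.

```python
import numpy as np


def keep_k_most_frequent_labels(y, k):
    """Hypothetical helper (not part of skml): keep the k most frequent
    label columns of a binary indicator matrix, in their original order."""
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError("expected a 2-D label indicator matrix")
    n_labels = y.shape[1]
    if not 0 < k < n_labels:
        raise ValueError("k must be positive and smaller than the number of labels")

    # How often each label occurs across all samples.
    counts = y.sum(axis=0)
    # Indices of the k most frequent labels; the stable sort makes ties
    # resolve towards the lower column index (an assumption of this sketch).
    top = np.argsort(-counts, kind="stable")[:k]
    # Sort the selected indices so the original label order is retained.
    keep = np.sort(top)
    return y[:, keep]


# Toy data: 4 samples, 4 labels with frequencies 3, 1, 3 and 1.
y = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
y_small = keep_k_most_frequent_labels(y, k=2)
print(y_small.shape)  # (4, 2)
```

Here columns 0 and 2 are the most frequent, so they are kept in that order and columns 1 and 3 are dropped.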