Version: 1.0.1

Labels

labels.py flips classification labels to simulate mislabeled data. It supports binary and multi-class classification tasks and is useful for evaluating robustness under annotation errors or adversarial mislabeling.

Function Signature

from pucktrick.labels import labels

error_code, modified_df = labels(df, strategy)
# or, for mode="extended" / mode="composed":
error_code, modified_df = labels(df, strategy, original_df=clean_df)

Strategy Parameters

For multi-class label noise, configure perturbate_data with a noise_model key:

| noise_model | Description |
| --- | --- |
| "NCAR" (Noise Completely At Random) | Uniform random label flip, independent of class or features. |
| "NAR" (Noise At Random) | Class-dependent flip. Provide a flip_distribution in param. |
| "NNAR" (Nearest Neighbor At Random) | Flips labels of instances close to decision boundaries. Provide features_for_similarity in param. |

For binary targets, no noise_model is needed: labels are flipped 0 ↔ 1.
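The binary case can be pictured with a few lines of plain Python. This is only a sketch of the idea, not Pucktrick's implementation: the helper name `flip_binary`, the `seed` argument, and the rounding of the row count are assumptions made for illustration.

```python
import random


def flip_binary(labels, percentage, seed=0):
    """Return a copy of `labels` (0/1 values) with a fraction of them flipped.

    Hypothetical helper for illustration only; it is not part of pucktrick.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * percentage)  # assumed rounding rule
    # Pick distinct row positions to corrupt.
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        noisy[i] = 1 - noisy[i]  # 0 becomes 1, 1 becomes 0
    return noisy


clean = [0, 1] * 10  # 20 rows
noisy = flip_binary(clean, percentage=0.15)
changed = sum(a != b for a, b in zip(clean, noisy))  # 3 of 20 rows differ
```

Because every flipped row necessarily changes value, the number of differing rows equals the number of rows selected.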

NCAR Example

strategy = {
    "affected_features": ["label"],
    "selection_criteria": "all",
    "percentage": 0.15,
    "mode": "new",
    "perturbate_data": {
        "sampling": "random",
        "distribution": "random",
        "noise_model": "NCAR"
    }
}
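To make the NCAR idea concrete, here is a minimal standalone sketch in plain Python. It does not call Pucktrick; `ncar_flip` is a hypothetical helper, and the choice to draw the replacement uniformly from the other classes (so the label always changes) is an assumption, not documented library behaviour.

```python
import random


def ncar_flip(labels, classes, percentage, seed=0):
    """Uniformly random label noise, independent of class and features.

    Hypothetical illustration; not the pucktrick implementation.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * percentage)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        # Assumption: choose among the *other* classes so the label really changes.
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


clean = [0, 1, 2, 1] * 10  # 40 rows, three classes
noisy = ncar_flip(clean, classes=[0, 1, 2], percentage=0.15)
changed = sum(a != b for a, b in zip(clean, noisy))  # 6 of 40 rows differ
```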

NAR Example

strategy = {
    "affected_features": ["label"],
    "selection_criteria": "all",
    "percentage": 0.15,
    "mode": "new",
    "perturbate_data": {
        "sampling": "random",
        "distribution": "random",
        "noise_model": "NAR",
        "param": {
            "flip_distribution": {"0": [0.1, 0.9], "1": [0.8, 0.2], "2": [0.5, 0.3, 0.2]}
        }
    }
}
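Class-dependent noise can be sketched as a lookup into a per-class probability table. The snippet below is a standalone illustration in plain Python, not Pucktrick's code. The layout of `transition` (one weight per class, in the order given by `classes`, with zero weight on the class itself) is an assumption made for this sketch and may differ from the format that flip_distribution expects.

```python
import random


def nar_flip(labels, classes, transition, percentage, seed=0):
    """Class-dependent label noise: the new label is drawn from a
    distribution that depends on the current label.

    Hypothetical illustration; not the pucktrick implementation.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * percentage)
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        weights = transition[noisy[i]]  # distribution for this row's class
        noisy[i] = rng.choices(classes, weights=weights, k=1)[0]
    return noisy


# Assumed layout: transition[c][k] is the weight of relabelling class c as classes[k].
transition = {
    0: [0.0, 0.9, 0.1],  # class 0 is mostly mistaken for class 1
    1: [0.8, 0.0, 0.2],  # class 1 is mostly mistaken for class 0
    2: [0.5, 0.5, 0.0],  # class 2 is mistaken for 0 or 1 equally often
}
clean = [0, 1, 2, 1] * 10  # 40 rows
noisy = nar_flip(clean, [0, 1, 2], transition, percentage=0.15)
changed = sum(a != b for a, b in zip(clean, noisy))
```

With a zero weight on the diagonal, every selected row changes class, so `changed` equals the number of selected rows (6 here).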

NNAR Example

strategy = {
    "affected_features": ["label"],
    "selection_criteria": "all",
    "percentage": 0.15,
    "mode": "new",
    "perturbate_data": {
        "sampling": "random",
        "distribution": "random",
        "noise_model": "NNAR",
        "param": {
            "features_for_similarity": ["feature1", "feature2", "feature3"]
        }
    }
}
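One way to picture "instances close to decision boundaries" is to rank rows by the distance to their nearest neighbour of a different class, computed over the similarity features. The sketch below does exactly that in plain Python. It is an assumption about how such a selection could work, not a description of Pucktrick's algorithm; `nnar_flip` is a hypothetical helper, and it picks rows deterministically rather than by random sampling.

```python
import math


def nnar_flip(rows, labels, percentage):
    """Flip the rows that lie closest to a differently labelled row.

    `rows` holds one feature tuple per instance (the similarity features).
    Hypothetical illustration; requires at least two distinct classes.
    """
    n_flip = round(len(rows) * percentage)
    scored = []
    for i, (x, y) in enumerate(zip(rows, labels)):
        # Distance to, and index of, the nearest row with a different label.
        dist, j = min(
            (math.dist(x, rows[k]), k)
            for k in range(len(rows))
            if labels[k] != y
        )
        scored.append((dist, i, labels[j]))
    scored.sort()  # smallest distance first: closest to the other class
    noisy = list(labels)
    for _, i, neighbour_label in scored[:n_flip]:
        noisy[i] = neighbour_label  # take over the neighbour's label
    return noisy


# One feature; the two classes meet between 2.9 and 3.1.
rows = [(0.0,), (1.0,), (2.0,), (2.9,), (3.1,), (4.0,), (5.0,), (6.0,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
noisy = nnar_flip(rows, labels, percentage=0.25)
# noisy == [0, 0, 0, 1, 0, 1, 1, 1]: only the two boundary rows changed
```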

Modes

| Mode | Behaviour |
| --- | --- |
| new | Injects label noise into a clean dataset up to the specified percentage. |
| extended | Adds more label flips to a dataset that may already contain mislabeled rows, reaching the cumulative target. Requires original_df. |
| composed | Flips labels only in rows already modified by a previous operator. Requires original_df. |
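The role of original_df in the extended and composed modes can be illustrated with a small bookkeeping sketch. It assumes that "already modified" is determined by comparing each row with its clean counterpart and that the cumulative target is the rounded percentage of all rows. Both are assumptions made for illustration, and the helper names are hypothetical.

```python
def modified_positions(original_rows, current_rows):
    """Positions where the current data differs from the clean data."""
    return [
        i
        for i, (clean, now) in enumerate(zip(original_rows, current_rows))
        if clean != now
    ]


def extra_flips_needed(original_rows, current_rows, percentage):
    """Rows an 'extended' run would still have to flip to reach the
    cumulative target (assumed to be round(n_rows * percentage))."""
    target = round(len(original_rows) * percentage)
    already = len(modified_positions(original_rows, current_rows))
    return max(0, target - already)


# Rows are (feature, label) pairs; two of 20 labels were flipped earlier.
original = [(i, 0) for i in range(20)]
current = list(original)
current[0] = (0, 1)
current[1] = (1, 1)

modified = modified_positions(original, current)     # [0, 1]
extra = extra_flips_needed(original, current, 0.15)  # target 3, already 2, so 1
```

Under these assumptions, an extended run would draw its one additional flip from the 18 untouched rows, while a composed run would restrict its candidates to the positions in `modified`.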