Skip to main content
Version: 1.0.1

Noisy

noisy.py adds synthetic noise to datasets. It supports numeric, string, and datetime data and is useful for testing model robustness under data corruption.

Function Signature

from pucktrick.noisy import noisy

error_code, modified_df = noisy(df, strategy)
# or, for mode="extended" / mode="composed":
error_code, modified_df = noisy(df, strategy, original_df=clean_df)

Strategy Parameters

Configure noise behaviour inside perturbate_data:

ParameterValuesDescription
distribution"random"Applies uniform random noise within the column's original range.
distribution"shift"Applies a systematic directional shift. Requires a param dictionary.

Shift Parameters ("distribution": "shift")

When using systematic shifting, provide a param dictionary inside perturbate_data:

"perturbate_data": {
"distribution": "shift",
"param": {
"shift_value": 10,
"shift_unit": "absolute",
"shift_sign": "positive"
}
}
ParameterValuesDescription
shift_valuenumericValue to add (or days for datetime columns).
shift_unit"absolute", "std"Whether shift_value is an absolute amount or a number of standard deviations.
shift_sign"positive", "negative", "random"Direction of the shift.

Example — Random Noise

from pucktrick.noisy import noisy

strategy = {
"affected_features": ["age", "salary"],
"selection_criteria": "all",
"percentage": 0.20,
"mode": "new",
"perturbate_data": {
"distribution": "random",
"sampling": "random"
}
}

err, df_corrupted = noisy(df, strategy)

Example — Systematic Shift

strategy = {
"affected_features": ["sensor_reading"],
"selection_criteria": "all",
"percentage": 0.30,
"mode": "new",
"perturbate_data": {
"distribution": "shift",
"sampling": "random",
"param": {
"shift_value": 2.0,
"shift_unit": "std",
"shift_sign": "positive"
}
}
}

err, df_shifted = noisy(df, strategy)

Supported Column Types

Column typeNoise behaviour
Continuous numericRandom values within column range, or directional shift
Discrete numericRandom integers within column range, or directional shift
Binary (0/1)Bit flips
Categorical stringReplaced with other existing values or random fake strings
Categorical integerReplaced with other existing values
DatetimeDate shifted by shift_value days

Modes

ModeBehaviour
newInjects noise into a clean column up to the specified percentage.
extendedAdds noise to a partially noisy column to reach the cumulative target. Requires original_df.
composedApplies noise only to rows already modified by a previous operator. Requires original_df.