This is the repository for our reproduction of the paper "Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations".
We implement and test the following experiments:
- Noise hypothesis testing (Section 4.2 of the original publication). In this experiment, the authors claim that real-world noisy labels differ significantly from their synthetic counterparts. We confirm their claim, albeit with slightly different results.

*Figure: transition vectors within real-world (top) and synthetic (bottom) clusters for the Random 1 label set.*
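The transition vectors clustered in this experiment are rows of the empirical noise transition matrix `T`, where `T[i, j]` estimates `P(noisy label = j | clean label = i)`. A minimal sketch of how such a matrix can be estimated from paired clean and noisy labels (the `transition_matrix` helper below is illustrative only and not part of the package):

```python
import numpy as np

def transition_matrix(clean, noisy, num_classes):
    """Estimate the empirical noise transition matrix T, where
    T[i, j] ~= P(noisy label = j | clean label = i)."""
    T = np.zeros((num_classes, num_classes))
    for c, n in zip(clean, noisy):
        T[c, n] += 1
    # Normalize each row; clip avoids division by zero for unseen classes.
    row_sums = T.sum(axis=1, keepdims=True)
    return T / np.clip(row_sums, 1, None)

# Toy example: 3 classes, one class-0 sample mislabeled as class 1.
clean = np.array([0, 0, 0, 1, 1, 2])
noisy = np.array([0, 0, 1, 1, 1, 2])
T = transition_matrix(clean, noisy, 3)  # row 0 -> [2/3, 1/3, 0]
```

Comparing such rows (one per class) between human-annotated and synthetic noise is what the clustering in the figure above visualizes.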
- Noise memorization effects (Section 5.2 of the original publication). The authors claim that models start to overfit to real-world noisy labels faster than to their synthetic counterparts, indicating a harder learning task. We confirm their claim.

*Figure: noise memorisation effects.*
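One simple way to quantify this memorization effect is to track, over training, the fraction of mislabeled samples whose prediction matches the (wrong) noisy label. A minimal sketch under that assumption (the `memorization_fraction` helper is ours, for illustration, and not part of the package):

```python
import numpy as np

def memorization_fraction(preds, noisy_labels, clean_labels):
    """Fraction of mislabeled training samples whose prediction equals
    the (wrong) noisy label - a simple proxy for label memorization."""
    corrupted = noisy_labels != clean_labels
    if corrupted.sum() == 0:
        return 0.0
    return float((preds[corrupted] == noisy_labels[corrupted]).mean())

# Toy example: the last sample is mislabeled (clean 0, noisy 1)
# and the model predicts the wrong noisy label, i.e. it memorized it.
preds = np.array([0, 1, 2, 1])
clean = np.array([0, 1, 2, 0])
noisy = np.array([0, 1, 2, 1])
frac = memorization_fraction(preds, noisy, clean)  # -> 1.0
```

Plotting this fraction per epoch, separately for real-world and synthetic noise, reproduces the kind of curves shown in the figure above.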
- Benchmark of LNL approaches (Section 5.1 of the original publication). The authors claim that LNL approaches perform worse on real-world noisy labels than on their synthetic counterparts. We confirm their claim, although not with the hyperparameter setup they report.
| Method | Clean | Aggregate | Random 1 | Random 2 | Random 3 | Worst | Clean (CIFAR-100N) | Noisy (CIFAR-100N) |
|---|---|---|---|---|---|---|---|---|
| CE | | | | | | | | |
| Co-teaching | | | | | | | | |
| Co-teaching+ | | | | | | | | |
| ELR | | | | | | | | |
| ELR+ | | | | | | | | |
| DivideMix | | | | | | | | |
| VolMinNet | | | | | | | | |
| CAL | | | | | | | | |
| PES (semi) | | | | | | | | |
| SOP+ | | | | | | | | |
| SOP | | | | | | | | |

*Our results on real-world noise: the first six result columns are CIFAR-10N label sets, the last two are CIFAR-100N.*
To install, create a fresh environment and install the package:

```shell
conda create -n noisylabels python=3.10
conda activate noisylabels
pip install .
pip install .[reproducibility]  # for optional reproducibility dependencies
```

To rerun our experiments, refer to the `src/reproducibility` folder:
- To rerun the hypothesis testing experiment, run the `hypothesis_testing.ipynb` notebook within the `noise_hypothesis_testing` folder.
- To rerun the noise memorisation experiment, run the `memo.py` script and the `memorization.ipynb` notebook within the `memorization_effects` folder.
- To rerun the benchmark experiments, run:
  - `main_paper.py` (authors' claimed hyperparameter setup with CIFAR-10)
  - `main.py` (original hyperparameter setup with CIFAR-10)
  - `main_cifar100n.py` (original hyperparameters with CIFAR-100)
- To run our version of the benchmark, run the `benchmark.py` / `benchmark_cifar100.py` scripts.
The repository is structured as follows:

- `figures` - Figures of the main experiments.
- `src` - Everything code:
  - `noisypy` - Python package for working with LNL methods.
  - `reproducibility` - Experiments for the reproducibility challenge submission:
    - `learning_strategies` - Experiments to verify our implementations for each LNL method.
    - `logs` - Results of our experiments.
    - `memorization_effects` - Memorization effects experiment; see `memorization.ipynb` and `memo.py`.
    - `noise_hypothesis_testing` - Noise clustering hypothesis testing experiment; see `hypothesis_testing.ipynb`.
    - `noisy_labels` - Benchmark reproduction. `main.py` is for normal CIFAR-10N human and synthetic runs (`--synthetic` switch to run synthetic versions); `main_cifar100n.py` does the same for CIFAR-100N; `main_paper.py` runs the CIFAR-10N authors' claimed configs (with fixed learning rate, schedulers and optimizers); and `benchmark.py` is our version of the benchmark.
- `tests` - Unit tests for some of the package modules.
If you find our work useful, please consider citing it as follows:
```bibtex
@article{hudovernik_2026_18401497,
  author  = {Valter Hudovernik and Žiga Rot and Klemen Vovk and
             Luka Škodnik and Luka Čehovin Zajc},
  title   = {Learning with Noisy Labels [~Re]visited},
  journal = {ReScience C},
  year    = 2026,
  volume  = 11,
  number  = 1,
  month   = jan,
  doi     = {10.5281/zenodo.18401497},
  url     = {https://doi.org/10.5281/zenodo.18401497},
}
```