Or: if you can’t do the time, don’t do the crime
Several days ago, we released a technical report entitled Benchmarking Crimes: An Emerging Threat in Systems Security. The paper was intended for publication at a security conference but was rejected at multiple venues. So that our work can still serve as a piece of supporting evidence and analysis for the community to build on, we are sharing it as a technical report published on arXiv.org.
The results are as revealing as they are damning: we formulate 22 distinct benchmarking crimes, each of which undermines the validity of a benchmark's results in a minor or major fashion. We survey 50 systems security defense papers, including papers published by this group. To gauge reliability, the survey was performed twice, by two independent readers. Their findings are consistent: in this broad sample of papers accepted at top systems security venues, every paper committed benchmarking crimes, varying in number and degree of egregiousness.
Most of these are recent papers (2015), but a significant fraction are from 2010. This longitudinal component of the study tells us that benchmarking crimes are not only widespread, but also no less common in modern papers than in older ones.
This raises the question of how far we can trust the benchmarks reported in research results. We hope our work will contribute to improving this situation.
The Register has coverage.