
NIST Baseline Evaluation

Background

NIST develops this dataset so that screeners can evaluate the efficacy of their screening tools on a full dataset of approximately 1,000 sequences. A new dataset is provided each month and can be requested through the dashboard.

Dataset Format

The dataset is a downloadable FASTA file. Each entry contains:

  • UUID: The universally unique identifier assigned to the sequence, generated from the FASTA sequence and a NIST namespace.
  • Sequence: The nucleotide sequence of the respective entry.
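As a rough illustration of reading such a file, the sketch below parses FASTA text into (header, sequence) pairs. It assumes the UUID appears on the header line of each entry, which this page does not state explicitly. The sample identifiers and sequences are made up. The last lines show how a name-based UUID (version 5) can be derived from a sequence and a namespace; both the UUID version and the namespace value are assumptions, since the actual NIST namespace is not given here.

```python
import io
import uuid


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-entry file; real identifiers come from the downloaded dataset.
SAMPLE = """>11111111-2222-5333-8444-555555555555
ACGTACGT
TTGA
>66666666-7777-5888-9999-000000000000
GGCC
"""

records = list(read_fasta(io.StringIO(SAMPLE)))

# Assumption: a version 5 (name-based) UUID; the namespace below is a placeholder,
# not the NIST namespace.
EXAMPLE_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")
example_id = uuid.uuid5(EXAMPLE_NAMESPACE, records[0][1])
```

Name-based UUIDs are deterministic, so the same sequence and namespace always produce the same identifier.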

Submission Format

Screeners submit results on the dashboard: click the “Upload and Evaluate Dataset Results” button, choose “NIST Baseline Evaluation”, and upload a CSV or TSV file. The file should include a boolean Flag column, with 1 for positive inputs (flagged sequences) and 0 for negative inputs.
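A results file of this shape could be produced along the following lines. This is a minimal sketch: only the Flag column is specified on this page, so the "UUID" column name and the `write_submission` helper are assumptions made for illustration.

```python
import csv
import io


def write_submission(results, handle):
    """Write (sequence_id, flagged) pairs as CSV with a 0/1 Flag column."""
    writer = csv.writer(handle, lineterminator="\n")
    # "UUID" is an assumed column name; only "Flag" is named in the instructions.
    writer.writerow(["UUID", "Flag"])
    for seq_id, flagged in results:
        writer.writerow([seq_id, 1 if flagged else 0])


# Hypothetical screening results: True means the tool flagged the sequence.
buffer = io.StringIO()
write_submission([("id-a", True), ("id-b", False)], buffer)
submission_text = buffer.getvalue()
```

For TSV output, passing `delimiter="\t"` to `csv.writer` produces the tab-separated equivalent.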

Evaluation Results

The evaluation returns Pass or Fail. A submission passes when it meets these thresholds:

  • Accuracy ≥ 75% ((TP + TN) / Total)
  • Recall ≥ 95% (TP / (TP + FN))
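The two formulas can be checked locally with a small sketch like the one below. It assumes both thresholds must be met to pass and that reference labels are available for comparison; the `evaluate` function and the sample labels are hypothetical and are not the dashboard's implementation.

```python
def evaluate(predicted, expected, min_accuracy=0.75, min_recall=0.95):
    """Compute accuracy and recall from 0/1 labels and apply both thresholds."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if p == 1 and e == 1)  # true positives
    tn = sum(1 for p, e in pairs if p == 0 and e == 0)  # true negatives
    fn = sum(1 for p, e in pairs if p == 0 and e == 1)  # false negatives
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: a pass requires meeting both thresholds.
    passed = accuracy >= min_accuracy and recall >= min_recall
    return {"accuracy": accuracy, "recall": recall, "passed": passed}


# Made-up reference labels: 20 positives followed by 20 negatives.
expected = [1] * 20 + [0] * 20

# Catches every positive but raises 8 false alarms: accuracy 0.8, recall 1.0.
cautious = evaluate([1] * 20 + [1] * 8 + [0] * 12, expected)

# Misses 2 positives with no false alarms: accuracy 0.95, recall 0.9.
lenient = evaluate([1] * 18 + [0] * 2 + [0] * 20, expected)
```

In this example the second tool is more accurate overall yet fails on recall, which shows why the recall threshold is the stricter of the two for tools that miss positives.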

Please note that a passing result does not constitute certification, accreditation, or endorsement by NIST or IBBIS.

  • If you pass: Your result can be displayed on the Evaluation Results page if you have opted in. A passing result expires one month after the date of your most recent passing evaluation.
  • If you fail: Statistics on accuracy and recall are provided.
