Question: GiaB SV Calls for NIST:HG001/NA12878: What's the reference build?
QVINTVS_FABIVS_MAXIMVS (USA SoCal) wrote, 2.6 years ago:

You would think NIST/GiaB would explicitly state the reference build for their putative gold-standard SV calls, but I can't find it anywhere.

I'm assuming it's GRCh37?

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/svclassify_Manuscript/Supplementary_Information/Personalis_1000_Genomes_deduplicated_deletions.bed

Does anyone know? I can't find any information in the READMEs or in the published svclassify paper.

Tags: deletion, cnv, nist, giab

Kevin Blighe wrote, 6 months ago:

If you are referring to this paper (svclassify: a method to establish benchmark structural variant calls), then yes, they aligned to NCBI's GRCh37 reference genome:

...raw reads were mapped to the National Center for Biotechnology Information (NCBI) build 37 using the Burrows-Wheeler Aligner (BWA) “bwa mem” v.0.7.5a with default parameters

Variants were mapped to human reference coordinates (NCBI build 37) by walking the read overlap graph in both directions until an “anchor” read, where a continuous 65 bps matches the reference, denoted the beginning and end of each variant.

If you are referring to the original published work (Extensive sequencing of seven human genomes to characterize benchmark reference materials), then the answer is the same:

The sequencing data were aligned by bwa mem [6] against b37 human decoy reference genome.
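A practical consequence of a b37-based reference: files built on it typically use unprefixed contig names ("1", ..., "22", "X", "Y", "MT"), whereas UCSC hg19 uses "chr1" and so on. The GRCh37 and hg19 nuclear chromosomes are the same sequences, so for those a rename is enough, but b37's MT is the rCRS mitochondrial sequence while hg19's chrM is a different mtDNA sequence, so MT should not simply be renamed. Below is a minimal sketch of that mapping, assuming the decoy build follows the 1000 Genomes hs37d5 naming (decoy contig called "hs37d5"); `b37_to_ucsc` is a hypothetical helper, not part of any GIAB tool.

```python
# Hypothetical helper (not from GIAB or any library): map b37/hs37d5 contig
# names to UCSC hg19-style names where a plain rename is safe.

# Primary nuclear chromosomes as named in b37: "1".."22", "X", "Y".
PRIMARY_B37 = {str(i) for i in range(1, 23)} | {"X", "Y"}


def b37_to_ucsc(contig):
    """Return the hg19-style name for a b37 primary chromosome, else None."""
    if contig in PRIMARY_B37:
        # Same nuclear sequence in GRCh37 and hg19; only the name differs.
        return "chr" + contig
    # Deliberately left unmapped in this sketch:
    # - "MT": b37 uses the rCRS, while hg19's chrM is a different mtDNA
    #   sequence, so renaming would not make coordinates comparable.
    # - "hs37d5" (decoy) and GL* unplaced/unlocalized contigs.
    return None


if __name__ == "__main__":
    for name in ["1", "X", "MT", "hs37d5"]:
        print(name, "->", b37_to_ucsc(name))
```

Returning None rather than guessing keeps the sketch conservative: callers decide whether to drop or separately handle the unmapped contigs.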

wtwhite wrote, 6 months ago:

It looks like it's GRCh37. In the first paragraph of "Methods" on p. 11 of Parikh et al. (2014), they say that they mapped the Platinum Genomes 2x100bp HiSeq data to NCBI "build 37" using bwa mem v.0.7.5a with default parameters, and that aligned BAM files (presumably against the same reference) were publicly available for the Illumina 250bp, PacBio and Moleculo data. Also, near the top of the next page, they write that the Spiral Genetics variants in category C were mapped to NCBI build 37, though it's not yet clear to me how the subheading this falls under corresponds to the 11 rows of Table 2.
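As a complementary sanity check on a BED file itself, one can compare interval end coordinates against chromosome lengths that differ between builds: an interval ending past a chromosome's length cannot come from that build. This is only a sketch under stated assumptions: `build_hints` is a hypothetical helper, it checks just a few chromosomes (GRCh37 vs GRCh38 primary-assembly lengths), and it can only rule a build out; intervals that fit both builds are inconclusive, and contig naming style ("chr1" vs "1") hints at the naming convention rather than proving the build.

```python
# Hypothetical helper, not part of any GIAB tool: report which builds a BED
# file's intervals are still consistent with, based on chromosome lengths.

# Lengths (bp) of a few primary-assembly chromosomes that differ between builds.
GRCH37_LENGTHS = {"1": 249_250_621, "2": 243_199_373, "3": 198_022_430, "X": 155_270_560}
GRCH38_LENGTHS = {"1": 248_956_422, "2": 242_193_529, "3": 198_295_559, "X": 156_040_895}


def build_hints(bed_lines):
    """Return (builds not ruled out, contig naming styles seen) for BED lines."""
    not_ruled_out = {"GRCh37": True, "GRCh38": True}
    styles = set()
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC track/browser header lines.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, end = fields[0], int(fields[2])
        styles.add("chr-prefixed" if chrom.startswith("chr") else "unprefixed")
        key = chrom[3:] if chrom.startswith("chr") else chrom
        # BED ends are exclusive, so end == length is valid; end > length is not.
        for build, lengths in (("GRCh37", GRCH37_LENGTHS), ("GRCh38", GRCH38_LENGTHS)):
            if key in lengths and end > lengths[key]:
                not_ruled_out[build] = False
    return not_ruled_out, styles


if __name__ == "__main__":
    # Made-up interval: ends beyond GRCh38 chr1 but within GRCh37 chr1.
    example = ["1\t249000000\t249100000\n"]
    print(build_hints(example))  # ({'GRCh37': True, 'GRCh38': False}, {'unprefixed'})
```

To run it on the downloaded file, pass an open file handle, e.g. `with open("Personalis_1000_Genomes_deduplicated_deletions.bed") as fh: print(build_hints(fh))`. The published Methods text quoted above remains the authoritative answer; this check only guards against a mismatch.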
