Question: Mismatch between exome capture and mapping reference assemblies in TCGA WXS data?
subhajit06 (United States) wrote, 3.6 years ago:

Hi all,

I have one question.

We want to call copy number for some TCGA WXS samples, but we noticed that the exome capture kit (target BED file) used for them was based on hg18 (more specifically, it was the hg18 NimbleGen exome version 2). We verified that the coordinates in this BED file are indeed hg18 coordinates.

However, the BAM file header suggests that the reads were mapped against GRCh37-lite, which is a version of hg19.
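One quick way to confirm which assembly a BAM was aligned to is to read its @SQ header lines. The Python sketch below assumes the header has already been dumped to text (for example with `samtools view -H`). It then uses chromosome 1's length as a heuristic (247,249,719 bp in hg18, 249,250,621 bp in GRCh37/hg19), along with the presence or absence of the "chr" prefix. The function names are hypothetical, and this is not a full assembly check.

```python
# Sketch: guess the reference assembly behind a BAM from its @SQ header lines.
# Assumes the header text was dumped first (e.g. `samtools view -H sample.bam`);
# the chr1-length lookup is a heuristic, not a full assembly check.

# Chromosome 1 length in each assembly (hg19 and GRCh37 share this sequence).
CHR1_LENGTHS = {
    247_249_719: "hg18 / NCBI36",
    249_250_621: "GRCh37 / hg19",
}


def parse_sq_lines(header_text):
    """Return {sequence name: length} from the @SQ lines of a SAM header."""
    sequences = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        sequences[tags["SN"]] = int(tags["LN"])
    return sequences


def guess_assembly(header_text):
    """Return (assembly guess, naming style) based on chromosome 1."""
    sequences = parse_sq_lines(header_text)
    for name in ("1", "chr1"):
        if name in sequences:
            assembly = CHR1_LENGTHS.get(sequences[name], "unknown")
            style = "UCSC-style 'chr' names" if name == "chr1" else "NCBI-style names"
            return assembly, style
    return "unknown", "no chromosome 1 found"


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569"
print(guess_assembly(header))  # ('GRCh37 / hg19', 'NCBI-style names')
```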

Most exome CNV callers need the capture target BED file as one of their inputs. In this case, though, the BED file is not consistent with the reference genome version used for mapping, and I think that would be a problem.
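To see why the mismatch matters, here is a minimal sketch of the per-target read-depth signal that exome CNV callers generally start from. It is a simplification: it counts read start positions only, and all coordinates are made-up toy values. It shows that targets in stale hg18 coordinates, or with a "chr" prefix that the BAM does not use, collect the wrong reads or none at all.

```python
# Sketch: per-target read counts, the basic signal exome CNV callers build on.
# Counting read *starts* is a simplification; real tools differ in detail.
from bisect import bisect_left


def count_reads_per_target(targets, read_starts):
    """Count read start positions inside each BED target.

    targets: list of (chrom, start, end), 0-based half-open like BED.
    read_starts: dict chrom -> sorted list of 0-based read start positions.
    """
    counts = []
    for chrom, start, end in targets:
        positions = read_starts.get(chrom, [])  # unknown chrom name -> no reads
        # number of positions p with start <= p < end
        counts.append(bisect_left(positions, end) - bisect_left(positions, start))
    return counts


# Hypothetical toy data: reads aligned to GRCh37-lite (NCBI-style name "1").
reads = {"1": [1010, 1050, 1090, 1150, 5000]}
print(count_reads_per_target([("1", 1000, 1200)], reads))     # [4] matching coordinates and names
print(count_reads_per_target([("1", 800, 1000)], reads))      # [0] stale (unlifted) coordinates
print(count_reads_per_target([("chr1", 1000, 1200)], reads))  # [0] 'chr' prefix mismatch
```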

What would be the best way to call copy number in this situation? Should we lift over the coordinates in the target BED file to GRCh37-lite?
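Lifting over means mapping each interval through a chain file that aligns the old assembly to the new one. For real data you would use a dedicated tool such as UCSC liftOver with an hg18-to-hg19 chain file. The sketch below only illustrates the idea on a hypothetical toy chain. It assumes a single '+'-strand chain and lifts an interval only when it lies wholly inside one ungapped block. `parse_chain` and `lift_interval` are hypothetical names, not part of any tool.

```python
# Sketch of what a chain-based liftover does for one BED interval.
# Simplified: one '+'-strand chain, and an interval is lifted only if it sits
# entirely inside one ungapped block (UCSC liftOver handles far more cases).


def parse_chain(text):
    """Parse one chain into (old chrom, new chrom, [(old_start, new_start, size)])."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    # header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    header = rows[0]
    if header[0] != "chain" or header[4] != "+" or header[9] != "+":
        raise ValueError("expected a single '+'/'+' chain")
    t_name, t_pos = header[2], int(header[5])
    q_name, q_pos = header[7], int(header[10])
    blocks = []
    for row in rows[1:]:
        size = int(row[0])
        blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(row) == 3:  # gap after this block: dt bases in old, dq bases in new
            t_pos += int(row[1])
            q_pos += int(row[2])
    return t_name, q_name, blocks


def lift_interval(chain, chrom, start, end):
    """Map a 0-based half-open interval; None if it cannot be lifted simply."""
    t_name, q_name, blocks = chain
    if chrom != t_name:
        return None
    for old_start, new_start, size in blocks:
        if old_start <= start and end <= old_start + size:
            shift = new_start - old_start
            return q_name, start + shift, end + shift
    return None  # crosses a gap or lies outside the chain


# Hypothetical toy chain: 200 bases inserted in the new assembly after base 1000.
toy_chain = parse_chain("""
chain 1000 chr1 10000 + 0 3000 chr1 10500 + 0 3200 1
1000 0 200
2000
""")
print(lift_interval(toy_chain, "chr1", 1500, 1700))  # ('chr1', 1700, 1900)
print(lift_interval(toy_chain, "chr1", 900, 1100))   # None
```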



Raony Guimarães (Dublin / Ireland) wrote, 3.6 years ago:

Yes, the fact that your targets were designed on hg18 might be a problem, so you could also try mapping the reads against hg18, just for comparison.

You could also try to lift over the exome targets and use their new positions on the newer assembly (hg19).

You should stick with whichever alignment has the largest number of reads aligned to the reference genome (hg18 or hg19). My bet is that you will get a better alignment against hg18.
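Comparing the two alignments comes down to the fraction of reads that map. `samtools flagstat` reports these counts directly. The Python sketch below shows the same calculation from SAM records. It counts each primary record once, skipping secondary (0x100) and supplementary (0x800) records, and treats FLAG bit 0x4 as unmapped. The records and the `toy_records` helper are hypothetical toy data.

```python
# Sketch: compare mapping rates of the same reads aligned to two assemblies.
# Works on SAM text records; only the FLAG column (field 2) is used.


def mapped_fraction(sam_lines):
    """Fraction of primary records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary / supplementary: count each read once
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            mapped += 1
    return mapped / total if total else 0.0


def toy_records(flags):
    """Hypothetical minimal SAM records carrying only the given FLAG values."""
    return [f"read{i}\t{flag}\t*\t0\t0\t*\t*\t0\t0\t*\t*" for i, flag in enumerate(flags)]


rates = {
    "hg18": mapped_fraction(toy_records([0, 16, 4, 256])),
    "GRCh37": mapped_fraction(toy_records([0, 16, 0, 2048])),
}
print(rates, "->", max(rates, key=rates.get))
```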

Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote, 3.6 years ago:

Your proposed solution is appropriate: use the GRCh37 BAM files and a corresponding GRCh37 target BED file. GRCh38 would be even better, but let's not get too carried away. ;)

At least for important cancer genes, the exon sequences haven't changed much between hg18 and hg19. Moreover, you are likely to find more differences between an actual tumor sample's exome and the hg18 exome than between the hg18 and hg19 exomes. So the capture kit's hybridization efficiency is not severely affected by the kit having been designed on the hg18 exome. Re-aligning the captured reads to GRCh37 (hg19), however, places them against a reference that is closer to the haplotypes of the average human than hg18 is.

Note that hg19 is a version of GRCh37. Here's the quick history, from the Biostars answer to "Human dna reference file with no prefix 'chr'".

"hg18 nimblegen exome version 2" sounds like what WashU used for TCGA LAML, OV, BRCA, and UCEC. Their official hg18 BED file lives here, but you can fetch NimbleGen's official hg19 BED and GFF from here. Their file named SeqCap_EZ_Exome_v2.bed contains two tracks, named "Target Regions" and "NimbleGen Tiled Regions". For copy-number calling, you'll need only the "Target Regions". Also remove regions in chrX, chrY, chrM, and any unaligned contigs, because CN calling on those is tricky. Removing them also makes it easier to convert hg19 to GRCh37.
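The cleanup described above can be sketched in Python. This assumes each track in the BED starts with a UCSC-style `track name="..."` line and that data lines are tab-separated. `filter_targets` is a hypothetical helper. Stripping "chr" is only safe here because the filter keeps just chr1 to chr22, whose sequences are identical in hg19 and GRCh37. That is not the case for chrM: hg19's chrM differs from GRCh37's MT.

```python
# Sketch: keep only the "Target Regions" track of a multi-track BED, drop
# chrX, chrY, chrM and unplaced/random contigs, and rename hg19 autosomes to
# GRCh37 style. The exact track-line format is an assumption.
import re

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def filter_targets(bed_text, wanted_track="Target Regions", to_grch37=True):
    """Return [(chrom, start, end)] from the wanted track, autosomes only."""
    kept, in_wanted = [], False
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("browser", "#")):
            continue
        if line.startswith("track"):
            match = re.search(r'name="([^"]*)"', line)
            in_wanted = match is not None and match.group(1) == wanted_track
            continue
        if not in_wanted:
            continue
        chrom, start, end = line.split("\t")[:3]
        if chrom not in AUTOSOMES:
            continue  # chrX, chrY, chrM, *_random, chrUn_* contigs
        if to_grch37:
            chrom = chrom[len("chr"):]  # hg19 'chr7' -> GRCh37 '7' (same sequence)
        kept.append((chrom, int(start), int(end)))
    return kept


example = "\n".join([
    'track name="Target Regions" description="hypothetical"',
    "chr1\t100\t200",
    "chrX\t300\t400",
    "chr1_gl000191_random\t10\t20",
    'track name="NimbleGen Tiled Regions"',
    "chr1\t50\t250",
])
print(filter_targets(example))  # [('1', 100, 200)]
```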
