Question: Variant calling with GATK on human cancer samples aligned against GRCh38.p10
Asked 2.6 years ago by mihai72:

I'm doing a college project where I need to call variants on some human cancer samples. I've chosen to align the reads to the GRCh38.p10 assembly, but now I'm having a hard time finding the appropriate 1000 Genomes indel VCF files (known sites) to run BaseRecalibrator and the downstream steps.

The latest GATK bundle seems to have VCFs for hg38, but I presume all the chromosome names are in the UCSC format (chr1, chr2, ...) and therefore not compatible with my GRCh38 assembly.

My question is: can I use the Mills gold standard indels for build GRCh37, or does anybody know where I can find the latest ones for the GRCh38 assembly?
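One way to check whether a known-sites VCF fits a reference, whether the problem is naming (chr1 vs 1) or build (GRCh37 vs GRCh38), is to compare the VCF's `##contig` header lines with the reference's `.fai` index. Renaming alone does not fix a build mismatch: chr1, for instance, is 249,250,621 bp in GRCh37 but 248,956,422 bp in GRCh38, so a GRCh37 file would still show length differences. This is a minimal sketch, assuming the VCF header carries `##contig=<ID=...,length=...>` lines (as the GATK bundle files do) and that a samtools-style `.fai` exists for the FASTA; the script and file names are hypothetical.

```python
import gzip
import sys
from typing import Dict, Iterable, List, Tuple


def vcf_contigs(header_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig: length} from ##contig header lines (-1 if no length given)."""
    contigs: Dict[str, int] = {}
    for line in header_lines:
        if line.startswith("#CHROM"):
            break  # end of the header; no need to read the records
        if not line.startswith("##contig=<"):
            continue
        body = line.strip()[len("##contig=<"):].rstrip(">")
        fields = dict(kv.split("=", 1) for kv in body.split(",") if "=" in kv)
        contigs[fields["ID"]] = int(fields.get("length", -1))
    return contigs


def fai_contigs(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig: length} from the first two columns of a .fai index."""
    contigs: Dict[str, int] = {}
    for line in fai_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            contigs[cols[0]] = int(cols[1])
    return contigs


def compare(vcf: Dict[str, int], ref: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (VCF contigs absent from the reference, VCF contigs with a different length)."""
    missing = sorted(c for c in vcf if c not in ref)
    mismatched = sorted(
        c for c in vcf if c in ref and vcf[c] != -1 and vcf[c] != ref[c]
    )
    return missing, mismatched


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_contigs.py known_indels.vcf.gz reference.fa.fai
    with open_text(sys.argv[1]) as fh:
        vcf = vcf_contigs(fh)
    with open(sys.argv[2]) as fh:
        ref = fai_contigs(fh)
    missing, mismatched = compare(vcf, ref)
    print(f"{len(missing)} VCF contigs not in reference: {missing[:5]}")
    print(f"{len(mismatched)} VCF contigs with a different length: {mismatched[:5]}")
```

If every contig is reported as missing, the problem is probably naming; if names match but lengths differ, the file was most likely built against another assembly.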

Also, the study that my samples are derived from used the Agilent SureSelect Human All Exon v4 kit for exome sequencing, and I read on the GATK website that I should use the -L argument with a custom BED file when running BaseRecalibrator on exome data. Does anyone have the corresponding BED file? I tried going to Agilent's eArray site, but it appears to be down for me.
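For reference, a BaseRecalibrator call restricted to the capture targets could be assembled as below. This is a minimal sketch, assuming GATK4 command-line syntax (`gatk BaseRecalibrator` with `-R`, `-I`, `--known-sites`, `-L`, `--interval-padding`, `-O`); GATK 3.x instead used `-T BaseRecalibrator`, `-knownSites` and `-o`. All file names and the padding value are hypothetical placeholders, and the BED file has to use the same contig names as the reference.

```python
import shlex
from typing import List, Optional, Sequence


def base_recalibrator_cmd(reference: str, bam: str, known_sites: Sequence[str],
                          out_table: str, intervals: Optional[str] = None,
                          padding: int = 0) -> List[str]:
    """Build a BaseRecalibrator argument list (GATK4 syntax assumed)."""
    cmd = ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam]
    for vcf in known_sites:
        # one --known-sites per resource (e.g. dbSNP, Mills/1000G indels)
        cmd += ["--known-sites", vcf]
    if intervals is not None:
        # restrict recalibration to the exome capture targets
        cmd += ["-L", intervals]
        if padding > 0:
            cmd += ["--interval-padding", str(padding)]
    cmd += ["-O", out_table]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own GRCh38-based files.
    cmd = base_recalibrator_cmd(
        reference="GRCh38.p10.fa",
        bam="tumour.sorted.dedup.bam",
        known_sites=["dbsnp_GRCh38.vcf.gz", "mills_1000G_indels_GRCh38.vcf.gz"],
        out_table="tumour.recal.table",
        intervals="SureSelect_v4_targets_GRCh38.bed",
        padding=100,  # example value only
    )
    # Print the shell command; pass `cmd` to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Building the command as a list keeps file names with spaces safe if it is later passed to `subprocess.run`.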

If not, I used Pierre Lindenbaum's command in this post to generate my own BED file, but it didn't work for me because it was based on hg38 and all the chromosomes are named differently from my reference. I can fix that with a simple text replacement script, but I was wondering whether the hg38 exome coordinates will be the same as the ones for the GRCh38 assembly.
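The "simple text replacement" could look like the sketch below. It assumes, as is generally the case for the primary chromosomes, that hg38 and GRCh38 share the same coordinates and differ only in naming, and that the target reference uses Ensembl-style names ("1", "X", "MT"); check the first column of your own `.fai` before relying on that. Alt, random and chrUn contigs are simply dropped here because their names differ by more than a prefix. The script and file names are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Optional

# Primary chromosomes in UCSC style: chr1..chr22, chrX, chrY
PRIMARY = re.compile(r"chr(\d{1,2}|X|Y)")


def ucsc_to_ensembl(name: str) -> Optional[str]:
    """Map chr1 -> 1, chrX -> X, chrM -> MT; return None for any other contig.

    Alt, random and chrUn contigs are deliberately not handled: their names
    differ by more than a prefix and need a proper lookup table.
    """
    if name == "chrM":
        return "MT"
    m = PRIMARY.fullmatch(name)
    return m.group(1) if m else None


def rename_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED lines with UCSC chromosome names converted (tab-delimited input assumed)."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            yield line  # header and blank lines pass through unchanged
            continue
        chrom, sep, rest = line.partition("\t")
        new_name = ucsc_to_ensembl(chrom)
        if new_name is not None:
            yield new_name + sep + rest
        # intervals on contigs that cannot be mapped are dropped


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rename_bed.py targets_hg38.bed targets_GRCh38.bed
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        fout.writelines(rename_bed(fin))
```

Start and end columns are left untouched, which is only correct under the same-coordinates assumption above.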

Thanks for your help.

Tags: snp, alignment, next-gen, gatk, genome

You may consider re-aligning to GRCh37 / hg19 just for convenience. It takes time for resources to be updated after a new genome build is released, even though hg38 has been out for a considerable time.

Your question will also most likely get a better response on the GATK forum itself.

Finally, you may consider a non-GATK somatic variant caller.

Reply written 2.6 years ago by Kevin Blighe

Has anyone managed to find a GRCh38 version of the 1000G and Mills & 1000G indels?

Reply written 21 months ago by Roman Luštrik

Direct link to the Mills hg38 file (for Roman Luštrik):

Reply written 21 months ago by cpad0112

How would one reconcile the UCSC/Ensembl difference in chromosome naming for this file? For example, hg38 names chromosomes "chr1" while GRCh38 uses "1". Presumably other contigs differ as well?
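For the primary chromosomes, the renaming can be sketched as below: it rewrites both the `##contig` header lines and the CHROM column, mapping chr1..chr22, chrX and chrY to their bare names and chrM to MT. This is a minimal illustration under the assumption that the target reference uses Ensembl-style names; the naming rule and file names are hypothetical, and alt, random and chrUn contigs are left unchanged, so they will still not match and may need to be filtered out or mapped with a full lookup table.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator


def to_ensembl(name: str) -> str:
    """chr1 -> 1, chrX -> X, chrM -> MT; any other name is returned unchanged."""
    if name == "chrM":
        return "MT"
    if re.fullmatch(r"chr(\d{1,2}|X|Y)", name):
        return name[3:]
    return name


def rename_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines with contig names converted in the header and the records."""
    prefix = "##contig=<ID="
    for line in lines:
        if line.startswith(prefix):
            # the ID value ends at the first ',' or '>'
            m = re.match(r"([^,>]+)(.*)", line[len(prefix):], re.S)
            yield prefix + to_ensembl(m.group(1)) + m.group(2)
        elif line.startswith("#"):
            yield line  # other header lines are kept as-is
        else:
            chrom, sep, rest = line.partition("\t")
            yield to_ensembl(chrom) + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rename_vcf.py mills_hg38.vcf.gz mills_GRCh38.vcf
    src, dst = sys.argv[1], sys.argv[2]
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(rename_vcf(fin))
```

The output is uncompressed and must be re-compressed and re-indexed before GATK will use it; `bcftools annotate --rename-chrs` with a two-column mapping file is an established alternative that does the same job.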

Reply written 21 months ago (modified 21 months ago) by Roman Luštrik

Ensembl's 1000 Genomes data for GRCh38 can be found here (courtesy of Emily_Ensembl):

Reply written 21 months ago by Kevin Blighe