What is the difference between Hg38gatkbundle and hg38? Will my output differ if I use one or the other as the reference for whole-exome, whole-genome, or targeted resequencing? If so, what kind of differences can I expect?
There are many versions of hg38, and how the differences show up will depend on your processing pipeline. "Hg38gatkbundle" probably refers to the reference shipped in the GATK bundle, but "hg38" can be any of the hg38 variants; there is no single definitive version that the name always corresponds to. A really nice summary of the different options, along with their benefits and drawbacks, is available here: http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
The bundle comprises hg38 plus the additional files required by the GATK best practice guidelines. The reference genome itself is the same.
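One way to check whether two hg38 FASTA files really contain the same set of sequences is to compare their contig names and lengths. The sketch below assumes each FASTA has already been indexed with `samtools faidx`, so a tab-separated `.fai` file (name, length, offset, ...) sits next to it. The function names and the file paths passed on the command line are hypothetical, chosen only for illustration. It compares names and lengths only, not the bases themselves.

```python
import sys


def read_fai(path):
    """Return {contig_name: length} from a samtools faidx .fai index."""
    contigs = {}
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
            fields = line.rstrip("\n").split("\t")
            contigs[fields[0]] = int(fields[1])
    return contigs


def compare_references(fai_a, fai_b):
    """Report contigs unique to each index and contigs whose lengths differ."""
    a = read_fai(fai_a)
    b = read_fai(fai_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, length_mismatch


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_fai.py bundle_hg38.fasta.fai hg38.fa.fai
    only_a, only_b, mismatch = compare_references(sys.argv[1], sys.argv[2])
    print("Only in first reference: ", only_a)
    print("Only in second reference:", only_b)
    print("Same name, different length:", mismatch)
```

If all three lists come back empty, the two references have the same contigs with the same lengths. Contigs present in only one file (for example ALT, decoy, or unplaced sequences) are the usual source of differences between hg38 variants.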
Also, regarding the reference sequences above: I used gatkbundlehg38 and hg38.fa to align exome sequence data, and I am getting an error when running samtools reheader.
My command is: path/to/samtools reheader -i HG100.sam > HG100.bam
I got the following error both times:
[E::hts_open] fail to open file '-i'
[main_reheader] fail to read the header from -i.
HG100.sam is 4.9 GB.
The following shows its header details:
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
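The contig names and lengths in @SQ header lines like these can be pulled out programmatically, which makes it easy to check which reference a SAM file was aligned against. The following is a minimal sketch, assuming the header text has one @SQ record per line; the function name `parse_sq_lines` is hypothetical. It only reads the SN and LN tags and ignores everything else.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # SAM header fields are tab-separated; fall back to whitespace
        # for headers that were pasted with spaces instead of tabs.
        fields = line.split("\t") if "\t" in line else line.split()
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


header = "@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n"
print(parse_sq_lines(header))
# prints {'chr1': 248956422, 'chr2': 242193529}
```

The resulting dictionary can then be compared against the contig lengths of a candidate reference to see whether they match.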
So that means hg38 needs to be indexed before alignment, while Hg38gatkbundle does not, as it already has all the files necessary for alignment.