Human WGS: which genome reference
3.2 years ago
quentin54520 ▴ 120

Hello all,

Sorry if the question has been asked before ...

I have to do human genome alignments. I would like to use GRCh38, but I don't know exactly which version to use. On the NCBI site there are several versions:

  • Full analysis set
  • Full plus
  • no alt analysis
  • no alt plus

From what I understand, aligners like bwa mem are ALT-aware, so I think I could use the full analysis set or the full plus set, but I want to be sure that it will work with all the downstream steps (I follow the GATK Best Practices for germline variants).

When I run bwa mem, should I give the path to the folder containing all the index files of the genome? I ask because the folder with the bwa mem indexes contains no .fa file.
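On the index question: bwa mem takes an index prefix (the `idxbase` argument), not a folder, and it does not need the FASTA itself at alignment time. The prefix is the common part of the file names written by `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). bwa mem runs in ALT-aware mode when a `<prefix>.alt` file sits next to them. The snippet below is only a rough sketch of how one might locate that prefix in a folder and build the command. The helper names, the `ref.fna` file name, and the FASTQ names are hypothetical, and it assumes classic BWA rather than bwa-mem2.

```python
from pathlib import Path

# Extensions written by `bwa index` (classic BWA; bwa-mem2 uses other files).
BWA_INDEX_EXTS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def find_bwa_prefixes(index_dir):
    """Return every index prefix in index_dir that has all five BWA files."""
    index_dir = Path(index_dir)
    prefixes = []
    for bwt in sorted(index_dir.glob("*.bwt")):
        prefix = str(bwt)[: -len(".bwt")]
        if all(Path(prefix + ext).is_file() for ext in BWA_INDEX_EXTS):
            prefixes.append(prefix)
    return prefixes


def is_alt_aware(prefix):
    """bwa mem uses ALT-aware mode when <prefix>.alt exists."""
    return Path(prefix + ".alt").is_file()


def bwa_mem_command(prefix, fastq1, fastq2, threads=4):
    """Build the argument list for a paired-end run (hypothetical inputs)."""
    return ["bwa", "mem", "-t", str(threads), prefix, fastq1, fastq2]
```

For a folder containing `ref.fna.amb`, `ref.fna.ann`, and so on, the prefix to pass to bwa mem would be `/path/to/folder/ref.fna`, even though no file with exactly that name exists there. GATK steps later in the pipeline do need the FASTA together with its `.fai` and `.dict` files, so the sequence file still has to be downloaded separately.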

Thanks a lot in advance :-)

genome alignment reference • 1.1k views

Thanks a lot, but I don't really understand the article, as it seems to do some unnecessary things. Or maybe it is too old? For example, on the Genome Reference Consortium website, the description of the no_alt_analysis_set reference says:

"The two PAR regions on chromosome Y, and duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22, have been hard-masked with Ns"

"The full_analysis_set contains the alternate locus scaffolds in addition to all the sequences present in the no_alt_analysis_set.

The full_plus_hs38d1_analysis_set contains the human decoy sequences from hs38d1 (GCA_000786075.2) in addition to all the sequences present in the full_analysis set."

So I think the best option is to use full_plus_hs38d1? I will try...
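To check which of the four sets a given FASTA corresponds to, one option is to count contig types in its `.fai` index (the tab-separated file written by `samtools faidx`, with the contig name in the first column). The sketch below assumes the analysis sets use UCSC-style names, where alternate loci end in `_alt`, decoys end in `_decoy`, unlocalized scaffolds end in `_random`, and unplaced scaffolds start with `chrUn_`. That naming convention is an assumption worth verifying against the README shipped with the files, and the function names are hypothetical.

```python
from collections import Counter


def classify_contig(name):
    """Classify a contig by name, assuming UCSC-style analysis-set naming."""
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_decoy"):  # checked before chrUn_: decoys share that prefix
        return "decoy"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    if name == "chrEBV":
        return "ebv"
    return "primary"


def summarize_fai(lines):
    """Count contig categories from the lines of a .fai index."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name = line.split("\t", 1)[0]
        counts[classify_contig(name)] += 1
    return counts


def guess_analysis_set(counts):
    """Map presence of ALT and decoy contigs to the analysis set name."""
    has_alt = counts["alt"] > 0
    has_decoy = counts["decoy"] > 0
    if has_alt and has_decoy:
        return "full_plus_hs38d1_analysis_set"
    if has_alt:
        return "full_analysis_set"
    if has_decoy:
        return "no_alt_plus_hs38d1_analysis_set"
    return "no_alt_analysis_set"
```

A typical use would be `guess_analysis_set(summarize_fai(open("ref.fna.fai")))`, where `ref.fna.fai` is a placeholder for the real index file.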


Also read https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use (Heng Li is probably the definitive source on these issues, so best to go with him).


Thanks. But again, it's an article from 2017, and he recommends the no-alt version because the tools are not ALT-aware. But now tools like bwa mem are ALT-aware... It's really difficult to find a recent source.


Even the GATK article is not really up to date, as it uses GATK 3... Since I don't know which genome to use, maybe it is better to use the no-alt version to be safe, even if I miss some interesting features of the alt contigs.
