Index with unmasked or masked in HISAT2
14 months ago
sansan_96 ▴ 80

Hello everyone,

I have a couple of questions about which source and which genome file I should use to build an index for aligning with HISAT2.

The first is whether it is correct to build the index with a "top level" file from Ensembl Plants (https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-55/plants/fasta/zea_mays/dna/) or whether I should use the one from NCBI (https://www.ncbi.nlm.nih.gov/genome/?term=zea+mays).

If one of the Ensembl Plants files is the right choice, which of these would be best?

  1. Zea_mays.Zm-B73-REFERENCE-NAM-5.0.dna.toplevel.fa.gz 615M

  2. Zea_mays.Zm-B73-REFERENCE-NAM-5.0.dna_rm.toplevel.fa.gz 123M

  3. Zea_mays.Zm-B73-REFERENCE-NAM-5.0.dna_sm.toplevel.fa.gz 641M

Description:

'dna' - unmasked genomic DNA.

'dna_rm' - masked genomic DNA.

'dna_sm' - soft-masked genomic DNA.

Could you please help me clear this up?

masked HISAT2 toplevel unmasked • 656 views
14 months ago
GenoMax 141k

Use the top-level file.

See the README for more information about the other files you are referring to.
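
As a rough sketch of what building the index could look like, the snippet below only assembles a `hisat2-build` command line and prints it. The output index name `zea_mays_b73_v5`, the thread count, and the local file path are assumptions for illustration, and the FASTA is assumed to have been decompressed already.

```python
import shlex


def hisat2_build_command(fasta_path, index_base, threads=4):
    """Return the argument list for a hisat2-build run (nothing is executed here)."""
    # Usage is: hisat2-build [options] <reference_in> <ht2_base>
    return ["hisat2-build", "-p", str(threads), fasta_path, index_base]


# Hypothetical local paths; the FASTA is assumed to be decompressed beforehand.
cmd = hisat2_build_command(
    "Zea_mays.Zm-B73-REFERENCE-NAM-5.0.dna.toplevel.fa",
    "zea_mays_b73_v5",
)
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where HISAT2 is installed.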


I have read that in masked genomes, low-complexity and repetitive regions of DNA are detected and replaced with 'N'. Do you suggest using the unmasked file?

Thank you for your comments.
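
One way to see how the files differ is to count the bases that are 'N' (hard-masked, or assembly gaps) and the bases that are lowercase (soft-masked). The sketch below does this on a toy record; the function name `masking_summary` and the toy sequence are made up for illustration, and a lowercase 'n' is counted as 'N' here.

```python
def masking_summary(fasta_lines):
    """Count N bases and lowercase (soft-masked) bases in an iterable of FASTA lines."""
    total = hard = soft = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        for base in line:
            total += 1
            if base in "Nn":
                hard += 1  # hard-masked base or assembly gap
            elif base.islower():
                soft += 1  # soft-masked base
    return {"total": total, "hard_masked": hard, "soft_masked": soft}


# Toy record, not real maize sequence.
toy = [">chr_toy", "ACGTacgtNNNN", "ACGT"]
summary = masking_summary(toy)
print(summary)
```

For a real file, the same function could be fed the lines from `gzip.open(path, "rt")`, keeping in mind that an unmasked assembly can still contain 'N' runs in gaps.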
