Ensembl Hg38 dna, dna_rm, and dna_sm
9.0 years ago
bharata1803 ▴ 560

Hello,

I have a basic question. According to the README on the Ensembl FTP site for the Hg38 reference, there are several types of DNA file: dna, dna_sm, and dna_rm. If I want to align some FASTQ files to the Ensembl reference, which type should I choose?

As I understand it, I should merge the per-chromosome FASTA files into a single FASTA file, then create an index, and then run the alignment. Should I merge all of the files that I downloaded from Ensembl, or only the ones I need? If I have to choose, how do I decide which ones I need based on my FASTQ data?

For what it's worth, I have already merged all of the FASTA files into a single file (including the ones with PATCH in their names) and I am currently aligning my data. Thank you for your help and explanation.

ensembl alignment sequencing • 5.8k views
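
For context on the three file types asked about above: the Ensembl README describes `dna` as unmasked sequence, `dna_sm` as soft-masked (repeats and low-complexity regions in lowercase), and `dna_rm` as hard-masked (those regions replaced by `N`). Here is a minimal Python sketch of how the three relate. The sequence is made up and the function names are hypothetical, chosen only for illustration.

```python
def to_unmasked(soft_masked):
    """Recover the unmasked ('dna') form: soft-masking only changes letter case."""
    return soft_masked.upper()


def to_hard_masked(soft_masked):
    """Derive the hard-masked ('dna_rm') form: every lowercase base becomes N."""
    return "".join("N" if base.islower() else base for base in soft_masked)


# Made-up soft-masked sequence: the lowercase run stands in for a repeat.
soft = "ACGTacgtacTTGA"

print(to_unmasked(soft))     # ACGTACGTACTTGA
print(to_hard_masked(soft))  # ACGTNNNNNNTTGA
```

Many short-read aligners ignore letter case, so `dna` and `dna_sm` will often give the same alignments, whereas `dna_rm` discards the repeat sequence entirely. Check your aligner's documentation before relying on this.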

Thank you for your answer. Is one file enough? There seem to be a lot of files there. What about the per-chromosome files? According to some tutorials I browsed, I need to use only the dna.chromosome files for 1 to 22, plus X, Y, and MT, and merge them into one FASTA file.


The primary assembly contains the chromosomes you listed, plus the unlocalized and unplaced scaffolds; according to the Ensembl README it excludes haplotypes and patches. So it's a single FASTA file with everything you want.
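
If you do want to build the merged file yourself from the per-chromosome downloads, as the tutorials suggest, here is a minimal Python sketch. The file-name prefix and the function name are assumptions for illustration; check them against the files you actually downloaded.

```python
import gzip
from pathlib import Path

# Chromosome order from the tutorial mentioned above: 1-22, then X, Y, MT.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def merge_chromosome_fastas(input_dir, output_path,
                            prefix="Homo_sapiens.GRCh38.dna.chromosome."):
    """Concatenate gzipped per-chromosome FASTA files into one plain FASTA.

    `prefix` is an assumed file-name pattern, not something confirmed here.
    Returns the chromosomes written, in order.
    """
    input_dir = Path(input_dir)
    written = []
    with open(output_path, "wb") as out:
        for chrom in CHROMOSOMES:
            src = input_dir / f"{prefix}{chrom}.fa.gz"
            if not src.exists():
                raise FileNotFoundError(f"missing chromosome file: {src}")
            last = b"\n"
            with gzip.open(src, "rb") as handle:
                # Stream in 1 MiB chunks so a whole chromosome is never in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Make sure the next header starts on its own line.
            if last != b"\n":
                out.write(b"\n")
            written.append(chrom)
    return written
```

Calling `merge_chromosome_fastas("downloads", "GRCh38.merged.fa")` (both paths are placeholders) would write the 25 sequences in the order above. Downloading the single primary assembly file avoids this step altogether.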


Oh, I understand now. Thank you then.


In later releases, such as release 100, there is also a folder with a so-called "dna-index", which contains essentially one file that looks very close to the primary assembly found in the "dna" folder. Which one should be used for the FASTQ alignment? And what is this "index" file?
