Drosophila genome indexing using STAR for mapping reads
5.9 years ago
ishento

Could anyone let me know which FASTA files can be used to build a genome index for analysis with STAR?

1- In Ensembl there are 10 unmasked FASTA files (chromosomes 2L, 2R, 3L, 3R, 4, X, Y, nonchromosomal, mitochondrion genome, and toplevel): ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/. Which files should be included? Or is the dna_index directory the one that should be used? ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/

2- FlyBase also has several FASTA files: ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.21_FB2018_02/fasta/. Which files should be used to build the genome index with STAR for mapping reads?
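Before choosing files, it can help to see which sequences each FASTA actually contains. The Python sketch below is not from the thread, and the function name `fasta_lengths` is hypothetical. It lists sequence names and lengths from a plain or gzip-compressed FASTA, taking the name as the first word after `>`, which is how Ensembl headers such as `>2L dna:chromosome ...` are usually read.

```python
import gzip
from pathlib import Path


def fasta_lengths(path):
    """Return {sequence_name: length} for a plain or gzip-compressed FASTA file.

    The name is the first whitespace-delimited token after '>'.
    """
    path = Path(path)
    # Choose the reader from the extension: .gz files are opened with gzip.
    opener = gzip.open if path.suffix == ".gz" else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # New record: start counting bases for this sequence name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped; add up every line of the record.
                lengths[name] += len(line)
    return lengths
```

For example, calling `fasta_lengths("Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz")` should show whether the toplevel file already includes the chromosome arms, the mitochondrion and the nonchromosomal scaffolds, in which case the per-chromosome files would be redundant.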

RNA-Seq

This has been discussed several times; have a look at this post.


I opened the old post, but I am still confused. I have 7 files: chromosomes, nonchromosomal, mitochondrion, and toplevel. Is it right to use all of them?


Okay, I will. Thanks for your response.


In Ensembl, I think the toplevel file is the one that should be used. Am I right?


Please use ADD COMMENT/ADD REPLY to keep threads logically organized. Do not post comments using SUBMIT ANSWER.

5.9 years ago

If you take a look at this link: ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/

At the bottom there is a README file, which says:

TOPLEVEL

These files contain all sequence regions flagged as toplevel in an Ensembl schema. This includes chromosomes, regions not assembled into chromosomes and N-padded haplotype/patch regions.

If you want all the information about the current genome, use Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz, and only this file, to create your index.
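A minimal sketch of the index build from that single file, assuming STAR is on the PATH and the FASTA has been decompressed first (as far as I know, `--genomeFastaFiles` does not read gzip input). The helper only assembles the argument list; the function names and paths are hypothetical. `--genomeSAindexNbases` follows the STAR manual's recommendation of min(14, log2(GenomeLength)/2 - 1) for small genomes, and `--sjdbOverhang` uses read length minus 1.

```python
import math


def sa_index_nbases(genome_length):
    """STAR manual recommendation for small genomes: min(14, log2(GenomeLength)/2 - 1)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


def star_genome_generate_cmd(fasta, genome_dir, gtf=None, threads=4,
                             genome_length=None, read_length=None):
    """Build (but do not run) a STAR genomeGenerate command as an argument list."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),  # uncompressed toplevel FASTA
    ]
    if gtf is not None:
        # Annotation is optional; splice-junction overhang only matters with a GTF.
        cmd += ["--sjdbGTFfile", str(gtf)]
        if read_length is not None:
            cmd += ["--sjdbOverhang", str(read_length - 1)]
    if genome_length is not None:
        # Smaller genomes need a smaller suffix-array pre-index than the default 14.
        cmd += ["--genomeSAindexNbases", str(sa_index_nbases(genome_length))]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. For a genome length of 143,000,000 bp (roughly the size of the release 6 assembly) the formula gives 12 instead of the default 14.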

NOTE: I still don't know why Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz has a different file size in ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/ and in ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/ (see also this post: Fasta file and GTF file for STAR alignment).
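One possible explanation, not confirmed in this thread: the files under dna_index/ appear to be bgzip-compressed (BGZF) so they can be indexed for random access, which is what the directory name suggests. BGZF compresses in independent blocks, which usually gives a slightly larger file than plain gzip even when the decompressed sequence is identical. The Python sketch below (function names hypothetical) checks for a BGZF header and hashes the decompressed content, so the two downloads can be compared directly.

```python
import gzip
import hashlib


def is_bgzf(path):
    """True if the file starts with a BGZF block header (gzip with a 'BC' extra subfield)."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    # Bytes 0-3: gzip magic, deflate method, FEXTRA flag; bytes 12-13: 'BC' subfield ID.
    return (len(header) == 18
            and header[:4] == b"\x1f\x8b\x08\x04"
            and header[12:14] == b"BC")


def decompressed_sha256(path, chunk_size=1 << 20):
    """SHA-256 of the decompressed bytes; gzip.open also reads multi-member (BGZF) files."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `decompressed_sha256` returns the same value for both downloads, the difference in size comes from the compression format alone and either file gives the same sequence to STAR once decompressed.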


The STAR manual states that "Generally, patches and alternative haplotypes should not be included in the genome", and I think toplevel includes haplotypes.
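If a toplevel file does contain patch or haplotype regions, one way to follow that advice is to write out only the sequences you want to keep. This is an illustration, not a method from the thread: `filter_fasta` and the `MAIN_DMEL_SEQUENCES` list are hypothetical, and the names in the list (including `mitochondrion_genome`) must be checked against the headers in your own file. Whether to keep the nonchromosomal scaffolds is a separate choice.

```python
import gzip
from pathlib import Path


def filter_fasta(src, dst, keep):
    """Copy only the records whose name (first header token) is in `keep`.

    Reads plain or .gz input and writes uncompressed FASTA for STAR.
    Returns the kept names in file order.
    """
    src = Path(src)
    opener = gzip.open if src.suffix == ".gz" else open
    keep = set(keep)
    kept = []
    writing = False
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # Decide per record whether its header and sequence lines are copied.
                name = line[1:].split()[0]
                writing = name in keep
                if writing:
                    kept.append(name)
            if writing:
                fout.write(line)
    return kept


# Hypothetical example list: named chromosome arms plus the mitochondrion.
# Verify these names against your own FASTA headers before using it.
MAIN_DMEL_SEQUENCES = ["2L", "2R", "3L", "3R", "4", "X", "Y", "mitochondrion_genome"]
```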


It depends on your downstream analysis. If you don't care about haplotypes and you want to do differential expression, you can use the primary assembly. If instead you want to do variant calling, for example, you will need the toplevel file to avoid false positives. See @Vijay's comment above (Filtering out chromosomes from reference genome).
