Question: Drosophila genome indexing using STAR for mapping reads
9 months ago, ishento wrote:

Could anyone let me know which FASTA files should be used to build a genome index for analysis with STAR?

1- In Ensembl there are 10 unmasked FASTA files (chromosomes 2L, 2R, 3L, 3R, 4, X, Y, nonchromosomal, mitochondrion genome, and toplevel): ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/. Which files should be included? Or is the file in dna_index the one that should be used? ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/

2- In FlyBase there are also several FASTA files: ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.21_FB2018_02/fasta/ Which files should be used to build the genome index with STAR for mapping reads?

rna-seq • 546 views
modified 9 months ago by Bastien Hervé • written 9 months ago by ishento

This has been discussed several times; have a look at this post.

written 9 months ago by bioExplorer

I opened the old post, but I am still confused. I have 7 chromosome files plus the nonchromosomal, mitochondrion, and toplevel files. Is it right to use all of them?

written 9 months ago by ishento

Okay, I will do that. Thanks for your response.

written 9 months ago by ishento

In Ensembl, I think the toplevel file is the one that should be used. Am I right?

written 9 months ago by ishento

Please use ADD COMMENT/ADD REPLY to keep threads logically organized. Do not post comments using SUBMIT ANSWER.

written 9 months ago by genomax

I am still confused. I have 7 chromosome files plus the nonchromosomal, mitochondrion, and toplevel files. Is it right to use all of them?

written 9 months ago by ishento

9 months ago, Bastien Hervé (Limoges, CBRS, France) wrote:

Take a look at this link: ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/

At the bottom there is a README file, which says:

TOPLEVEL

These files contain all sequence regions flagged as toplevel in an Ensembl schema. This includes chromosomes, regions not assembled into chromosomes and N padded haplotype/patch regions.

If you want all the information about the current genome, take Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz, and only this file, to create your index.
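As a rough illustration of building the index from that single file, here is a minimal Python sketch that only assembles the STAR command line (it does not run STAR). The function name `star_index_command`, the output directory name, and the genome length of about 143 Mb are assumptions made for this example. The flags themselves are standard STAR options, and the scaling of `--genomeSAindexNbases` follows the STAR manual's advice for small genomes, min(14, log2(GenomeLength)/2 - 1). Note that STAR expects a plain-text FASTA, so the .fa.gz has to be decompressed first.

```python
import math


def star_index_command(fasta, genome_dir, genome_length,
                       threads=4, gtf=None, read_length=None):
    """Build the argument list for `STAR --runMode genomeGenerate`.

    `fasta` must be an uncompressed FASTA file; `genome_length` is the
    total number of bases (an approximate value is enough).
    """
    # STAR manual: for small genomes, scale --genomeSAindexNbases down to
    # min(14, log2(GenomeLength)/2 - 1).
    sa_index_nbases = min(14, int(math.log2(genome_length) / 2 - 1))
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--genomeSAindexNbases", str(sa_index_nbases),
    ]
    if gtf is not None:
        # Optional annotation; --sjdbOverhang is ideally read length - 1.
        cmd += ["--sjdbGTFfile", gtf]
        if read_length is not None:
            cmd += ["--sjdbOverhang", str(read_length - 1)]
    return cmd


# Assumed values for illustration only.
cmd = star_index_command(
    fasta="Drosophila_melanogaster.BDGP6.dna.toplevel.fa",
    genome_dir="star_index_BDGP6",
    genome_length=143_000_000,  # approximate genome size, an assumption
)
print(" ".join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once STAR is installed and the output directory exists.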

NOTE: I still don't know why the size of Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz differs between ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/ and ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/ (see also this post: Fasta file and GTF file for STAR alignment).
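One way to check whether two differently sized .fa.gz files still hold the same sequence is to compare checksums of their decompressed content. A size difference on disk can come from the compression alone, although whether that is the explanation here is an assumption and not something confirmed in this thread. The helper names below are hypothetical; the sketch uses only the Python standard library.

```python
import gzip
import hashlib


def decompressed_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 of the decompressed content of a gzip file.

    Reads in chunks so a whole genome never has to fit in memory.
    """
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_sequence_content(path_a, path_b):
    """True if both gzip files decompress to byte-identical content."""
    return decompressed_sha256(path_a) == decompressed_sha256(path_b)
```

After downloading both copies under different local names, `same_sequence_content("dna/toplevel.fa.gz", "dna_index/toplevel.fa.gz")` (paths are placeholders) would tell you whether either file can be used for the index.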

modified 9 months ago • written 9 months ago by Bastien Hervé

The STAR manual states that "Generally, patches and alternative haplotypes should not be included in the genome", and I think toplevel has haplotypes.
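Rather than guessing, the sequence names in the toplevel file can be listed directly to see whether anything beyond chromosomes, unplaced scaffolds, and the mitochondrial genome is present. The following is a minimal sketch using only the Python standard library; the function name `fasta_sequence_lengths` is hypothetical, and it assumes the sequence name is the first whitespace-delimited token of each header line.

```python
import gzip


def fasta_sequence_lengths(path):
    """Return {sequence name: length} for a plain or gzip-compressed FASTA."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first token of the header as the name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

Printing the result for the toplevel file, for example with `for name, length in fasta_sequence_lengths(path).items(): print(name, length)`, shows every sequence that would end up in the STAR index.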

modified 9 months ago • written 9 months ago by ishento

It depends on your downstream analysis. If you don't care about haplotypes and want to do differential expression, you can go for the primary assembly. Otherwise, if you want to do variant calling, for example, you will have to use the toplevel to avoid false positives. See @Vijay's comment above (Filtering out chromosomes from reference genome).

modified 9 months ago • written 9 months ago by Bastien Hervé