Question: Drosophila genome indexing using STAR for mapping reads
18 months ago by
ishento0
ishento0 wrote:

Could anyone tell me which FASTA files should be used to build the genome index for analysis with STAR? 1- In Ensembl there are 10 unmasked FASTA files (chromosomes 2L, 2R, 3L, 3R, 4, X, Y, nonchromosomal, mitochondrion genome, and toplevel): ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/ Which files should be included? Or is the dna_index the one that should be used: ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/

2- In FlyBase there are also several FASTA files: ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.21_FB2018_02/fasta/ Which of these files should be used to build the genome index with STAR for mapping reads?

rna-seq • 895 views

This has been discussed several times; have a look at this post.

ADD REPLYlink written 18 months ago by lakhujanivijay4.7k

I opened the old post, but I am still confused. I have 7 files for chromosomes, plus nonchromosomal, mitochondrion, and toplevel. Is it right to use all of them?

ADD REPLYlink written 18 months ago by ishento0

Okay, I will do that. Thanks for your response.

ADD REPLYlink written 18 months ago by ishento0

In Ensembl, I think the toplevel file is the one that should be used. Am I right?

ADD REPLYlink written 18 months ago by ishento0

Please use ADD COMMENT/ADD REPLY to keep threads logically organized. Do not post comments using SUBMIT ANSWER.

ADD REPLYlink written 18 months ago by genomax75k

I am still confused. I have 7 files for chromosomes, plus nonchromosomal, mitochondrion, and toplevel. Is it right to use all of them?

ADD REPLYlink written 18 months ago by ishento0
18 months ago by
Bastien Hervé4.5k
Limoges, CBRS, France
Bastien Hervé4.5k wrote:

If you take a look at this link: ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/

At the bottom there is a README file, which says:

TOPLEVEL

These files contain all sequence regions flagged as toplevel in an Ensembl schema. This includes chromosomes, regions not assembled into chromosomes and N padded haplotype/patch regions.

If you want all the information about the current genome, take Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz, and only this file, to create your index.
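As a rough illustration of building the index from that single toplevel file, here is a minimal Python sketch that assembles a STAR `genomeGenerate` command without running it. Several things here are assumptions and not from this thread: the GTF file name, the output directory `star_index`, the read length of 100, the thread count, and the genome length of roughly 1.4e8 bases. The `--genomeSAindexNbases` value follows the STAR manual's suggestion for small genomes, min(14, log2(GenomeLength)/2 - 1), and `--sjdbOverhang` is set to read length minus 1. STAR expects an uncompressed FASTA, so the `.fa.gz` would need to be gunzipped first.

```python
import math
import shlex


def sa_index_nbases(genome_length):
    """STAR manual's suggestion for small genomes: min(14, log2(length)/2 - 1)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


def star_index_command(fasta, gtf, genome_dir, genome_length,
                       read_length=100, threads=4):
    """Return the STAR genomeGenerate command as an argument list (not executed)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,   # uncompressed FASTA
        "--sjdbGTFfile", gtf,          # annotation matching the same assembly
        "--sjdbOverhang", str(read_length - 1),
        "--genomeSAindexNbases", str(sa_index_nbases(genome_length)),
    ]


# Assumed inputs: only the FASTA name (minus .gz) comes from the thread.
cmd = star_index_command(
    fasta="Drosophila_melanogaster.BDGP6.dna.toplevel.fa",
    gtf="Drosophila_melanogaster.BDGP6.92.gtf",   # assumed file name
    genome_dir="star_index",                      # hypothetical output directory
    genome_length=140_000_000,                    # assumed approximate size
)
print(shlex.join(cmd))
```

The printed command can be inspected and then run in a shell. Whichever FASTA is chosen, the GTF should come from the same source and release so that the sequence names match.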

NOTE: I still don't know why the file size of Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz differs between ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna_index/ and ftp://ftp.ensembl.org/pub/release-92/fasta/drosophila_melanogaster/dna/ (see also this post: Fasta file and GTF file for STAR alignment).

ADD COMMENTlink modified 18 months ago • written 18 months ago by Bastien Hervé4.5k

The STAR manual states that "Generally, patches and alternative haplotypes should not be included in the genome", and I think the toplevel file has haplotypes.

ADD REPLYlink modified 18 months ago • written 18 months ago by ishento0

It depends on your downstream analysis. If you don't care about haplotypes and want to do differential expression, you can go for the primary assembly. If instead you want to do variant calling, for example, you will have to take the toplevel to avoid false positives. See @Vijay's comment above (Filtering out chromosomes from reference genome).
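One way to check what a given FASTA actually contains before indexing is to tally its header lines. The sketch below is only an illustration: it assumes Ensembl-style headers whose second field names the sequence type (for example `dna:chromosome`), and the two records in `example` are made up, not taken from the real file.

```python
from collections import Counter


def summarize_headers(lines):
    """Count FASTA records by the second header field (e.g. 'dna:chromosome')."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            kind = fields[1] if len(fields) > 1 else "unknown"
            counts[kind] += 1
    return counts


# Made-up records in the assumed Ensembl header layout.
example = [
    ">2L dna:chromosome chromosome:BDGP6:2L:1:4:1 REF",
    "ACGT",
    ">hypothetical_scaffold dna:supercontig supercontig:BDGP6:hypothetical_scaffold:1:4:1 REF",
    "ACGT",
]
print(summarize_headers(example))
```

For the real file, the same function can be given an open handle on the uncompressed FASTA. If only chromosome and unplaced-scaffold records show up, the haplotype concern from the STAR manual would not apply to that file.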

ADD REPLYlink modified 18 months ago • written 18 months ago by Bastien Hervé4.5k