Question: Downloading genomes for Drosophila species
steven (United States) wrote, 3.9 years ago:

I am trying to download either full genomes or WGS assembled sequences (depending on what is available) for several Drosophila species.

For most species, I was able to find an entry in the NCBI Genome database (e.g. http://www.ncbi.nlm.nih.gov/genome/genomes/3489?) that linked to a WGS download page offering zipped FASTA files (http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AFFE02, downloads tab). They were all around 50 MB zipped.

However, several species were not available in the Genome database, and I was only able to find them in the SRA database. When downloaded and converted to FASTQ format, they ended up being very large files (three were around 10 GB, one was 26 GB), which seemed strange to me compared with the 50 MB archives.

Why are the .sra and FASTQ files so much larger than the zipped WGS files?

Thanks!

Tags: sra, wgs, genome
h.mon (Brazil) wrote, 3.9 years ago:

The best place to download Drosophila genomes (and annotations, FASTA files with genes or peptides, etc.) is FlyBase.

The .fastq and .sra files do not contain assembled genomes. They contain raw sequencing reads (or sometimes .bam alignments), in which each base of the genome is typically covered many times over and stored together with per-base quality scores, so they are much larger. You will have to find out whether those samples have been analyzed, who deposited them, whether the work has been published, and so forth. If it has not been published, you should contact the depositor before using the data, to avoid publishing something they are already working on.
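To make the size difference concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of the original answer, and every number in it is an illustrative assumption rather than a property of the datasets in the question: a genome of about 150 Mb, 30x coverage, 100 bp reads, and roughly 40 bytes of header per read. The function names are made up for this example.

```python
def estimate_fasta_bytes(genome_size_bp, line_width=80):
    """Rough uncompressed FASTA size: one byte per base plus one newline per line.

    Sequence header lines are ignored because they are negligible.
    """
    n_lines = -(-genome_size_bp // line_width)  # ceiling division
    return genome_size_bp + n_lines


def estimate_fastq_bytes(genome_size_bp, coverage, read_length, header_bytes=40):
    """Rough uncompressed FASTQ size for a given sequencing depth.

    header_bytes is an assumed average length of the '@...' header line.
    """
    n_reads = genome_size_bp * coverage // read_length
    # A FASTQ record has four lines: header, sequence, '+', qualities.
    bytes_per_read = (
        header_bytes + 1      # header line + newline
        + read_length + 1     # sequence line + newline
        + 2                   # '+' separator line + newline
        + read_length + 1     # quality line + newline
    )
    return n_reads * bytes_per_read


if __name__ == "__main__":
    genome = 150_000_000  # assumed genome size in bp (illustrative only)
    fasta = estimate_fasta_bytes(genome)
    fastq = estimate_fastq_bytes(genome, coverage=30, read_length=100)
    print(f"FASTA: {fasta / 1e9:.2f} GB uncompressed")
    print(f"FASTQ: {fastq / 1e9:.2f} GB uncompressed")
    print(f"ratio: {fastq / fasta:.0f}x")
```

Under these assumptions the reads come to about 11 GB against about 0.15 GB for the assembled sequence before compression, which is the same order of magnitude as the gap described in the question. Higher coverage or paired-end data would push the FASTQ size up further.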
