Downloading genomes for Drosophila species
8.9 years ago
steven ▴ 70

I am trying to download either full genomes or WGS assembled sequences (depending on what is available) for several Drosophila species.

For most species, I was able to find an entry in the NCBI Genome database (e.g., http://www.ncbi.nlm.nih.gov/genome/genomes/3489) that linked to a WGS download page offering zipped FASTA files (http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AFFE02, downloads tab). Each archive was around 50 MB zipped.

However, several species were not available in the Genome database, and I was only able to find them in the SRA database. When downloaded and converted to FASTQ format, they ended up being very large files (three were around 10 GB, one was 26 GB), which seemed strange to me compared with the 50 MB archives.

Why are the .sra and .fastq files so much larger than the zipped WGS files?

Thanks!

genome wgs sra • 2.3k views
8.9 years ago
h.mon 35k

The best place to download Drosophila genomes (and annotations, FASTA files with genes or peptides, etc.) is FlyBase.

The .fastq and .sra files do not contain assembled genomes; they contain raw sequencing reads, or sometimes .bam alignments, which is why they are much larger. Each base of the genome is typically covered by many overlapping reads, and every read also carries per-base quality scores. Before using such data, check whether those samples have been analyzed, who deposited them, and whether the work has been published. If it has not been published, you should contact the depositor before using the data, to avoid publishing something they are already working on.
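To get a feel for the size gap, here is a rough back-of-the-envelope sketch in Python. The numbers are assumptions for illustration only and are not taken from any of the datasets in the question: a genome of about 150 Mb, 100 bp reads, a 40-byte read header, and a few example coverage depths. The function names are made up for this example.

```python
def estimate_fasta_bytes(genome_size_bp):
    """Uncompressed assembly size: about one byte per base
    (sequence headers and line breaks are ignored)."""
    return genome_size_bp


def estimate_fastq_bytes(genome_size_bp, coverage, read_length, header_bytes=40):
    """Rough uncompressed FASTQ size for a given sequencing depth."""
    # Number of reads needed to cover the genome `coverage` times.
    n_reads = genome_size_bp * coverage // read_length
    # A FASTQ record has four lines, each ending in a newline:
    # header, bases, a '+' separator, and one quality character per base.
    record_bytes = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * record_bytes


if __name__ == "__main__":
    genome = 150_000_000  # assumed genome size in bp (illustrative)
    fasta = estimate_fasta_bytes(genome)
    print(f"assembly FASTA: ~{fasta / 1e9:.2f} GB uncompressed")
    for coverage in (20, 50, 100):  # assumed sequencing depths
        fastq = estimate_fastq_bytes(genome, coverage, read_length=100)
        print(f"{coverage}x reads: ~{fastq / 1e9:.1f} GB FASTQ, "
              f"~{fastq / fasta:.0f}x the assembly")
```

Under these assumptions the raw reads come out tens to hundreds of times larger than the assembly before any compression, which is in the same ballpark as multi-gigabyte FASTQ files next to a 50 MB zipped FASTA. The real ratio depends on the actual coverage, read length, and compression of each run.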
