Question

obtaining fastq files from NCBI

4.7 years ago

truebeliever24 ▴ 50

Hi everyone,

I am trying to download a genome in FASTQ format, but so far I can only access the FASTA format. I know that I can use the SRA Toolkit to convert SRA format to FASTQ, but I'm not sure which genome to choose, or even whether these are entire genomes.
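For reference, the SRA-to-FASTQ conversion mentioned above might look roughly like the sketch below. It only builds the `prefetch` and `fasterq-dump` command lines and runs them if the SRA Toolkit is installed. The function names, the output directory, and the accession `SRR0000000` are placeholders for illustration, and the sketch assumes a run accession (SRR...) is supplied rather than an experiment accession (SRX...).

```python
import shutil
import subprocess


def sra_to_fastq_commands(run_accession, outdir="fastq", threads=4):
    """Build the prefetch and fasterq-dump command lines for one run."""
    prefetch = ["prefetch", run_accession]
    dump = [
        "fasterq-dump", run_accession,
        "--split-files",            # paired reads go to separate files
        "--outdir", outdir,
        "--threads", str(threads),
    ]
    return [prefetch, dump]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "SRR0000000" is a placeholder, not a real run from this thread.
    for cmd in sra_to_fastq_commands("SRR0000000"):
        print(" ".join(cmd))
```

Calling `run_commands(sra_to_fastq_commands(...))` would perform the actual download and conversion. The large PacBio experiments listed below would need far more disk space than the single Illumina runs.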

For example, when I search for "Calypte anna" with the filters SRA --> DNA --> genome, I get the options below. Are these all good options? My end goal is to incorporate this Calypte anna genome into my dataset (genomes of other species) in a single VCF file.

Search results. Items: 14. Filters activated: DNA, genome (16 items without filters).

1. Anna's Hummingbird 17kb cut on Blue Pippin and 110 pM loading concentration
10 PACBIO_SMRT (PacBio RS II) runs: 1.6M spots, 26.3G bases, 87.5Gb downloads
Accession: SRX1131887

2. Anna's Hummingbird 17kb cut on Blue Pippin and 125 pM loading concentration
52 PACBIO_SMRT (PacBio RS II) runs: 8.5M spots, 122.6G bases, 412.3Gb downloads
Accession: SRX1130526

3. Anna's Hummingbird 17kb cut on Blue Pippin and 100 pM loading concentration
1 PACBIO_SMRT (PacBio RS II) run: 163,482 spots, 1.4G bases, 4.8Gb downloads
Accession: SRX1130525

4. BGI-FCB06AHABXX-110603-L3-N300
1 ILLUMINA (Illumina HiSeq 2000) run: 103.9M spots, 10.2G bases, 5.7Gb downloads
Accession: SRX327908

5. BGI-FCB066MABXX-110618-L2-N300
1 ILLUMINA (Illumina HiSeq 2000) run: 102.8M spots, 10.1G bases, 5.2Gb downloads
Accession: SRX327907

6. BGI-FCB05B5ABXX-110525-L6-N300
1 ILLUMINA (Illumina HiSeq 2000) run: 101.6M spots, 10G bases, 5.2Gb downloads
Accession: SRX327906

genome ncbi sratoolkit • 2.1k views


If you search on sra-explorer using the search term "Calypte anna"[Organism] OR Calypte anna[All Fields] you will get 76 results. It looks like there is genomic data, large fragments purified by Pippin Prep and sequenced on the RS II, transcriptome data, etc. These are all raw sequence datasets, but they represent a good variety.
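For anyone who prefers to script the search, the same term can presumably be sent to the NCBI E-utilities `esearch` endpoint against the `sra` database. The sketch below is an assumption about how one might do that, not a description of how sra-explorer works internally, and the function names are made up for illustration. The number of hits returned may differ from the count quoted above.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term, retmax=100):
    """Return an esearch URL that queries the SRA database for `term`."""
    params = {"db": "sra", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def search_sra(term, retmax=100):
    """Return (total_hits, list_of_uids). Requires network access."""
    with urllib.request.urlopen(build_sra_search_url(term, retmax)) as resp:
        result = json.load(resp)["esearchresult"]
    return int(result["count"]), result["idlist"]


if __name__ == "__main__":
    term = '"Calypte anna"[Organism] OR Calypte anna[All Fields]'
    print(build_sra_search_url(term))
```

The UIDs returned by `search_sra` are internal SRA record identifiers, so a further lookup would be needed to turn them into SRX or SRR accessions.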

If you want an assembled genome instead, assemblies are available here, where someone has already done the genome assembly.

"My end-goal is to incorporate this Calypte anna genome into my dataset (genomes of other species) in a single VCF file."

I am not sure what you mean by that. Do you want to align your data against the Anna's hummingbird genome (the reference assembly above), or align raw Anna's hummingbird data against your own genome to create VCFs?

4.7 years ago by GenoMax 141k

You can download FASTQ files directly from the ENA; see "Fast download of FASTQ files from the European Nucleotide Archive (ENA)".
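As a rough illustration of the ENA route, the sketch below asks the ENA Portal API `filereport` endpoint for the FASTQ locations of an accession and parses the tab-separated reply. The function names are hypothetical, and it assumes the `fastq_ftp` column holds semicolon-separated paths without a scheme. Whether a given accession from this thread has FASTQ files at ENA would need to be checked.

```python
import csv
import io
import urllib.parse
import urllib.request

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Return a filereport URL asking for FASTQ paths and checksums."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ URLs."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Assumption: paths are separated by ';' and lack the ftp:// prefix.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs


def fetch_fastq_urls(accession):
    """Query ENA and return {run_accession: [urls]}. Requires network."""
    with urllib.request.urlopen(build_filereport_url(accession)) as resp:
        return parse_filereport(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # SRX327908 is one of the Illumina experiments listed in the question.
    print(build_filereport_url("SRX327908"))
```

The returned URLs can then be passed to any downloader, and the `fastq_md5` column could be used to verify the files afterwards.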

4.7 years ago by ATpoint 82k

Thanks for the reply. I have a set of genomes for two species aligned to the Anna's genome. I want to make a phylogeny, using Anna's as the outgroup, so I am trying to obtain raw reads of Anna's Hummingbird to align to the Anna's genome.

Is raw read data available somewhere? I've been trying for a long while now and I can't seem to get anything to work. This is the closest I've found, but as you said, it's a bit of a mess: https://www.ncbi.nlm.nih.gov/sra/SRX1131887[accn]

Here's the assembly I found, thanks to your suggestion: https://www.ncbi.nlm.nih.gov/assembly/GCA_003957555.2 ...should I use scaffold or chromosome?

Thanks again for your help.

4.7 years ago by truebeliever24 ▴ 50