Finding the raw data used to build reference genomes
20 months ago
Aaron • 0

Is there a way to find the raw data (FASTQ or other) that was used to generate a reference genome? And is there a quick way to do this for a large number of genomes?

reference-genome • 634 views
20 months ago
GenoMax 151k

You can do this with Entrez Direct (EDirect). I am using arbitrarily chosen NCBI GenBank assembly identifiers below. Once you have the SRA accession, you can retrieve the sequence data.

$ esearch -db assembly -query GCA_008245085 | elink -target biosample | efetch -format docsum | xtract -pattern DocumentSummary -element Identifiers
BioSample: SAMN06711904; Sample name: PFDSM3638; SRA: SRS4513276

One more example (this one from RefSeq):

$ esearch -db assembly -query GCF_021347895 | elink -target biosample | efetch -format docsum | xtract -pattern DocumentSummary -element Identifiers
BioSample: SAMN16534234; Sample name: KAUSTApolyChrSc; SRA: SRS7576196
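
For a large number of genomes, the same pipeline can be wrapped in a loop. The following is only a minimal sketch, not a tested production script: the function names (`extract_sra`, `assembly_to_sra`, `batch_lookup`) and the input file `accessions.txt` (one assembly accession per line) are hypothetical, and it assumes the Identifiers field looks like the output shown above. Note that an SRS accession identifies an SRA sample, so a further lookup is still needed to reach the individual runs.

```shell
# Hypothetical helper: pull the SRA accession out of an Identifiers string
# such as "BioSample: X; Sample name: Y; SRA: SRS123".
# Prints nothing if the string has no "SRA:" field.
extract_sra() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's/^ *SRA: *//p'
}

# Look up one assembly accession with the same EDirect pipeline as above.
# Redirecting stdin from /dev/null keeps esearch from consuming the
# input of the surrounding while-read loop.
assembly_to_sra() {
    esearch -db assembly -query "$1" < /dev/null |
        elink -target biosample |
        efetch -format docsum |
        xtract -pattern DocumentSummary -element Identifiers
}

# Read accessions (one per line) from the file given as $1 and print
# "assembly<TAB>SRA accession" for each; blank lines are skipped.
batch_lookup() {
    while IFS= read -r acc; do
        [ -z "$acc" ] && continue
        ids=$(assembly_to_sra "$acc")
        printf '%s\t%s\n' "$acc" "$(extract_sra "$ids")"
    done < "$1"
}
```

After sourcing these functions, something like `batch_lookup accessions.txt > assembly_to_sra.tsv` would produce a two-column table. An empty second column would mean no SRA identifier was found in the BioSample record for that assembly, which can happen when the submitter did not deposit or link the raw reads.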