Question: Species representation of the NCBI RefSeq for simulated reads
dabid0 wrote:

I want to generate simulated reads from NCBI RefSeq using ART. Because the RefSeq database is very large and contains many similar genomes (even though it is non-redundant), I want to pick one representative genome for every species in RefSeq (viral, bacterial, archaeal, etc.). I would then use these species representatives to generate the simulated reads instead of the whole database.

Any hints on how to find or download a species-level representative set from NCBI RefSeq?

Thanks.

dna simulated data genome ncbi • 204 views
modified 3 months ago by Biostar ♦♦ 20 • written 4 months ago by dabid0

And how would you select that one sequence (and have it represent a species)? What exactly are you trying to do by making this dataset?

written 4 months ago by genomax34k

I want to build a comprehensive set of simulated reads to benchmark a few metagenomic tools. But because NCBI RefSeq is huge (for bacteria alone, more than 50,000 genomes), I cannot use all of it. This is why I thought about taking only one genome per species in RefSeq. That way, I reduce the number of genomes I use to simulate reads.

written 4 months ago by dabid0

Ah, you are planning to use a genome to generate representative reads (not one read per species as I mistakenly thought).

There are assembly summary files on NCBI's genome FTP site (e.g. this one is for RefSeq bacteria). You can get that file and pull out one representative genome (and its accession number). From there you can use the idea here to get the sequence.
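As a rough illustration of the "pull out one representative genome" step, here is a minimal Python sketch. It assumes a locally downloaded `assembly_summary.txt` whose tab-separated header (a `#` comment line) names the columns `assembly_accession`, `refseq_category`, `species_taxid`, `assembly_level` and `ftp_path`. The ranking rule (prefer reference or representative genomes, then more complete assemblies) and the function names are my own assumptions, not an NCBI recommendation. The `_genomic.fna.gz` file name follows the usual layout of the assembly directories, so check it against your own download.

```python
# Assumed ranking, lower is better. Adjust to the values found in your file.
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def read_assembly_summary(lines):
    """Yield one dict per assembly, keyed by the column names in the header.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The comment line that mentions 'assembly_accession' names the columns.
            if "assembly_accession" in line:
                columns = line.lstrip("#").strip().split("\t")
            continue
        if not line or columns is None:
            continue
        yield dict(zip(columns, line.split("\t")))


def pick_representatives(rows):
    """Keep the best-ranked assembly for each species_taxid."""
    best = {}
    for row in rows:
        species = row["species_taxid"]
        rank = (
            CATEGORY_RANK.get(row["refseq_category"], 2),
            LEVEL_RANK.get(row["assembly_level"], 4),
        )
        # Ties keep the first assembly seen for that species.
        if species not in best or rank < best[species][0]:
            best[species] = (rank, row)
    return {species: row for species, (rank, row) in best.items()}


def genomic_fasta_url(row):
    """Build the genomic FASTA URL from ftp_path (assumed directory layout)."""
    base = row["ftp_path"].rstrip("/")
    return base + "/" + base.rsplit("/", 1)[-1] + "_genomic.fna.gz"
```

A possible way to use it: open the summary file, pass the handle to `read_assembly_summary`, feed the result to `pick_representatives`, and write `genomic_fasta_url(row)` for each selected row to a list of URLs to download before running ART.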

modified 4 months ago • written 4 months ago by genomax34k

Yeah, I got the idea. (Actually, I found another link that does almost exactly what I want.) Thank you so much!

written 4 months ago by dabid0