Question: Where can I find NGS data to test alignment and SNP detection tools?
somon_955 wrote (3.1 years ago):

Hi, I want to carry out SNP detection on genomic data that has to be aligned with an in-house tool. I have to check how this tool performs compared with other alignment tools, and whether it improves SNP detection.

Any ideas where I can download genomic data (fastq), which I can then align and check for SNPs? Is it possible to find NGS data used in publications, so that I can compare whether there is any difference in the detection of SNPs with our tool vs existing tools?

I am new to genomics - any suggestion will be helpful.

Thanks.

sequencing snp alignment next-gen • 1.3k views

Have you checked the 1000 Genomes Project already? They have a lot of data available on their FTP site.

(comment by Benn, 3.1 years ago)

Damn, while I was looking up the links you beat me by a few minutes ;-)

(reply by WouterDeCoster, 3.1 years ago)

Haha, I didn't know we were in competition ;P

(reply by Benn, 3.1 years ago)
WouterDeCoster (Belgium) wrote (3.1 years ago):

Data is available for many 1000 Genomes samples, including "famous" samples such as the Yoruba trio (NA19240 etc.). The data can be accessed at http://www.internationalgenome.org/data, for example http://www.internationalgenome.org/data-portal/sample/NA19240

Zaag (Amsterdam) wrote (3.1 years ago):

With GCAT (link below) you can download a dataset and upload your results to compare them with other pipelines. I believe this is not real sequencing data, but a reference that has been cut up to get 'reads' of a certain length.

http://www.bioplanet.com/gcat

And the Genome in a Bottle consortium has all kinds of 'gold standard' datasets you can use (actual sequencing data):

https://www.nist.gov/news-events/news/2016/09/nist-releases-new-family-standardized-genomes
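
To make the "cut-up reference" idea concrete, here is a minimal Python sketch of how error-free reads of a fixed length could be sampled from a reference and written as FASTQ. This is only an illustration, not how GCAT actually builds its datasets: the function names `simulate_reads` and `to_fastq`, the toy reference string, and the constant placeholder quality are all assumptions made for the example. Real read simulators also model sequencing errors, paired ends and variants.

```python
import random


def simulate_reads(reference, read_length, n_reads, seed=0):
    """Cut error-free 'reads' of a fixed length from random positions of a reference."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # fixed seed so the toy example is reproducible
    reads = []
    for i in range(n_reads):
        # randint is inclusive on both ends, so the last valid start is len - read_length
        start = rng.randint(0, len(reference) - read_length)
        seq = reference[start:start + read_length]
        # Keep the true origin in the read name so alignments can be checked later
        reads.append((f"sim_{i}_pos{start}", seq))
    return reads


def to_fastq(reads, quality_char="I"):
    """Format (name, sequence) pairs as FASTQ text with a constant placeholder quality."""
    lines = []
    for name, seq in reads:
        lines.extend([f"@{name}", seq, "+", quality_char * len(seq)])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy reference; a real test would read a FASTA file instead
    toy_reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGATTACA"
    reads = simulate_reads(toy_reference, read_length=10, n_reads=3)
    print(to_fastq(reads), end="")
```

Because each read name records where the read came from, an aligner's output can be checked against the known origin, which is the main appeal of simulated data.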

chen (OpenGene) wrote (3.1 years ago):

The best place to find real data is the NCBI SRA: https://www.ncbi.nlm.nih.gov/sra/


The biggest problem with "real data" is that there is no "truth set", i.e. a set of variants you can be sure of. The data from the Genome in a Bottle consortium is a set of gold-standard datasets with extensive validation and confirmation using different technologies.

(comment by WouterDeCoster, 3.1 years ago)
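
To show why a truth set matters, here is a rough Python sketch of scoring SNP calls from one pipeline against a set of trusted variants. It is a simplified illustration under several assumptions: the helper names `load_snps` and `score_calls` are made up for this example, the input is plain tab-separated VCF text, only single-base substitutions are counted, and variants are matched by exact chromosome, position, REF and ALT. A real benchmark would typically also restrict the comparison to high-confidence regions and normalize how variants are represented.

```python
def load_snps(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys for single-base substitutions from VCF text lines."""
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3].upper()
        for alt in fields[4].upper().split(","):
            # Keep only SNPs: one reference base replaced by one different base
            if len(ref) == 1 and len(alt) == 1 and alt in "ACGT" and alt != ref:
                snps.add((chrom, pos, ref, alt))
    return snps


def score_calls(called, truth):
    """Count true/false positives and false negatives, then derive precision and recall."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical toy records; real input would come from VCF files
    truth_vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr1\t200\t.\tC\tT\t.\tPASS\t.",
        "chr1\t300\t.\tG\tA\t.\tPASS\t.",
        "chr1\t400\t.\tAT\tA\t.\tPASS\t.",  # indel, ignored by load_snps
    ]
    pipeline_vcf = [
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t200\t.\tC\tT\t45\tPASS\t.",
        "chr1\t250\t.\tT\tC\t12\tPASS\t.",
    ]
    print(score_calls(load_snps(pipeline_vcf), load_snps(truth_vcf)))
```

Running the same scoring for the in-house aligner and for an existing aligner, with the same variant caller and the same truth set, gives directly comparable precision and recall figures.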