Where can I find NGS data to test an alignment and SNP detection tool?
7.5 years ago
somon_955 • 0

Hi, I want to carry out SNP detection on genomic data that has to be aligned with an in-house tool. I need to check how this tool performs relative to other alignment tools, and whether it improves SNP detection.

Any ideas where I can download genomic data (FASTQ) that I can then align and check for SNPs? Is it possible to find NGS data used in publications, so that I can compare the SNPs detected with our tools against those detected with existing tools?

I am new to genomics - any suggestion will be helpful.

Thanks.

SNP next-gen sequencing alignment • 2.3k views

Did you check the 1000 Genomes Project already? They have a lot of data available on their FTP site.


Damn, by looking for the links you beat me by a few minutes ;-)


Haha, I didn't know we were in competition ;P

7.5 years ago

Data is available for many 1000 Genomes samples, including "famous" samples such as the Yoruba trio (NA19240 etc.). The data can be accessed at http://www.internationalgenome.org/data, for example http://www.internationalgenome.org/data-portal/sample/NA19240

7.5 years ago
Zaag ▴ 860

From GCAT (link below) you can download a dataset and upload your results to compare them with other pipelines. I believe this is not real sequencing data, but a reference that's cut up to get 'reads' of a certain length.

http://www.bioplanet.com/gcat
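
To make the idea of "a reference that's cut up" concrete, here is a minimal Python sketch. It assumes a toy reference string and error-free reads with a constant dummy quality; the names `simulate_reads` and `to_fastq` are made up for illustration, and this is not a description of how the GCAT datasets were actually generated.

```python
import random


def simulate_reads(reference, read_length, n_reads, seed=0):
    """Cut n_reads substrings of length read_length from random positions."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    reads = []
    for i in range(n_reads):
        # randint is inclusive on both ends, so the read always fits
        start = rng.randint(0, len(reference) - read_length)
        seq = reference[start:start + read_length]
        # store the 1-based origin in the name so alignments can be checked
        reads.append((f"read{i}_pos{start + 1}", seq))
    return reads


def to_fastq(reads, quality_char="I"):
    """Format (name, sequence) pairs as FASTQ text with a dummy quality line."""
    lines = []
    for name, seq in reads:
        lines.extend([f"@{name}", seq, "+", quality_char * len(seq)])
    return "\n".join(lines) + "\n"


# Toy reference, purely illustrative (hypothetical sequence)
reference = "ACGTTGCAAGCTTAGGCTAACCGTTAGC"
reads = simulate_reads(reference, read_length=10, n_reads=5)
print(to_fastq(reads), end="")
```

Because each read name records where the read came from, you can check whether an aligner put it back in the right place. Real simulators also add sequencing errors and variants, which this sketch leaves out.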

The Genome in a Bottle (GIAB) Consortium also has all kinds of 'gold standard' datasets you can use (actual sequencing data):

https://www.nist.gov/news-events/news/2016/09/nist-releases-new-family-standardized-genomes

7.5 years ago
chen ★ 2.5k

The best place to find real data is the NCBI SRA: https://www.ncbi.nlm.nih.gov/sra/


The biggest problem with "real data" is that there is no "truth set", i.e. a set of variants you can be sure of. The Genome in a Bottle Consortium data are gold-standard datasets with extensive validation and confirmation using different technologies.
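
As a rough illustration of why a truth set matters, here is a minimal Python sketch that scores a set of SNP calls against one. It assumes variants are already reduced to `(chrom, pos, ref, alt)` tuples with identical representation in both sets; the function name `evaluate_calls` and the example variants are invented for illustration and do not come from any real dataset.

```python
def evaluate_calls(called, truth):
    """Compare called SNPs with a truth set by exact tuple match."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical variants as (chrom, pos, ref, alt)
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
called = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr2", 300, "T", "C")}

result = evaluate_calls(called, truth)
print(f"precision={result['precision']:.2f} recall={result['recall']:.2f}")
```

Running the same comparison on the calls from each aligner gives numbers you can put side by side. Exact tuple matching is a simplification: a real benchmark would also have to deal with differing variant representations and restrict the comparison to regions where the truth set is considered reliable.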
