Question: Illumina NGS Test Data for SNP/Indel Calling and Association Study
1
7.8 years ago by
Travis2.8k
USA
Travis2.8k wrote:

Hi all,

I am currently attempting to create a pipeline for Illumina NGS sequence alignment, SNP/indel calling and association testing across multiple samples.

Can anyone recommend a data set for testing a pipeline like this, right through to the stage of calling associated mutations?

Alternatively, is there an existing means of generating such a data set?

Any tips would be greatly appreciated.

Thanks in advance!

modified 5.9 years ago by marcodpc30 • written 7.8 years ago by Travis2.8k
3
7.8 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

Why not use the publicly available 1000 Genomes data (or a subset thereof)?

http://www.1000genomes.org/data

They have sequence data from various platforms, including Illumina. You can work directly on the sequence data, or pick the pre-aligned BAM files to work with.

If you want to speed up throughput for pipeline development, just subset the data into a single chromosome and its associated mapped reads.
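
As a rough illustration of the subsetting idea, here is a minimal Python sketch that filters SAM-format text down to one reference sequence. It assumes the alignments are available as plain SAM lines (not binary BAM); the function name `subset_sam_by_chrom` and the toy records are made up for this example. On real indexed BAM files a dedicated tool such as `samtools view` with a region argument would be the more practical choice.

```python
def subset_sam_by_chrom(lines, chrom):
    """Return SAM lines (headers plus alignments) for a single reference sequence."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Keep every header line except @SQ records for other references.
            if fields[0] == "@SQ" and f"SN:{chrom}" not in fields[1:]:
                continue
            kept.append(line)
        elif len(fields) > 2 and fields[2] == chrom:
            # Column 3 of an alignment line is RNAME, the reference name.
            kept.append(line)
    return kept


# Toy SAM records (hypothetical; sequence lengths are placeholders).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:20\tLN:1000",
    "@SQ\tSN:21\tLN:1000",
    "read1\t0\t20\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\t21\t200\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "read3\t16\t20\t150\t60\t4M\t*\t0\t0\tGGCA\tIIII",
]
chr20_only = subset_sam_by_chrom(toy_sam, "20")
```

Here `chr20_only` keeps the `@HD` line, the `@SQ` line for chromosome 20, and the two reads mapped to it.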

EDIT: OK, I didn't see the part about association testing. I'm not sure what the requirements would be for that in terms of sample numbers/size, but you might still be able to leverage the existing 1000 Genomes datasets.


A good suggestion. At its very simplest I just want to run through a workflow with a few samples (say 2 phenotypes with 5 members per group and known causal mutations present). I know this isn't scientifically correct - it's simply to get a feel for the workflow and the filtering of unassociated mutations etc.

written 7.8 years ago by Travis2.8k
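
To get a feel for what a 5-versus-5 toy design can show, here is a small sketch of a per-variant association test. It assumes each sample is scored simply as carrier or non-carrier and uses a two-sided Fisher's exact test written with the standard library only; the function name and the counts are illustrative, not taken from any real data set.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a, b: carriers / non-carriers among cases
    c, d: carriers / non-carriers among controls
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical toy counts: a variant carried by all 5 cases and no controls,
# and a variant spread almost evenly across both groups.
p_causal = fisher_exact_two_sided(5, 0, 0, 5)
p_neutral = fisher_exact_two_sided(3, 2, 2, 3)
```

Even a perfectly segregating variant only reaches p = 2/252 (about 0.008) with five samples per group, which fits the point above that such a design is for exercising the workflow rather than for real inference.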
0
7.8 years ago by
Travis2.8k
USA
Travis2.8k wrote:

Looking at papers in the area, I guess one approach would be to download or generate a few sets of reads and then artificially introduce a known causal mutation or two into the groups as appropriate. It is an oversimplification, but I believe it is still useful for test purposes. Exactly how to introduce the mutation into the data sets is another question entirely, I suppose!
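
One possible way to do the spiking, sketched in Python under strong simplifying assumptions: reads are represented as hypothetical `(start, sequence)` pairs with 0-based reference start positions, they are aligned without gaps, and only a single-base substitution is introduced. The function name `spike_snv` and its parameters are invented for this illustration; base qualities and indels are not handled.

```python
import random


def spike_snv(reads, pos, alt, fraction=1.0, seed=0):
    """Substitute base `alt` at reference position `pos` in overlapping reads.

    reads:    list of (start, sequence) tuples, 0-based, ungapped alignments
    fraction: share of overlapping reads to mutate (about 0.5 mimics a
              heterozygous site, 1.0 a homozygous one)
    """
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    mutated = []
    for start, seq in reads:
        offset = pos - start
        if 0 <= offset < len(seq) and rng.random() < fraction:
            seq = seq[:offset] + alt + seq[offset + 1:]
        mutated.append((start, seq))
    return mutated


# Toy reads: the first two overlap reference position 5, the third does not.
toy_reads = [(0, "ACGTACGT"), (4, "ACGTACGT"), (20, "ACGT")]
case_reads = spike_snv(toy_reads, pos=5, alt="T")                   # "case" sample
control_reads = spike_snv(toy_reads, pos=5, alt="T", fraction=0.0)  # left untouched
```

Applying the function to every sample in the "case" group and leaving the controls unchanged gives a data set with a known variant that the pipeline should recover.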

0
5.9 years ago by
marcodpc30
marcodpc30 wrote:

If I use 1000 Genomes data, can I simulate a pooled sample in some way and introduce SNPs and indels into it?
