Illumina NGS Test Data for SNP/Indel Calling and Association Study
13.4 years ago
Travis ★ 2.8k

Hi all,

I am currently attempting to create a pipeline for Illumina NGS sequence alignment, SNP/indel calling and association testing across multiple samples.

Can anyone recommend a data set for testing a pipeline like this, right through to the stage of calling associated mutations?

Alternatively, is there an existing means of generating such a data set?

Any tips would be greatly appreciated.

Thanks in advance!

next-gen sequencing snp association • 3.7k views
13.4 years ago
User 59 13k

Why not use the publicly available 1000 Genomes data (or a subset thereof)?

http://www.1000genomes.org/data

They have sequence data from various platforms, including Illumina. You can work directly on the sequence data, or pick the pre-aligned BAM files to work with.

If you want to speed up throughput for pipeline development, just subset the data to a single chromosome and its associated mapped reads.
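
As a rough illustration of that subsetting step, here is a minimal sketch that filters SAM text records down to one reference sequence using only the Python standard library. The function name `subset_sam_by_chrom` and the toy records are made up for this example. In practice, running `samtools view -b` with a region name on an indexed BAM does the same job far more efficiently.

```python
def subset_sam_by_chrom(sam_lines, chrom):
    """Yield header lines and alignment records that belong to `chrom` only."""
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Keep header lines, but drop @SQ entries for other references.
            if fields[0] == "@SQ" and f"SN:{chrom}" not in fields[1:]:
                continue
            yield line
        elif len(fields) > 2 and fields[2] == chrom:
            # Column 3 of a SAM alignment record is RNAME (the reference name).
            yield line


# Toy SAM content (hypothetical reads and reference lengths).
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:20\tLN:1000",
    "@SQ\tSN:21\tLN:1000",
    "read1\t0\t20\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\t21\t200\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
]
subset = list(subset_sam_by_chrom(sam, "20"))
```

Here `subset` keeps the `@HD` line, the `@SQ` line for chromosome 20 and `read1`. Mate information pointing at other chromosomes is left untouched in this sketch.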

EDIT: OK, I didn't see the part about association testing. I'm not sure what the requirements would be for that in terms of sample numbers/size, but you might still be able to leverage the existing 1000 Genomes datasets.


A good suggestion. At its very simplest, I just want to run through a workflow with a few samples (say 2 phenotypes with 5 members per group and known causal mutations present). I know this isn't scientifically correct; it's simply to get a feel for the workflow and for filtering out unassociated mutations, etc.
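
To give a feel for what such a toy design can show, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of carriers versus non-carriers in the two groups. This is one common choice of test for small counts, not necessarily what a real pipeline would use, and the function name `fisher_exact_two_sided` is made up for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the phenotype groups, columns are carriers / non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# All 5 cases carry the variant, none of the 5 controls do.
p_perfect = fisher_exact_two_sided(5, 0, 0, 5)
```

With five samples per group, even a variant that segregates perfectly with phenotype gives p = 2/252, roughly 0.008, so this layout exercises the workflow but would not survive genome-wide multiple-testing correction.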

13.4 years ago
Travis ★ 2.8k

Looking at papers in the area, I guess one approach would be to download or generate a few sets of reads and then artificially introduce a known causal mutation or two into the groups as appropriate. It is an oversimplification, but I believe it is still useful for test purposes. Exactly how to introduce the mutation into the data sets is another question entirely, I suppose!
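
One very simplified way to think about the spike-in is sketched below. It assumes each read is just a (0-based start, sequence) pair with an ungapped alignment, so there is no CIGAR handling, no base qualities and no strand logic. The function name `spike_in_snp` and the toy reads are made up for this example.

```python
def spike_in_snp(reads, pos, alt):
    """Return copies of `reads` with base `alt` written at reference position `pos`.

    Each read is a (start, sequence) pair: 0-based start, ungapped alignment.
    Reads that do not overlap `pos` are returned unchanged.
    """
    mutated = []
    for start, seq in reads:
        offset = pos - start
        if 0 <= offset < len(seq):
            # Replace the single base that aligns to the target position.
            seq = seq[:offset] + alt + seq[offset + 1:]
        mutated.append((start, seq))
    return mutated


# Toy reads: the first two overlap reference position 4, the third does not.
reads = [(0, "ACGTAC"), (3, "TACGGA"), (10, "TTTTTT")]
case_reads = spike_in_snp(reads, pos=4, alt="G")
```

Applying this to every overlapping read mimics a homozygous variant. For a heterozygous one, you would mutate only about half of the overlapping reads. The "case" samples get the edited reads and the "control" samples keep the originals.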

11.4 years ago
marcodpc ▴ 60

If I use 1000 Genomes data, can I simulate a pooled sample in some way and introduce SNPs and indels into it?
