Question: Test set for WGS analysis pipeline
gwotto wrote 9 months ago:

Hi, I am developing a pipeline for whole-genome sequencing (WGS) analysis, including software for sequence alignment (e.g. bwa), quality control, and variant calling (e.g. with GATK). To write tests and debug the pipeline, I need a small test set. Are there any best practices or instructions for generating such a test set? Or are there any publicly available test sets? Thanks a lot for your help!

MSM55 (Israel) wrote 9 months ago:

There are a lot of publicly available datasets on NCBI SRA.


Perhaps I should have been more specific. I am not looking for whole-genome data in general. Rather, I would like to have something like a small section of the genome as FASTQ reads, plus the same small section as a reference with the appropriate indices, either simulated or taken from real data. The goal is a test set that runs in a couple of minutes rather than hours or days. I would like to know how people generate such a data set, or whether there are existing ones in common use. So far I haven't come across any...

Reply written 9 months ago by gwotto
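
For the "small section of the genome as reference" part, here is a minimal sketch of one way to cut a region out of a larger FASTA file in plain Python. The function names (`read_fasta`, `extract_region`, `write_fasta`) and the demo contig are made up for illustration and are not from any library. The sketch holds one contig in memory at a time, so it is meant for illustration rather than for production use; on an indexed reference, `samtools faidx` with a region argument does the same job. The resulting small FASTA would then still need to be indexed by whatever the pipeline expects (for example with `bwa index` and `samtools faidx`).

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_region(handle, contig, start, end):
    """Return (new_name, subsequence) for the 1-based inclusive region."""
    for name, seq in read_fasta(handle):
        if name == contig:
            if not (1 <= start <= end <= len(seq)):
                raise ValueError("region outside contig bounds")
            return f"{contig}_{start}_{end}", seq[start - 1:end]
    raise KeyError(f"contig {contig!r} not found")


def write_fasta(handle, name, seq, width=60):
    """Write one FASTA record with wrapped sequence lines."""
    handle.write(f">{name}\n")
    for i in range(0, len(seq), width):
        handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Tiny in-memory demo; in practice, open the real reference file instead.
    demo = io.StringIO(">chr1 demo\nACGTACGTAC\nGTTTGACCAA\n>chr2\nGGGG\n")
    name, seq = extract_region(demo, "chr1", 5, 14)
    out = io.StringIO()
    write_fasta(out, name, seq)
    print(out.getvalue(), end="")
```

Note that coordinates in the extracted reference start again at 1, so any variant calls made against it are offset by `start - 1` relative to the full genome.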

If your aim is to simulate data, you can check out the answer by Vijay Lakhujani in this post.

Reply written 9 months ago by toralmanvar
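
To make the simulation idea concrete, below is a rough sketch (not the method from the linked answer) of generating a toy test set with the Python standard library only: a random reference, a "sample" genome with a few planted SNVs, and error-free single-end reads drawn from both strands. All names, sizes, and the constant quality string are assumptions chosen for illustration. Dedicated read simulators such as wgsim or ART model sequencing errors and paired-end reads and are probably the better choice for anything beyond a smoke test. The advantage of planting the variants yourself is that the expected variant calls are known, so a test can assert on them.

```python
import os
import random
import tempfile

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_reference(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def plant_snvs(ref, n_snvs, rng):
    """Return (mutated sequence, {0-based position: (ref_base, alt_base)})."""
    seq = list(ref)
    truth = {}
    for pos in rng.sample(range(len(ref)), n_snvs):
        alt = rng.choice([b for b in "ACGT" if b != seq[pos]])
        truth[pos] = (seq[pos], alt)
        seq[pos] = alt
    return "".join(seq), truth


def simulate_reads(sample_seq, n_reads, read_len, rng):
    """Yield (name, sequence, quality) for error-free single-end reads."""
    for i in range(n_reads):
        start = rng.randrange(len(sample_seq) - read_len + 1)
        read = sample_seq[start:start + read_len]
        strand = "fwd"
        if rng.random() < 0.5:
            # Reverse-complement about half of the reads.
            read = read.translate(COMPLEMENT)[::-1]
            strand = "rev"
        # Constant high base quality ("I" is Phred 40 in Phred+33 encoding).
        yield f"sim_{i}_{start + 1}_{strand}", read, "I" * read_len


def write_fastq(path, reads):
    """Write reads as four-line FASTQ records."""
    with open(path, "w") as fh:
        for name, seq, qual in reads:
            fh.write(f"@{name}\n{seq}\n+\n{qual}\n")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the test set is reproducible
    ref = random_reference(5000, rng)
    sample, truth = plant_snvs(ref, 5, rng)
    out_dir = tempfile.mkdtemp(prefix="wgs_test_")
    with open(os.path.join(out_dir, "ref.fa"), "w") as fh:
        fh.write(">test_contig\n")
        for i in range(0, len(ref), 60):
            fh.write(ref[i:i + 60] + "\n")
    # 1500 reads of 100 bp over 5 kb is roughly 30x coverage.
    write_fastq(os.path.join(out_dir, "reads.fq"),
                simulate_reads(sample, 1500, 100, rng))
    # Print the planted variants with 1-based positions as the truth set.
    for pos, (ref_base, alt_base) in sorted(truth.items()):
        print(f"test_contig\t{pos + 1}\t{ref_base}>{alt_base}")
    print("files written to", out_dir)
```

A data set of this size should align and genotype in seconds, which fits the "couple of minutes" goal, though a random sequence lacks repeats and other features that make real genomes hard.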