Question: Test set for WGS analysis pipeline
gwotto wrote:

Hi, I am developing a pipeline for whole-genome sequencing analysis, including software to align sequences (e.g. bwa), perform quality control, and call variants (e.g. GATK). To write tests and debug the pipeline, I need a small test set. Are there any best practices or instructions for generating such a test set? Or are there any publicly available test sets? Thanks a lot for your help!

Modified 12 months ago by MSM55 • written 12 months ago by gwotto
MSM55 wrote:

There are a lot of publicly available datasets on NCBI SRA.

Written 12 months ago by MSM55

Perhaps I should have been more specific. I am not looking for whole-genome data in general; rather, I would like something like a small section of the genome as FASTQ reads and a small section of the genome as a reference with the appropriate indices, either simulated or from real data. The goal is to have a test set that runs in a couple of minutes rather than hours or days. I would like to know how people generate such a data set, or whether there are existing ones in use. So far I haven't come across any...
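For the real-data route, one possible approach is to cut a small region out of an existing reference and keep only a fraction of the reads. The sketch below is only an illustration of that idea, not a method taken from this thread: the helper names `extract_region` and `subsample_fastq` are hypothetical, coordinates are assumed to be 0-based and half-open, and FASTQ records are assumed to be exactly four lines each.

```python
import random


def extract_region(fasta_lines, contig, start, end):
    """Return bases [start, end) (0-based) of `contig` from an iterable of FASTA lines."""
    seq = []
    in_contig = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token after '>'.
            name = line[1:].split()
            in_contig = bool(name) and name[0] == contig
        elif in_contig:
            seq.append(line)
    return "".join(seq)[start:end]


def subsample_fastq(fastq_lines, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Calling this with the same seed on R1 and R2 files that hold the same
    number of records keeps the same record positions, so mates stay paired.
    """
    rng = random.Random(seed)
    lines = [line.rstrip("\n") for line in fastq_lines]
    kept = []
    for i in range(0, len(lines) - 3, 4):
        if rng.random() < fraction:
            kept.extend(lines[i:i + 4])
    return kept
```

Random subsampling on its own only lowers coverage across the whole genome. To get reads that match a small reference region, the reads would first need to be restricted to that region, for example by aligning once and keeping only the reads that map there.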

Written 12 months ago by gwotto

If your aim is to simulate data, you can check out the answer by Vijay Lakhujani in this post.
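As a rough illustration of what simulating a tiny test set can look like (this is not the method from the linked answer, which is not reproduced here), the sketch below generates a random reference and error-free paired-end reads using only the Python standard library. All function names, the contig name `test_contig`, the file naming, and the default sizes are assumptions chosen for the example.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_reference(length, seed=0):
    """Return a random DNA string of the given length (reproducible via seed)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(ref, n_pairs, read_len=50, insert_size=200, seed=1):
    """Yield (name, read1, read2) tuples from uniformly placed, error-free fragments."""
    if not read_len <= insert_size <= len(ref):
        raise ValueError("need read_len <= insert_size <= len(ref)")
    rng = random.Random(seed)
    for i in range(n_pairs):
        start = rng.randint(0, len(ref) - insert_size)
        fragment = ref[start:start + insert_size]
        # Read 1 is the fragment start; read 2 is the reverse complement of its end.
        yield f"sim_{i}_{start}", fragment[:read_len], reverse_complement(fragment[-read_len:])


def fastq_record(name, seq, qual_char="I"):
    """Format one FASTQ record with a constant quality string."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


def write_test_set(prefix, ref_len=5000, n_pairs=1000):
    """Write <prefix>.fa, <prefix>_R1.fastq and <prefix>_R2.fastq; return the reference."""
    ref = random_reference(ref_len)
    with open(f"{prefix}.fa", "w") as fa:
        fa.write(">test_contig\n")
        for i in range(0, len(ref), 60):
            fa.write(ref[i:i + 60] + "\n")
    with open(f"{prefix}_R1.fastq", "w") as r1, open(f"{prefix}_R2.fastq", "w") as r2:
        for name, read1, read2 in simulate_pairs(ref, n_pairs):
            r1.write(fastq_record(name + "/1", read1))
            r2.write(fastq_record(name + "/2", read2))
    return ref
```

Because the reads carry no sequencing errors and the reference carries no variants, a set like this mainly exercises the plumbing of the pipeline (file handling, alignment, sorting) rather than variant-calling accuracy. The reference would still need to be indexed with whatever the pipeline expects, for example `bwa index`.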

Written 12 months ago by toralmanvar720