6.0 years ago
ashokkumar.mb
We are developing a pipeline to identify plant SNPs and structural variations. As part of testing it, we need to subsample our plant genome sequencing data to different coverage levels. For instance, the actual data coverage is 40X, and we need to generate 10X, 20X, and 30X versions from it. Is this possible? If so, please help me.
I am not sure whether you would like to downsample the reads to 30x, 20x, and 10x coverage, or whether you want the parts of the genome that have 30x, 20x, and 10x coverage. If you are looking to downsample the reads, try bbnorm (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbnorm-guide/)
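For going from 40X to 30X, 20X, and 10X, the fraction of reads to keep is simply target/current: 30/40 = 0.75, 20/40 = 0.5, 10/40 = 0.25. As a hedged alternative sketch (not part of the original answer, and a different technique from bbnorm), uniform random subsampling keeps each read with that probability, which lowers depth evenly across the genome, whereas depth normalization is designed to trim reads mainly from high-coverage regions. The file names are hypothetical, and the reader assumes plain 4-line FASTQ records.

```python
import gzip
import random
from typing import Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]


def sampling_fraction(current_cov: float, target_cov: float) -> float:
    """Share of reads to keep, assuming depth scales linearly with read count."""
    if not 0 < target_cov <= current_cov:
        raise ValueError("target coverage must be in (0, current coverage]")
    return target_cov / current_cov


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records: header, sequence, '+' line, qualities."""
    while True:
        header = handle.readline()
        if not header:
            return
        rest = [handle.readline() for _ in range(3)]
        if not rest[2]:
            raise ValueError("truncated FASTQ record")
        yield (header, rest[0], rest[1], rest[2])


def subsample_fastq(in_path: str, out_path: str, fraction: float, seed: int = 11) -> int:
    """Keep each read independently with probability `fraction`; return reads written."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for record in read_fastq(src):
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept


# Usage sketch (hypothetical file names; 40X is the full-data coverage):
# for target in (30, 20, 10):
#     subsample_fastq("reads.fq", f"subsampled-{target}x.fq",
#                     sampling_fraction(40, target))
```

Because every record consumes exactly one random draw, running this with the same seed on the R1 and R2 files of a paired-end library keeps mates in sync, provided both files list the pairs in the same order.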
Thank you so much.
Yes, I would like to downsample our genome coverage. I read the BBTools guide and found a command for downsampling: "bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5". Do I need to consider any other parameters? Or do I need to use another tool to downsample the genome data after running bbnorm?
If this is what you would like to do, then I think for your purposes, you can simply do:
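The exact commands originally given here were not preserved. A hedged sketch of what they plausibly looked like is one bbnorm.sh run per target depth, built only from options that appear in this thread (in=, out=, target=, and min= from the question's command; -Xmx and threads= from the memory note). The bbnorm.sh help notes that its depth parameters refer to k-mer depth rather than read depth, so target= may not map exactly onto read coverage. The input name and thread count are placeholders.

```python
import shlex
from typing import List, Optional


def bbnorm_command(in_fq: str, out_fq: str, target: int, min_depth: int = 5,
                   max_heap: Optional[str] = None,
                   threads: Optional[int] = None) -> List[str]:
    """Build a bbnorm.sh argument list from the options mentioned in this thread."""
    cmd = ["bbnorm.sh"]
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # e.g. "200g" caps the Java heap at 200 GB
    cmd += [f"in={in_fq}", f"out={out_fq}", f"target={target}", f"min={min_depth}"]
    if threads:
        cmd.append(f"threads={threads}")
    return cmd


if __name__ == "__main__":
    # Hypothetical input; one output file per target depth.
    for depth in (30, 20, 10):
        cmd = bbnorm_command("reads.fq", f"normalized-{depth}x.fq", depth,
                             max_heap="200g", threads=16)
        print(shlex.join(cmd))
        # To run it for real: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string avoids shell quoting issues if the commands are later passed to subprocess.run.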
And then re-run your SNP and structural variant calling pipeline on normalized-30x.fq, normalized-20x.fq, and normalized-10x.fq to compare the results. This assumes that is what you want: to downsample the reads and see how it affects SNP and structural variant calling.
Note: you might want to limit memory using -Xmx200g (to cap RAM at 200 GB, for example) and add threads=num_cores_on_your_computer. By default, I think bbnorm.sh uses 85% of RAM and all cores.
Thank you so much for your help
See the thread "Split BAM by average coverage".
When calling SNPs and indels with samtools/bcftools, the resulting VCF file will report the coverage of each variant.
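To compare per-variant coverage across the 10X, 20X, and 30X call sets, a minimal sketch could read the depth out of each VCF record. It assumes the depth is stored as DP in the INFO column (bcftools mpileup normally writes INFO/DP) and that the VCF is uncompressed; for a .vcf.gz file, open it with gzip.open(path, "rt") instead.

```python
from typing import Dict, Iterator, Optional, TextIO, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'DP=14;MQ=60;INDEL' into {'DP': '14', 'MQ': '60', 'INDEL': ''}."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def variant_depths(vcf: TextIO) -> Iterator[Tuple[str, int, Optional[int]]]:
    """Yield (chrom, pos, DP) per record; DP is None when INFO has no DP value."""
    for line in vcf:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, info = cols[0], int(cols[1]), cols[7]
        dp = parse_info(info).get("DP")
        yield chrom, pos, int(dp) if dp else None
```

For example, with open("calls-10x.vcf") as fh (a hypothetical file name), [dp for _, _, dp in variant_depths(fh) if dp is not None] gives the depth distribution for that coverage level.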