Question: Need help to separate a genome in different x coverage
11 days ago, ashokkumar.mb wrote:

We are developing a pipeline to identify plant SNPs and structural variations. As part of validating it, we need to produce versions of a plant genome dataset at different depths of coverage. For instance, the actual data has 40x coverage, and we need to generate 10x, 20x, and 30x versions from it. Is this possible? If so, please help.

modified 11 days ago by Vitis • written 11 days ago by ashokkumar.mb

I am not sure whether you want to downsample the reads to 30x, 20x, and 10x coverage or to find the parts of the genome that have 30x, 20x, and 10x coverage. If you want to downsample the reads, try bbnorm (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbnorm-guide/).

written 11 days ago by jean.elbers

Thank you so much.

Yes, I would like to downsample our genome coverage. I read the BBTools guide and found a command for downsampling: "bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5". Do I need to be concerned about any other parameters? Or do I need to use another tool to downsample the genome data after running bbnorm?

written 11 days ago by ashokkumar.mb

If that is what you would like to do, then I think that, for your purposes, you can simply run:

bbnorm.sh in=reads.fq out=normalized-30x.fq target=30 min=5
bbnorm.sh in=reads.fq out=normalized-20x.fq target=20 min=5
bbnorm.sh in=reads.fq out=normalized-10x.fq target=10 min=5

Then re-run your SNP- and structural-variant-calling pipeline on normalized-30x.fq, normalized-20x.fq, and normalized-10x.fq to see the results. This assumes that is what you want to do: downsample the reads and see how that affects SNP and structural variant calling.

Note: you might want to limit memory with -Xmx200g (for example, to cap usage at 200 GB of RAM) and add threads=num_cores_on_your_computer. By default, I think bbnorm.sh uses 85% of RAM and all cores.
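To show how the memory and thread options fit together with the three commands above, here is a minimal dry-run sketch. It only prints the bbnorm.sh command lines so they can be checked before anything is run. The helper name bbnorm_cmd is hypothetical, the 200g cap is just the example value from the note, and the core count is read with getconf, which is an assumption about your system.

```shell
#!/usr/bin/env bash
# Dry run: print one bbnorm.sh command per target depth, nothing is executed.
# bbnorm_cmd is a hypothetical helper; reads.fq and the output names follow
# the commands quoted above.
bbnorm_cmd() {
    local target=$1 threads=$2 mem=$3
    echo "bbnorm.sh -Xmx${mem} in=reads.fq out=normalized-${target}x.fq target=${target} min=5 threads=${threads}"
}

# Number of online cores (assumes getconf supports _NPROCESSORS_ONLN).
threads=$(getconf _NPROCESSORS_ONLN)

for target in 30 20 10; do
    bbnorm_cmd "$target" "$threads" 200g
done
```

Once the printed lines look right, they can be pasted into a terminal or the echo can be dropped so the commands run directly.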

modified 10 days ago • written 10 days ago by jean.elbers

Thank you so much for your help

written 8 days ago by ashokkumar.mb

See Split BAM by average coverage.

When you call SNPs and indels with samtools / bcftools, the resulting VCF file includes the coverage at each variant.
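As a rough illustration of reading that per-variant coverage, the sketch below pulls the DP value out of the INFO column of an uncompressed VCF with awk. It assumes the caller wrote a DP= entry in INFO, which is worth checking in your own file. The function name vcf_depth and the single record fed to it are made up for the example.

```shell
#!/usr/bin/env bash
# vcf_depth (hypothetical helper): read VCF text on stdin and print
# CHROM, POS and the INFO DP value for every record that has one.
vcf_depth() {
    awk -F'\t' '!/^#/ {
        n = split($8, info, ";")          # INFO is the 8th VCF column
        for (i = 1; i <= n; i++)
            if (info[i] ~ /^DP=/)
                print $1 "\t" $2 "\t" substr(info[i], 4)
    }'
}

# Made-up record, only to show the expected input shape.
printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=38;MQ=60\n' | vcf_depth
```

With bcftools installed, something like bcftools query -f '%CHROM\t%POS\t%INFO/DP\n' on the call file should give the same columns without custom parsing.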

written 11 days ago by h.mon
11 days ago, Vitis (New York) wrote:

If you're downsampling the WGS experiment across the entire genome, a simple samtools view -s would do it. Just be careful to use a different seed each time if you're subsampling repeatedly; otherwise every run generates exactly the same subsample.
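For the 40x example in the question, the value passed to -s can be worked out as follows: the integer part is the seed and the part after the decimal point is the fraction of reads to keep, so 10x from 40x is a fraction of 0.25. The sketch below only prints the samtools commands as a dry run. The helper name subsample_arg, the file names, and the choice of seeds 1, 2, 3 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# subsample_arg (hypothetical helper): build the SEED.FRACTION value for
# samtools view -s, where FRACTION = target coverage / current coverage.
# Fails (exit status 1) unless 0 < target < current.
subsample_arg() {
    local seed=$1 target=$2 current=$3
    awk -v s="$seed" -v t="$target" -v c="$current" 'BEGIN {
        if (t <= 0 || t >= c) exit 1
        # keep four decimals of the fraction, e.g. 10/40 -> "2500"
        printf "%d.%s\n", s, substr(sprintf("%.4f", t / c), 3)
    }'
}

current=40   # coverage of the full dataset, from the question
seed=1       # a different seed is used for each subsample
for target in 10 20 30; do
    arg=$(subsample_arg "$seed" "$target" "$current")
    # input.bam and the output names are placeholders
    echo "samtools view -b -s $arg -o subsample-${target}x.bam input.bam"
    seed=$((seed + 1))
done
```

The fraction is relative to the reads in the BAM, so the resulting depth is only approximately the target and depends on the 40x estimate being accurate.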

modified 11 days ago • written 11 days ago by Vitis