Question

Need help splitting genome data into different X coverages

Asked 6.7 years ago by ashokkumar.mb

We are developing a pipeline to identify SNPs and structural variants in plants. To test it, we need to split the sequencing data for one plant genome into different X coverages. For instance, the actual coverage of our genome data is 40X, and we need to generate datasets at lower coverages such as 10X, 20X, and 30X. Is this possible? If so, please help.

plant genome X coverage Normalization • 1.5k views

Written 6.7 years ago by ashokkumar.mb; last updated 6.7 years ago by Vitis.

I am not sure whether you want to downsample the reads to 30X, 20X, and 10X coverage, or whether you want the parts of the genome that have 30X, 20X, and 10X coverage. If you want to downsample the reads, try bbnorm (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbnorm-guide/).

Reply by jean.elbers, 6.7 years ago

Thank you so much.

Yes, I would like to downsample our genome coverage. I read the BBTools guide and found this command for downsampling: "bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5". Do I need to worry about any other parameters? Or do I need another tool to downsample the data after running bbnorm?

Reply by ashokkumar.mb, 6.7 years ago

If that is what you want, then for your purposes I think you can simply run:

bbnorm.sh in=reads.fq out=normalized-30x.fq target=30 min=5
bbnorm.sh in=reads.fq out=normalized-20x.fq target=20 min=5
bbnorm.sh in=reads.fq out=normalized-10x.fq target=10 min=5

Then re-run your SNP and structural-variant calling pipeline on normalized-30x.fq, normalized-20x.fq, and normalized-10x.fq and compare the results. That assumes this is what you want to do: downsample the reads and see how coverage affects SNP and structural-variant calling.
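To check roughly what read depth each output file ended up at before re-running the pipeline, one option is to divide the total number of sequenced bases by the genome size. This is a sketch under assumptions, not part of BBTools: the function name fastq_depth is hypothetical, it expects uncompressed FASTQ with strict four-line records, and you have to supply the genome size in bases yourself. Also keep in mind that the bbnorm.sh usage text says its depth parameters refer to k-mer depth rather than read depth, so the number reported here will not necessarily equal the target value.

```shell
# Hypothetical helper (not part of BBTools): estimate sequencing depth as
# total sequenced bases / genome size for one or more FASTQ files (or stdin).
# Assumes strict 4-line FASTQ records, so the sequence is line 2 of every 4.
fastq_depth() {
  local genome_size=$1
  shift
  awk -v g="$genome_size" '
    FNR % 4 == 2 { bases += length($0) }   # count bases on sequence lines only
    END {
      if (g <= 0) { print "genome size must be positive" > "/dev/stderr"; exit 1 }
      printf "%.1f\n", bases / g
    }' "$@"
}
```

For example, with a placeholder genome size of 500 Mb (replace it with your plant's genome size):

fastq_depth 500000000 normalized-30x.fq
zcat reads.fq.gz | fastq_depth 500000000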

Note: you might want to limit memory with the -Xmx flag (for example, -Xmx200g caps it at 200 GB of RAM) and set threads=num_cores_on_your_computer. By default, I think bbnorm.sh uses 85% of the available RAM and all cores.
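The three bbnorm.sh commands can be wrapped in a loop that also applies the memory and thread limits from the note. This is a sketch under assumptions: the function name normalize_series and the BBNORM override variable are hypothetical (BBNORM only exists so you can do a dry run with BBNORM=echo), and the 200g heap and 16 threads are placeholders to adjust for your machine.

```shell
# Hypothetical wrapper: one bbnorm.sh run per target depth, with an explicit
# Java heap cap (-Xmx) and thread count. BBNORM defaults to bbnorm.sh; set
# BBNORM=echo to print the commands instead of running them.
normalize_series() {
  local input=$1 mem=$2 threads=$3
  shift 3
  local depth
  for depth in "$@"; do
    ${BBNORM:-bbnorm.sh} "-Xmx${mem}" "threads=${threads}" "in=${input}" \
      "out=normalized-${depth}x.fq" "target=${depth}" min=5 || return 1
  done
}
```

Running normalize_series reads.fq 200g 16 30 20 10 would write normalized-30x.fq, normalized-20x.fq, and normalized-10x.fq in turn, stopping at the first failed run; prefixing it with BBNORM=echo only prints the three commands.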

Reply by jean.elbers, 6.7 years ago

Thank you so much for your help

Reply by ashokkumar.mb, 6.7 years ago

See the thread "Split BAM by average coverage".

When you call SNPs and indels with samtools/bcftools, the resulting VCF file includes the coverage (read depth) at each variant.
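To pull that per-variant depth out of a VCF, here is a minimal awk sketch. It assumes the caller wrote a DP=<n> key into the INFO column (bcftools mpileup/call does this by default, as far as I know); the function name variant_depths is hypothetical, and variants without a DP key are reported as NA.

```shell
# Hypothetical helper: print CHROM, POS and INFO/DP for every variant in an
# uncompressed VCF given as a file argument or on stdin.
variant_depths() {
  awk -F '\t' '
    /^#/ { next }                          # skip meta-information and header lines
    {
      dp = "NA"
      n = split($8, info, ";")             # INFO is the 8th VCF column
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^DP=/) { dp = substr(info[i], 4); break }
      }
      print $1 "\t" $2 "\t" dp
    }' "$@"
}
```

For a bgzipped file, pipe it in (zcat calls.vcf.gz | variant_depths), or let bcftools do the same extraction directly:

bcftools query -f '%CHROM\t%POS\t%INFO/DP\n' calls.vcf.gz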

Reply by h.mon, 6.7 years ago

Answer by Vitis (2018-11-09, 6.7 years ago)

If you're downsampling the WGS data across the entire genome, a simple samtools view -s will do it. Just be careful to use a different seed each time if you're subsampling repeatedly; otherwise every run will produce exactly the same subsample.
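samtools view -s takes a single number of the form SEED.FRACTION: the integer part is the random seed and the fractional part is the proportion of reads to keep. Going from 40X down to 10X, 20X, and 30X therefore means fractions of 10/40 = 0.25, 20/40 = 0.5, and 30/40 = 0.75. The helper below does that arithmetic; its name subsample_arg is hypothetical, and it assumes the 40X figure is a good estimate of the current depth, so the resulting coverages are approximate.

```shell
# Hypothetical helper: build the SEED.FRACTION argument for `samtools view -s`
# from a seed, the current depth and the desired target depth.
subsample_arg() {
  local seed=$1 current_depth=$2 target_depth=$3
  awk -v s="$seed" -v c="$current_depth" -v t="$target_depth" 'BEGIN {
    frac = sprintf("%.4f", t / c)          # e.g. 10 / 40 -> "0.2500"
    if (frac + 0 <= 0 || frac + 0 >= 1) {
      print "target depth must be above 0 and below the current depth" > "/dev/stderr"
      exit 1
    }
    printf "%d%s\n", s, substr(frac, 2)    # seed 1 + ".2500" -> "1.2500"
  }'
}
```

For example (file names are placeholders), this keeps about a quarter of the reads using seed 11:

samtools view -b -s "$(subsample_arg 11 40 10)" -o sample.10x.bam sample.bam

Use other seeds, such as 12 and 13, for the 20X and 30X runs or for repeated replicates at the same depth.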

ADD COMMENT • link 6.7 years ago by Vitis ★ 2.6k