Need help to separate a genome in different x coverage
6.0 years ago

We are developing a pipeline to identify SNPs and structural variations in plants. As part of validation, we need to subsample a plant genome dataset to different coverage levels. For instance, the actual sequencing coverage is 40x, and we need to generate datasets at 10x, 20x, and 30x. Is this possible? If so, please help.

plant genome X coverage Normalization • 1.3k views

I am not sure whether you want to downsample the reads to 30x, 20x, and 10x coverage, or whether you want the parts of the genome that have 30x, 20x, and 10x coverage. If you want to downsample the reads, try BBNorm (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbnorm-guide/).


Thank you so much

Yes, I would like to downsample our genome coverage. I read the BBTools guide and found this command for downsampling: "bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5". Do I need to worry about any other parameters? And do I need any other tool to downsample the data after running bbnorm?


If this is what you would like to do, then I think for your purposes, you can simply do:

bbnorm.sh in=reads.fq out=normalized-30x.fq target=30 min=5
bbnorm.sh in=reads.fq out=normalized-20x.fq target=20 min=5
bbnorm.sh in=reads.fq out=normalized-10x.fq target=10 min=5

Then re-run your SNP and structural-variant calling pipeline on normalized-30x.fq, normalized-20x.fq, and normalized-10x.fq and compare the results. This assumes that what you want to do is downsample the reads and see how that affects SNP and structural-variant calling.

Note: you might want to cap memory with -Xmx200g (for example, to limit usage to at most 200 GB of RAM) and set threads= to the number of cores on your computer. By default, I think bbnorm.sh uses 85% of RAM and all cores.
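To make the note concrete, here is a minimal sketch that adds the memory cap and thread count to the three commands above. The values 200g and 16 are placeholders, not recommendations, and the helper name bbnorm_cmd is made up for illustration. The script only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Hypothetical settings: adjust to your machine and file names.
max_mem=200g      # JVM heap cap, passed to bbnorm.sh as -Xmx
n_threads=16      # assumed core count; set to what your machine has
reads=reads.fq    # input file name used in the commands above

# Build one bbnorm.sh command line for a given target depth.
bbnorm_cmd() {
  printf 'bbnorm.sh -Xmx%s threads=%s in=%s out=normalized-%sx.fq target=%s min=5\n' \
    "$max_mem" "$n_threads" "$reads" "$1" "$1"
}

# Dry run: print the three commands instead of executing them.
for target in 30 20 10; do
  bbnorm_cmd "$target"
done
```

Once the printed commands look right, piping the script's output to sh should run them one after another.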


Thank you so much for your help


See the thread "Split BAM by average coverage".

When you call SNPs and indels with samtools / bcftools, the resulting VCF file includes the coverage at each variant site.

6.0 years ago
Vitis ★ 2.5k

If you're downsampling the WGS experiment across the entire genome, a simple samtools view -s will do it. If you subsample repeatedly, be sure to use a different seed each time; otherwise every run will produce exactly the same subsample.
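As a rough sketch of this approach: samtools view -s takes a single number whose integer part is the seed and whose fractional part is the fraction of reads to keep, so going from 40x to 10x means keeping 10/40 = 0.25 of the reads. The file name aligned.bam, the 40x starting coverage, and the helper name subsample_arg are assumptions for illustration. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical inputs: adjust to your data.
full_cov=40          # coverage of the full dataset (from the question)
in_bam=aligned.bam   # assumed name of the aligned, full-coverage BAM

# Print the SEED.FRACTION value for samtools view -s.
# $1 = integer seed, $2 = target coverage, $3 = full coverage.
# Fails if the target is not strictly between 0 and the full coverage.
subsample_arg() {
  LC_ALL=C awk -v s="$1" -v t="$2" -v f="$3" 'BEGIN {
    if (t <= 0 || t >= f) exit 1
    printf "%.4f\n", s + t / f
  }'
}

# Dry run: use a different seed for each subsample and print the commands.
seed=1
for target in 10 20 30; do
  arg=$(subsample_arg "$seed" "$target" "$full_cov") || exit 1
  echo samtools view -b -s "$arg" -o "subsample-${target}x.bam" "$in_bam"
  seed=$((seed + 1))
done
```

Removing the echo runs the commands directly. The resulting coverage is only approximately the target, since the fraction applies to reads and not to per-site depth.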
