Question: Random downsampling without replacement in bam files only in regions with coverage>x
VictorMG wrote (20 months ago):

I have a BAM file with very high coverage in some regions, and I want to randomly subsample only the reads that cover those regions, without subsampling reads that cover low-coverage regions.

Is there a tool for this? Thanks!

Tags: coverage, bam, reads
genomax wrote (20 months ago):

What is the ultimate goal of this cherry-picking exercise? samtools view -s region would be easiest.


The objective of my "cherry-picking" (lol?) is this: I'm using a genome assembly pipeline that performs 20 random subsamplings, each reducing coverage to 150x, and then builds a consensus to reconstruct the genome. The problem is that in some samples coverage is huge in some regions but relatively low in others. Random subsampling decreases coverage across all regions, including those already below 150x, and that creates gaps in the reconstructed genome during the consensus step. So it would be better to subsample only the reads covering high-coverage regions and leave the other regions untouched.

Reply written 20 months ago by VictorMG
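The idea described in this reply (thin out reads only where depth exceeds the target) can be sketched in plain Python. This is a toy illustration, not a tool from the thread: the function names `per_base_coverage` and `downsample_high_coverage`, the `(name, start, end)` read tuples, and the choice of mean depth under each read as the "local coverage" are all assumptions made for the example. A real implementation would read alignments from the BAM file rather than from tuples.

```python
import random

def per_base_coverage(reads, genome_length):
    """Count how many reads cover each position (half-open intervals)."""
    coverage = [0] * genome_length
    for _name, start, end in reads:
        for pos in range(start, end):
            coverage[pos] += 1
    return coverage

def downsample_high_coverage(reads, coverage, target, seed=0):
    """Keep every read in regions at or below `target` depth; elsewhere
    keep each read with probability target / local depth (hypothetical scheme)."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = []
    for name, start, end in reads:
        span = coverage[start:end]
        mean_depth = sum(span) / len(span)
        if mean_depth <= target:
            # Low-coverage region: never discard reads here.
            kept.append((name, start, end))
        elif rng.random() < target / mean_depth:
            # High-coverage region: sample without replacement.
            kept.append((name, start, end))
    return kept
```

Running this 20 times with different `seed` values would give 20 independent subsamples in which the low-coverage regions keep all of their reads.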

I did not know that samtools could do this; now that I have looked it up, there is indeed such a feature. Though the correct command would be

samtools view -s <fraction> <in.bam> <region>

Picard also has a similar tool, DownsampleSam, that you may use:

https://broadinstitute.github.io/picard/command-line-overview.html#DownsampleSam

Reply written 20 months ago by Istvan Albert
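As a rough sketch of how the samtools command above could be built per region: the value passed to `samtools view -s` combines a random seed (the integer part) with the fraction of reads to keep (the part after the decimal point). The helper names `subsample_arg` and `samtools_command`, the default seed, and the file and region names in the usage line are assumptions for illustration. The observed depth would have to come from a separate coverage calculation.

```python
def subsample_arg(seed, fraction):
    """Format the -s value: integer part is the seed, decimals are the fraction kept."""
    rounded = round(fraction, 4)
    if not 0 < rounded < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    digits = f"{rounded:.4f}".split(".")[1]  # e.g. 0.25 -> "2500"
    return f"{seed}.{digits}"

def samtools_command(bam, region, observed_depth, target_depth, seed=42):
    """Build an argument list that downsamples one region to roughly target_depth."""
    fraction = target_depth / observed_depth
    return ["samtools", "view", "-b", "-s",
            subsample_arg(seed, fraction), bam, region]

# Hypothetical usage: a 600x region reduced to about 150x.
print(" ".join(samtools_command("in.bam", "chr1:1000-2000", 600, 150)))
```

The output of such a command contains only reads from the given region, so the reads from the untouched low-coverage regions would still need to be extracted separately and merged back in.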