Question: Random downsampling without replacement in bam files only in regions with coverage>x
2.5 years ago by
VicGB0 wrote:

I have a BAM file with very high coverage in some regions, and I want to randomly subsample only the reads that cover those regions, without subsampling the reads that cover low-coverage regions.

Is there any tool or something? Thanks!

coverage bam reads • 1.1k views
2.5 years ago by
United States
genomax75k wrote:

What is the ultimate goal of this cherry-picking exercise? samtools view -s region would be easiest.

modified 2.5 years ago • written 2.5 years ago by genomax75k

So the objective of my "cherry-picking" (lol?) is this: I'm using a genome assembly pipeline that performs 20 random subsamplings, each one reducing the coverage to 150x, and then builds a consensus to reconstruct the genome. The problem is that some of my samples have huge coverage in some regions but relatively low coverage in others. Random subsampling therefore reduces coverage across all regions, including those already below 150x, and this creates gaps in the reconstructed genome during the consensus step. It would be better to subsample only the reads covering high-coverage regions and leave the other reads untouched.
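To make the goal above concrete, here is a minimal sketch of coverage-capped random downsampling on a toy model. It is an illustration under stated assumptions, not a feature of samtools or of the pipeline mentioned: reads are represented as hypothetical half-open `(start, end)` intervals on a single contig, and the function name `cap_coverage` is made up for this example. Reads are visited in random order, and a read is kept whenever at least one position it covers is still below the cap, so regions that never reach the cap lose no reads.

```python
import random


def cap_coverage(reads, cap, seed=0):
    """Randomly thin reads so that only regions above `cap` lose reads.

    reads: list of (start, end) half-open intervals on one contig
           (a toy stand-in for alignments; an assumption of this sketch).
    cap:   target maximum depth, e.g. 150.
    seed:  seed for the random visiting order, for reproducibility.
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # visit reads in random order

    depth = {}  # position -> depth among the reads kept so far
    kept = []
    for i in order:
        start, end = reads[i]
        # Keep the read if any covered position is still below the cap.
        if any(depth.get(p, 0) < cap for p in range(start, end)):
            kept.append(i)
            for p in range(start, end):
                depth[p] = depth.get(p, 0) + 1

    # Return the kept reads in their original order.
    return [reads[i] for i in sorted(kept)]


# Toy data: one region at 50x and one at 3x, with a cap of 5.
reads = [(0, 10)] * 50 + [(100, 110)] * 3
kept = cap_coverage(reads, cap=5)
print(kept.count((0, 10)), kept.count((100, 110)))  # prints "5 3"
```

The per-base dictionary is only practical for small examples. Applying the same idea to a real BAM file would need an alignment-reading library and a more compact depth structure, neither of which is shown here.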

written 2.5 years ago by VicGB0

I did not know that samtools could do this; now that I have looked it up, there is indeed such a feature. Though the correct command would be

samtools view -s <fraction> <in.bam> <region>

Picard also has a similar tool, DownsampleSam, that you could use.
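As a rough sketch of how the `samtools view -s` suggestion could be applied per region, the helper below computes the fraction needed to bring a region down to a target depth and assembles the command line. The function names, file names, and the `mean_cov` input (the region's mean depth, measured beforehand by some other means) are assumptions made for illustration. The `-s` value is written as `SEED.FRACTION`, following samtools' convention that the integer part is the seed and the part after the decimal point is the fraction of reads to keep. Querying a region also requires the BAM file to be indexed.

```python
def subsample_arg(fraction, seed=0):
    """Format a samtools -s value: integer part = seed, decimals = fraction."""
    digits = round(fraction * 1_000_000)
    if not 0 < digits < 1_000_000:
        raise ValueError("fraction must be strictly between 0 and 1")
    return f"{seed}.{digits:06d}"


def build_command(bam, region, mean_cov, target=150.0, seed=0,
                  out_path="subsampled.bam"):
    """Build a samtools command that thins one region to about `target`x.

    mean_cov is the region's mean depth, assumed to be known already.
    If the region is at or below the target, no -s option is added,
    so all of its reads are kept.
    """
    cmd = ["samtools", "view", "-b"]
    if mean_cov > target:
        cmd += ["-s", subsample_arg(target / mean_cov, seed)]
    cmd += ["-o", out_path, bam, region]
    return cmd


# Hypothetical file and region names, for illustration only.
print(" ".join(build_command("sample.bam", "chr1:1000-2000", mean_cov=600.0)))
# prints "samtools view -b -s 0.250000 -o subsampled.bam sample.bam chr1:1000-2000"
```

Running one such command per high-coverage region, copying the low-coverage regions unchanged, and merging the outputs would approximate the behaviour asked for. Reads spanning a region boundary can end up in more than one output, so this sketch leaves that case to be handled separately.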

written 2.5 years ago by Istvan Albert ♦♦ 82k