Sub sampling bam files (dilution)
0
0
Entering edit mode
6.5 years ago
Chadi Saad ▴ 100

Hello,

I want to realize an in-silico dilution of a bam file into another one. I have 2 bam file (bam1 , bam2), I want to create a bam file (bam3) with 10% of bam1 and 90% bam2 at each sequenced position. How can I do it ?

the samtools view -s commande will take a fraction of a bam file, but since we don't guaranty the same depth coverage at a same position, we cannot guaranty the final fraction at a certain position.

I realize that it will be difficult to have the exact wanted fraction, since a read cover multiples positions, but I can tolerate some variance (10% +- 2% for exp).

Maybe this can be do it by taking an absolute number of reads covering each position instead of taking a fraction like samtools view -s

thanks a lot :)

bam alignment ngs • 2.2k views
ADD COMMENT
0
Entering edit mode

but since we don't guaranty the same depth coverage at a same position, we cannot guaranty the final fraction at a certain position

it' not clear to me.

ADD REPLY
0
Entering edit mode

If the I have 2 bam files with 30x, it doesn't mean that the 2 bam files have exactly the same depth of coverage of each position. so I can't simply take 10% of reads from bam1 and 90% from bam 2. Its depends from the depth coverage at each position. In the final bam, i want to have 10% bam1 and 90 bam2 at each position

ADD REPLY
0
Entering edit mode

That is a really odd request. Can I ask what you are going to do with this?

ADD REPLY
0
Entering edit mode

I'm trying to simulate low allelic ratio variants, by pooling many bam files coming from multiple samples

ADD REPLY
0
Entering edit mode

can't you just simulate the bam instead of using two existing bams ?

ADD REPLY
0
Entering edit mode

No, i have to do it with this 2 bam files, because they contains some SNP of interest (actually Im trying to dilute SNPs to evaluate a variant caller performance with low allelic ratio variants and sequencing errors). Maybe an idea to do it, is to use your tool : https://github.com/lindenb/jvarkit/wiki/Biostar154220 , in order to cap coverage to a fixed 'N' , and then removing all regions with coverage is < N. so i have a bam file with a fixed coverage, that is exactly = N ( at least you have more simple way to do it).

Then after having the 2 bams with fixed coverage N, i can use samtools view -s , to take 0.1N from bam1 and 0.9N from bam2..

ADD REPLY

Login before adding your answer.

Traffic: 1963 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6