Downsample BAM file
1
0
Entering edit mode
7.0 years ago
Simo ▴ 50

Hi all,

I need to randomly pick up one read for each position in a file bam, do you know any way to do that?

I saw there are different tools for setting a specific proportion of reads, but I need a new bam file with only one random read for each position.

Thanks for the suggestions :)

BAM Downsample filtering • 3.2k views
ADD COMMENT
1
Entering edit mode

one read for each position in a file bam

Is that starting position or just one read covering each base?

ADD REPLY
4
Entering edit mode
7.0 years ago

Using awk:

 samtools view -h -F 4 input.bam |\
 awk -F '\t' '/^@/ {print;prevC="";prevP=0;next;} {C=$3;P=int($4);if(!(prevC==C && prevP==P)) {prevC=C;prevP=P;print;} }' |\
 samtools view -S -o out.bam -
ADD COMMENT
0
Entering edit mode

That's great (if not anything else for the speed of writing it!). But if I understand your script it correctly, Pierre, you are picking one read from multiple ones starting at the same position. My understanding of the question is to have, essentially, a BAM file with depth of coverage of exactly 1 or zero. (Am I wrong...?)

ADD REPLY
0
Entering edit mode

Thanks Pierre, it seems to do its job!

Regarding what dariober pointed out, the speed of writing it's not a big deal in my case, since I'm working with a low coverage aDNA, and it just took a couple of minutes to run. My aim is to retrieve one only random read for each position, so basically I should end up with a coverage of 1 for each of them.

ADD REPLY
0
Entering edit mode

(...I was impressed by the speed of writing of Pierre, not of the script!)

If you want a coverage of exactly 1x (or zero if there are no reads in a region) then I think you need something else then Pierre's script. For each position his script gives one read starting there, which is not going to result in each positions with 1x coverage since a read span multiple positions.

ADD REPLY
0
Entering edit mode

So, are saying that using Pierre's script I'm loosing some positions?

Ref  --------X--------------------Y-----

Read1        x--------------------y---

Read2  ------x-----

Basically for the position X in reference I'm going to retrieve Read1, but I'm loosing the read for position Y?

(Sorry I can't format the text in order it appears to be even!!

ADD REPLY
1
Entering edit mode

Fixed the formatting :)

ADD REPLY
0
Entering edit mode

yes, I'm producing a

read for each position

. It's not generating a

depth of 1

ADD REPLY
0
Entering edit mode

Maybe I just confused the concept of "depth"! If your script is producing a random read for each position, it's exactly what I needed! Thanks a lot, really :)

ADD REPLY
0
Entering edit mode

if it answers your question. check the green mark on the left to close it; thanks.

ADD REPLY
0
Entering edit mode

if you want a tiling path. see

and

https://github.com/lindenb/jvarkit/wiki/BamTile

ADD REPLY

Login before adding your answer.

Traffic: 1454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6