Downsample BAM file
1
0
Entering edit mode
5.5 years ago
Simo ▴ 50

Hi all,

I need to randomly pick up one read for each position in a file bam, do you know any way to do that?

I saw there are different tools for setting a specific proportion of reads, but I need a new bam file with only one random read for each position.

Thanks for the suggestions :)

BAM Downsample filtering • 2.6k views
1
Entering edit mode

one read for each position in a file bam

Is that starting position or just one read covering each base?

4
Entering edit mode
5.5 years ago

Using awk:

samtools view -h -F 4 input.bam |\
awk -F '\t' '/^@/ {print;prevC="";prevP=0;next;} {C=$3;P=int($4);if(!(prevC==C && prevP==P)) {prevC=C;prevP=P;print;} }' |\
samtools view -S -o out.bam -
0
Entering edit mode

That's great (if not anything else for the speed of writing it!). But if I understand your script it correctly, Pierre, you are picking one read from multiple ones starting at the same position. My understanding of the question is to have, essentially, a BAM file with depth of coverage of exactly 1 or zero. (Am I wrong...?)

0
Entering edit mode

Thanks Pierre, it seems to do its job!

Regarding what dariober pointed out, the speed of writing it's not a big deal in my case, since I'm working with a low coverage aDNA, and it just took a couple of minutes to run. My aim is to retrieve one only random read for each position, so basically I should end up with a coverage of 1 for each of them.

0
Entering edit mode

(...I was impressed by the speed of writing of Pierre, not of the script!)

If you want a coverage of exactly 1x (or zero if there are no reads in a region) then I think you need something else then Pierre's script. For each position his script gives one read starting there, which is not going to result in each positions with 1x coverage since a read span multiple positions.

0
Entering edit mode

So, are saying that using Pierre's script I'm loosing some positions?

Ref  --------X--------------------Y-----

Basically for the position X in reference I'm going to retrieve Read1, but I'm loosing the read for position Y?

(Sorry I can't format the text in order it appears to be even!!

1
Entering edit mode

Fixed the formatting :)

0
Entering edit mode

yes, I'm producing a

. It's not generating a

depth of 1

0
Entering edit mode

Maybe I just confused the concept of "depth"! If your script is producing a random read for each position, it's exactly what I needed! Thanks a lot, really :)

0
Entering edit mode

if it answers your question. check the green mark on the left to close it; thanks.

0
Entering edit mode

if you want a tiling path. see

and

https://github.com/lindenb/jvarkit/wiki/BamTile