Question: Downsample BAM file
0
gravatar for Simo
22 months ago by
Simo30
Italy
Simo30 wrote:

Hi all,

I need to randomly pick up one read for each position in a file bam, do you know any way to do that?

I saw there are different tools for setting a specific proportion of reads, but I need a new bam file with only one random read for each position.

Thanks for the suggestions :)

filtering downsample bam • 839 views
ADD COMMENTlink modified 22 months ago by Pierre Lindenbaum116k • written 22 months ago by Simo30
1

one read for each position in a file bam

Is that starting position or just one read covering each base?

ADD REPLYlink written 22 months ago by genomax62k
4
gravatar for Pierre Lindenbaum
22 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

Using awk:

 samtools view -h -F 4 input.bam |\
 awk -F '\t' '/^@/ {print;prevC="";prevP=0;next;} {C=$3;P=int($4);if(!(prevC==C && prevP==P)) {prevC=C;prevP=P;print;} }' |\
 samtools view -S -o out.bam -
ADD COMMENTlink modified 22 months ago • written 22 months ago by Pierre Lindenbaum116k

That's great (if not anything else for the speed of writing it!). But if I understand your script it correctly, Pierre, you are picking one read from multiple ones starting at the same position. My understanding of the question is to have, essentially, a BAM file with depth of coverage of exactly 1 or zero. (Am I wrong...?)

ADD REPLYlink written 22 months ago by dariober9.9k

Thanks Pierre, it seems to do its job!

Regarding what dariober pointed out, the speed of writing it's not a big deal in my case, since I'm working with a low coverage aDNA, and it just took a couple of minutes to run. My aim is to retrieve one only random read for each position, so basically I should end up with a coverage of 1 for each of them.

ADD REPLYlink written 22 months ago by Simo30

(...I was impressed by the speed of writing of Pierre, not of the script!)

If you want a coverage of exactly 1x (or zero if there are no reads in a region) then I think you need something else then Pierre's script. For each position his script gives one read starting there, which is not going to result in each positions with 1x coverage since a read span multiple positions.

ADD REPLYlink modified 22 months ago • written 22 months ago by dariober9.9k

So, are saying that using Pierre's script I'm loosing some positions?

Ref  --------X--------------------Y-----

Read1        x--------------------y---

Read2  ------x-----

Basically for the position X in reference I'm going to retrieve Read1, but I'm loosing the read for position Y?

(Sorry I can't format the text in order it appears to be even!!

ADD REPLYlink modified 22 months ago by genomax62k • written 22 months ago by Simo30
1

Fixed the formatting :)

ADD REPLYlink written 22 months ago by genomax62k

yes, I'm producing a

read for each position

. It's not generating a

depth of 1

ADD REPLYlink written 22 months ago by Pierre Lindenbaum116k

Maybe I just confused the concept of "depth"! If your script is producing a random read for each position, it's exactly what I needed! Thanks a lot, really :)

ADD REPLYlink written 22 months ago by Simo30

if it answers your question. check the green mark on the left to close it; thanks.

ADD REPLYlink written 22 months ago by Pierre Lindenbaum116k

if you want a tiling path. see

and

https://github.com/lindenb/jvarkit/wiki/BamTile

ADD REPLYlink written 22 months ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1153 users visited in the last hour