Question: Subsampling one bamfile multiple times?
0
gravatar for wjdgmlcjf
4 months ago by
wjdgmlcjf0
wjdgmlcjf0 wrote:

Hi,

I have a bam file and I want to subsample this bam file 10 times with 1x, 2x, 5x coverage each.

Can anyone help me solving this?

I tried to use samtools and sambamba but nothing could make accurate results.

Thank you!

sequencing subsample bam • 325 views
ADD COMMENTlink modified 4 months ago • written 4 months ago by wjdgmlcjf0
2

I tried to use samtools and sambamba but nothing could make accurate results.

What do you mean by that ? I used samtools for subsampling and it worked pretty well with the -s option. Of course you need to determine beforehand what fraction of your bam files corresponds to 1x, 2x, 5x coverage.

ADD REPLYlink written 4 months ago by Carlo Yague4.6k

My major problem is how to subsample one file multiple times, I saw many questions about subsampling multiple files, but I need to subsample one file many times.

For example, I want to generate 1x_subsampled_1.bam, 1x_subsampled_2.bam... 2x_subsampled_1.bam, 2x_subsampled_2.bam and so on.

ADD REPLYlink modified 4 months ago • written 4 months ago by wjdgmlcjf0
1

Hello wjdgmlcjf ,

it is still unclear what's your problem. Just run the subsampling command multiple times with different parameters and your are done, aren't you? Or which point I'm missing?

fin swimmer

ADD REPLYlink written 4 months ago by finswimmer12k

I am not sure if this is what you are looking for, but the -s argument does support multiple seed values:

-s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the fraction of templates/read pairs to keep; INT part sets seed)

Changing the seed will result in a different (but reproducible using the same seed) set of random values, resulting in different bam files for the same chosen fraction.

Remark that samtools view -s may be absent in older samtools versions.

I hope this helps a bit,

Youri

ADD REPLYlink written 4 months ago by yhoogstrate50

Yeah that's right, but I want to write in one script, not running command multiple times. Because I want to generate about 50~60 subsampled bam files.

ADD REPLYlink written 4 months ago by wjdgmlcjf0
1

Picard downsample ?

Get 25% of bam file :

picard DownsampleSam INPUT=input.bam OUTPUT=input.25pcReads.bam RANDOM_SEED=50 PROBABILITY=0.25 ALIDATION_STRINGENCY=SILENT
ADD REPLYlink modified 4 months ago • written 4 months ago by Bastien HervĂ©4.4k
1
gravatar for Carlo Yague
4 months ago by
Carlo Yague4.6k
Belgium
Carlo Yague4.6k wrote:

Then you should combine samtools with a for loop in a bash script. For instance to subsample multiple files multiple times at multiple depth, you could use this code (untested).

#!/bin/bash

# down-sample multiple bam files (given as input) 100 times for K fractions (1, 5, 10, 25 % in this example) of the original bam files
# This script requires samtools installed and in your PATH.


set -e
if [ $# -eq 0 ]; then
  echo "Usage : down_sample_bam.sh bam_file_1 [bam_file_2, ...]"
else
  for i in $*
    do
    echo ""; echo "bam file : ${i}"; echo "" ; echo "downsampling reads..."
    for k in 1 5 10 25
    do
      echo "${k} %"
      for j in {0..99}
        do
        samtools view -b -s ${j}.${k} ${i} > ${i}_down_${k}_${j}.bam
        done
      done
    done
    echo "done !"
fi
exit 1

PS: note that samtools view -s breaks read pairs. If you have paired-end reads and want to keep your mate pairs together, you might use bedtools bamtobed -bedpe, extract a few random lines (with shuf or rl) then convert back to bam with bedtools bedpetobam.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Carlo Yague4.6k

Thank you! This is just what I wanted.

I will use only one bam file, thus I will modify a little bit and test.

ADD REPLYlink modified 4 months ago • written 4 months ago by wjdgmlcjf0

Does view -s actually break read pairs? From Samtools manpage: "Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate" It says it never keeps a read but not it's mate, which is a different way of saying that if it keeps a read, it keeps its mate.

ADD REPLYlink written 3 months ago by gk3970

Does view -s actually break read pairs?

I had this issue once, but it could have been with an older samtools version. Thx for bringing this up, I may try samtools again then.

ADD REPLYlink written 3 months ago by Carlo Yague4.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1917 users visited in the last hour