Subsampling one bamfile multiple times?
1
0
Entering edit mode
5.1 years ago
wjdgmlcjf • 0

Hi,

I have a bam file and I want to subsample this bam file 10 times with 1x, 2x, 5x coverage each.

Can anyone help me solving this?

I tried to use samtools and sambamba but nothing could make accurate results.

Thank you!

sequencing bam subsample • 3.1k views
ADD COMMENT
2
Entering edit mode

I tried to use samtools and sambamba but nothing could make accurate results.

What do you mean by that ? I used samtools for subsampling and it worked pretty well with the -s option. Of course you need to determine beforehand what fraction of your bam files corresponds to 1x, 2x, 5x coverage.

ADD REPLY
0
Entering edit mode

My major problem is how to subsample one file multiple times, I saw many questions about subsampling multiple files, but I need to subsample one file many times.

For example, I want to generate 1x_subsampled_1.bam, 1x_subsampled_2.bam... 2x_subsampled_1.bam, 2x_subsampled_2.bam and so on.

ADD REPLY
1
Entering edit mode

Hello wjdgmlcjf ,

it is still unclear what's your problem. Just run the subsampling command multiple times with different parameters and your are done, aren't you? Or which point I'm missing?

fin swimmer

ADD REPLY
0
Entering edit mode

I am not sure if this is what you are looking for, but the -s argument does support multiple seed values:

-s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the fraction of templates/read pairs to keep; INT part sets seed)

Changing the seed will result in a different (but reproducible using the same seed) set of random values, resulting in different bam files for the same chosen fraction.

Remark that samtools view -s may be absent in older samtools versions.

I hope this helps a bit,

Youri

ADD REPLY
0
Entering edit mode

Yeah that's right, but I want to write in one script, not running command multiple times. Because I want to generate about 50~60 subsampled bam files.

ADD REPLY
1
Entering edit mode

Picard downsample ?

Get 25% of bam file :

picard DownsampleSam INPUT=input.bam OUTPUT=input.25pcReads.bam RANDOM_SEED=50 PROBABILITY=0.25 ALIDATION_STRINGENCY=SILENT
ADD REPLY
1
Entering edit mode
5.1 years ago

Then you should combine samtools with a for loop in a bash script. For instance to subsample multiple files multiple times at multiple depth, you could use this code (untested).

#!/bin/bash

# down-sample multiple bam files (given as input) 100 times for K fractions (1, 5, 10, 25 % in this example) of the original bam files
# This script requires samtools installed and in your PATH.


set -e
if [ $# -eq 0 ]; then
  echo "Usage : down_sample_bam.sh bam_file_1 [bam_file_2, ...]"
else
  for i in $*
    do
    echo ""; echo "bam file : ${i}"; echo "" ; echo "downsampling reads..."
    for k in 1 5 10 25
    do
      echo "${k} %"
      for j in {0..99}
        do
        samtools view -b -s ${j}.${k} ${i} > ${i}_down_${k}_${j}.bam
        done
      done
    done
    echo "done !"
fi
exit 1

PS: note that samtools view -s breaks read pairs. If you have paired-end reads and want to keep your mate pairs together, you might use bedtools bamtobed -bedpe, extract a few random lines (with shuf or rl) then convert back to bam with bedtools bedpetobam.

ADD COMMENT
0
Entering edit mode

Thank you! This is just what I wanted.

I will use only one bam file, thus I will modify a little bit and test.

ADD REPLY
0
Entering edit mode

Does view -s actually break read pairs? From Samtools manpage: "Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate" It says it never keeps a read but not it's mate, which is a different way of saying that if it keeps a read, it keeps its mate.

ADD REPLY
0
Entering edit mode

Does view -s actually break read pairs?

I had this issue once, but it could have been with an older samtools version. Thx for bringing this up, I may try samtools again then.

ADD REPLY

Login before adding your answer.

Traffic: 2004 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6