Question: Subsampling Bam File With Samtools
21
7.6 years ago by
madbessoul220 wrote:

Hi,

I am trying to subsample from a bam file using the samtools view -s command. This is working when sampling 50% or lower (-s 42.50, 42 being the seed), but anything higher fails (returns an empty file).

Here are the exact commands I use:

samtools view -s 0.25 -b chr6_all.bam > chr6_25p.sam

Working.

samtools view -s 0.50 -b chr6_all.bam > chr6_50p.sam

Working.

samtools view -s 0.75 -b chr6_all.bam > chr6_75p.sam

Not working.

I also made sure that 49% works but 51% does not. Any ideas or suggestions, or is this intended behavior? There doesn't seem to be any documentation about the subsampling parameter in the samtools docs.

Thanks.

bam samtools • 34k views
modified 2.4 years ago by ATpoint46k • written 7.6 years ago by madbessoul220
1

Thank you very much, updating to the latest version right now.

written 7.6 years ago by madbessoul220
1

You could accept one of the answers as the final answer.

written 4.1 years ago by Dataman330
9
7.6 years ago by
Glasgow, Scotland
John Marshall2.2k wrote:

Subsampling not working for fractions above 50% is a known bug in samtools 0.1.18. (See [Samtools-help] Randomized Subsampling Bam File / Subsampling above 50%.)

The bug was fixed in March last year; samtools 0.1.19 contains the corrected version.
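
To check whether an installed samtools predates the fix, something like the following sketch could be used. `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and the way the version is read (a `Version:` line in the usage text) is an assumption that may not hold for every build.

```shell
#!/bin/bash
# Sketch: decide whether a samtools version is at least 0.1.19, the release
# named above as containing the fix. version_ge is a hypothetical helper.

# Succeeds if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: samtools prints a "Version: X.Y.Z ..." line in its usage text.
installed=$(samtools 2>&1 | awk '/^Version:/ {print $2; exit}')

if [ -z "$installed" ]; then
  echo "could not determine samtools version" >&2
elif version_ge "$installed" 0.1.19; then
  echo "samtools $installed: subsampling above 50% should work"
else
  echo "samtools $installed: predates the fix, consider upgrading"
fi
```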

modified 23 months ago by Ram32k • written 7.6 years ago by John Marshall2.2k
6
2.4 years ago by
ATpoint46k wrote:

A solution with sambamba and GNU parallel, subsampling all BAM files in $(pwd) to a user-defined number of reads, here 50.000.000:

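#!/bin/bash

function SubSample {

## see also: http://crazyhottommy.blogspot.com/2016/05/downsampling-for-bam-files-to-certain.html
## $1 = indexed BAM file, $2 = target number of reads
## Column 3 of samtools idxstats is the number of mapped reads per reference sequence
FACTOR=$(samtools idxstats "$1" | cut -f3 | awk -v COUNT="$2" 'BEGIN {total=0} {total += $1} END {print COUNT/total}')

## Compare numerically in awk; [[ $FACTOR > 1 ]] would compare the values as strings
if awk -v f="$FACTOR" 'BEGIN {exit !(f > 1)}'
  then
  ## Send the message to stderr so it does not end up in the redirected BAM output
  echo '[ERROR]: Requested number of reads exceeds total read count in' "$1" '-- exiting' >&2 && exit 1
fi

sambamba view -s "$FACTOR" -t 2 -f bam -l 5 "$1"

}

export -f SubSample

ls *.bam | parallel "SubSample {} 50000000 > {.}_subsampled.bam"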
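
The fraction computation in SubSample can be tried without a BAM file by feeding it mock `samtools idxstats` output. The sketch below does that; the reference names and counts are made up, and `fraction_for_target` and `exceeds_total` are hypothetical helper names. Because column 3 of idxstats counts mapped reads only, unmapped reads do not enter the total. The second helper shows a numeric test for "more reads requested than available", since bash's `[[ ... > ... ]]` compares strings rather than numbers.

```shell
#!/bin/bash
# Sketch: the fraction computation from SubSample, run on mock idxstats output.
# The reference names and counts below are made up for illustration.

# Reads idxstats-formatted text on stdin; $1 = target number of reads.
fraction_for_target() {
  cut -f3 | awk -v COUNT="$1" '{total += $1} END {if (total > 0) print COUNT/total}'
}

# Columns: reference name, length, mapped reads, unmapped reads
mock_idxstats=$(printf 'chr1\t1000\t600\t0\nchr2\t1000\t400\t0\n*\t0\t0\t50\n')

factor=$(printf '%s\n' "$mock_idxstats" | fraction_for_target 250)
echo "$factor"   # 250 / (600 + 400) = 0.25

# Numeric test: succeeds only when the requested count exceeds the total
exceeds_total() {
  awk -v f="$1" 'BEGIN {exit !(f > 1)}'
}
```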
modified 2.4 years ago • written 2.4 years ago by ATpoint46k
5
7.6 years ago by
United Kingdom
141341254653464453.5k wrote:

I have also tried sambamba and found it to be faster than samtools 0.1.19 when run in multi-threaded mode:

~/src/sambamba/sambamba_v0.3.3 view -h -t $numThreads -s $fractionOfReads -f bam --subsampling-seed=$seed $testBam -o $subsampledTestBam
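
For comparison, samtools has no separate seed option: as in the question's `-s 42.50`, the integer part of the `-s` value is the seed and the fractional part is the fraction of reads to keep. A minimal sketch of assembling that value from its two parts follows; `samtools_s_arg` is a hypothetical helper, not part of samtools.

```shell
#!/bin/bash
# Sketch: build the combined SEED.FRACTION value for samtools view -s.
# samtools_s_arg is a hypothetical helper, not part of samtools.

samtools_s_arg() {
  local seed=$1 frac=$2
  # The seed must be a non-negative integer
  [[ $seed =~ ^[0-9]+$ ]] || { echo "seed must be an integer" >&2; return 1; }
  # The fraction must look like 0.25 or .25, i.e. strictly below 1
  [[ $frac =~ ^0?\.[0-9]+$ ]] || { echo "fraction must be below 1, e.g. 0.25" >&2; return 1; }
  # Drop the leading zero so that 42 and 0.25 become 42.25
  printf '%s%s\n' "$seed" "${frac#0}"
}

samtools_s_arg 42 0.50   # prints 42.50
```

It could then be used as `samtools view -b -s "$(samtools_s_arg 42 0.50)" chr6_all.bam > chr6_50p.bam`, so that rerunning with the same seed should select the same reads.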
modified 23 months ago by Ram32k • written 7.6 years ago by 141341254653464453.5k