Question: Subsampling Bam File With Samtools
madbessoul wrote, 5.6 years ago:

Hi,

I am trying to subsample from a bam file using the samtools view -s command. This is working when sampling 50% or lower (-s 42.50, 42 being the seed), but anything higher fails (returns an empty file).

Here are the exact commands I use:

samtools view -s 0.25 -b chr6_all.bam > chr6_25p.sam

Working.

samtools view -s 0.50 -b chr6_all.bam > chr6_50p.sam

Working.

samtools view -s 0.75 -b chr6_all.bam > chr6_75p.sam

Not working.

I also made sure that 49% works but 51% does not. Any ideas or suggestions, or is this intended behaviour? There doesn't seem to be any documentation about the subsampling parameter in the samtools documentation.

Thanks.

bam samtools • 21k views

Thank you very much, updating to the latest version right now.

Comment by madbessoul, 5.6 years ago.

You could accept one of the answers as the final answer.

Comment by Dataman, 2.1 years ago.

John Marshall (Glasgow, Scotland) wrote, 5.6 years ago:

Subsampling not working for fractions above 50% is a known bug in samtools 0.1.18. (See [Samtools-help] Randomized Subsampling Bam File / Subsampling above 50%.)

The bug was fixed in March last year; samtools 0.1.19 contains the corrected version.
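
For reference, the value passed to -s packs two things into one number: the integer part seeds the random number generator and the part after the decimal point is the fraction of reads to keep. The sketch below only builds that string; make_s_arg is a hypothetical helper name, not part of samtools, and the commented-out samtools call assumes a version with the fix (0.1.19 or later) and the file name from the question.

```shell
# Hypothetical helper (not part of samtools): combine a seed and a fraction
# into the single SEED.FRACTION value that `samtools view -s` expects.
make_s_arg() {
  seed=$1
  fraction=$2
  # "${fraction#0}" drops a leading zero, so 0.75 becomes .75
  printf '%s%s\n' "$seed" "${fraction#0}"
}

# Seed 42, keep 75% of the reads
make_s_arg 42 0.75

# Intended use (assumes samtools 0.1.19 or later):
# samtools view -b -s "$(make_s_arg 42 0.75)" chr6_all.bam > chr6_75p.bam
```

Keeping the seed fixed makes the subsample reproducible; changing it should give a different random subset of the same size.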

14134125465346445 (United Kingdom) wrote, 5.6 years ago:

I have also tried sambamba, and found it to be faster in multi-threaded mode compared to samtools 0.1.19:

https://github.com/lomereiter/sambamba

~/src/sambamba/sambamba_v0.3.3 view -h -t $numThreads -s $fractionOfReads -f bam --subsampling-seed=$seed $testBam -o $subsampledTestBam

ATpoint (Germany) wrote, 4 months ago:

A solution with sambamba and GNU parallel that subsamples all BAM files in the current directory to a user-defined number of reads, here 50,000,000:

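#!/bin/bash

function SubSample {

## see also: http://crazyhottommy.blogspot.com/2016/05/downsampling-for-bam-files-to-certain.html
## $1 = indexed BAM file, $2 = target number of reads.
## idxstats column 3 is the mapped-read count per reference, so FACTOR is relative to mapped reads.
## printf avoids scientific notation (e.g. 1e-05) for very small factors.
FACTOR=$(samtools idxstats "$1" | cut -f3 | awk -v COUNT="$2" 'BEGIN {total=0} {total += $1} END {printf "%.8f\n", COUNT/total}')

## [[ $FACTOR > 1 ]] compares strings, not numbers, so the comparison is done in awk.
if awk -v f="$FACTOR" 'BEGIN {exit !(f+0 > 1)}'
  then
  ## stdout is redirected into the output BAM, so the message goes to stderr.
  echo "[ERROR]: Requested number of reads exceeds total read count in $1 -- exiting" >&2 && exit 1
fi

sambamba view -s "$FACTOR" -t 2 -f bam -l 5 "$1"

}

export -f SubSample

parallel "SubSample {} 50000000 > {.}_subsampled.bam" ::: *.bam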
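
The FACTOR arithmetic in the script above can be checked without a BAM file. The sketch below feeds made-up rows in the four-column idxstats layout (reference name, length, mapped reads, unmapped reads) through the same cut and awk steps; idxstats_example and subsample_factor are hypothetical names and the numbers are invented for illustration.

```shell
# Made-up idxstats-style rows: reference name, length, mapped reads, unmapped reads.
idxstats_example() {
  printf 'chr1\t1000\t300\t0\n'
  printf 'chr2\t2000\t500\t0\n'
  printf '*\t0\t0\t200\n'
}

# Same arithmetic as in SubSample: sum column 3 (mapped reads)
# and divide the requested read count by that total.
subsample_factor() {
  cut -f3 | awk -v COUNT="$1" '{total += $1} END {printf "%.4f\n", COUNT/total}'
}

# 200 requested reads out of 300 + 500 = 800 mapped reads
idxstats_example | subsample_factor 200
```

Note that the 200 unmapped reads on the last row are not part of the total, because only column 3 is summed. If unmapped reads should count toward the target, column 4 would have to be added as well.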