Question: Subsampling Bam File With Samtools
madbessoul wrote (6.1 years ago):

Hi,

I am trying to subsample a BAM file using the samtools view -s option. This works when sampling 50% or less (e.g. -s 42.50, where 42 is the seed and .50 the fraction), but anything higher fails (it returns an empty file).

Here are the exact commands I use:

samtools view -s 0.25 -b chr6_all.bam > chr6_25p.bam

Working.

samtools view -s 0.50 -b chr6_all.bam > chr6_50p.bam

Working.

samtools view -s 0.75 -b chr6_all.bam > chr6_75p.bam

Not working.

I also checked that 49% works but 51% does not. Any ideas or suggestions, or is this intended behaviour? There doesn't seem to be any documentation of the subsampling parameter in the samtools manual.

Thanks.

Comment by madbessoul (6.1 years ago):

Thank you very much, I'm updating to the latest version right now.

Comment by Dataman (2.5 years ago):

You could accept one of the answers as the final answer.

Answer by John Marshall (Glasgow, Scotland), 6.1 years ago:

Subsampling not working for fractions above 50% is a known bug in samtools 0.1.18. (See [Samtools-help] Randomized Subsampling Bam File / Subsampling above 50%.)

The bug was fixed in March last year; samtools 0.1.19 contains the corrected version.
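Once upgraded, the seed and the fraction still share the single -s argument, as in the question's -s 42.50. The sketch below is a hypothetical helper, subsample_arg (not part of samtools), that builds that INT.FRAC value from a separate seed and fraction and rejects fractions outside (0, 1), assuming bash:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of samtools: combine an integer seed and a
# fraction strictly between 0 and 1 into the INT.FRAC form used by
# `samtools view -s` (seed 42, fraction 0.75 -> 42.75).
subsample_arg() {
  local seed=$1 frac=$2
  local seed_re='^[0-9]+$'
  # "0.xxx" or ".xxx" with at least one non-zero digit after the point
  local frac_re='^0?\.[0-9]*[1-9][0-9]*$'

  if [[ ! $seed =~ $seed_re ]]; then
    echo "seed must be a non-negative integer, got '$seed'" >&2
    return 1
  fi
  if [[ ! $frac =~ $frac_re ]]; then
    echo "fraction must be strictly between 0 and 1, got '$frac'" >&2
    return 1
  fi

  # strip everything up to the decimal point and append the digits to the seed
  echo "${seed}.${frac#*.}"
}
```

For example, samtools view -s "$(subsample_arg 42 0.75)" -b chr6_all.bam > chr6_75p.bam keeps about 75% of the reads with seed 42; on samtools 0.1.18 any fraction above .50 will still hit the bug described above.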

Answer by 14134125465346445 (United Kingdom), 6.1 years ago:

I have also tried sambamba and found it to be faster than samtools 0.1.19 in multi-threaded mode:

~/src/sambamba/sambamba_v0.3.3 view -h -t $numThreads -s $fractionOfReads -f bam --subsampling-seed=$seed $testBam -o $subsampledTestBam
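The same command wrapped in a function, as a minimal sketch: subsample_with_sambamba, its argument order and the default seed and thread count are hypothetical choices, while the flags are the ones used above. Note that sambamba takes the fraction with -s and the seed separately via --subsampling-seed, rather than samtools' combined INT.FRAC form.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (name, argument order and defaults are illustrative)
# around the sambamba command above; the flags are the ones from the post.
subsample_with_sambamba() {
  local in_bam=$1 out_bam=$2 fraction=$3
  local seed=${4:-42} threads=${5:-4}   # assumed defaults, change as needed

  # numeric check in awk: succeed only if 0 < fraction < 1
  if ! awk -v f="$fraction" 'BEGIN { exit !(f > 0 && f < 1) }'; then
    echo "fraction must be strictly between 0 and 1, got '$fraction'" >&2
    return 1
  fi

  sambamba view -h -t "$threads" -s "$fraction" -f bam \
    --subsampling-seed="$seed" "$in_bam" -o "$out_bam"
}
```

Usage: subsample_with_sambamba test.bam test_sub.bam 0.75 42 8 keeps about 75% of the reads of test.bam using 8 threads and seed 42.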

Answer by ATpoint (Germany), 10 months ago:

A solution with sambamba and GNU parallel that subsamples every BAM file in the current directory to a user-defined number of reads, here 50,000,000:

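#!/bin/bash

function SubSample {

## see also: http://crazyhottommy.blogspot.com/2016/05/downsampling-for-bam-files-to-certain.html
## idxstats needs a BAM index (.bai); column 3 is the number of mapped reads per reference
FACTOR=$(samtools idxstats "$1" | cut -f3 | awk -v COUNT="$2" 'BEGIN {total=0} {total += $1} END {if (total == 0) exit 1; printf "%.8f\n", COUNT/total}') \
  || { echo "[ERROR]: no mapped reads found in $1 -- exiting" >&2; return 1; }

## compare numerically in awk; [[ $FACTOR > 1 ]] compares strings, so e.g. "5e-07" > "1" would be true
if awk -v f="$FACTOR" 'BEGIN {exit !(f > 1)}'
  then
  ## write to stderr so the message does not end up in the redirected BAM output
  echo "[ERROR]: Requested number of reads exceeds total read count in $1 -- exiting" >&2 && return 1
fi

sambamba view -s "$FACTOR" -t 2 -f bam -l 5 "$1"

}

export -f SubSample

## pass the file names as arguments instead of parsing ls output
parallel "SubSample {} 50000000 > {.}_subsampled.bam" ::: *.bam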
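The factor arithmetic can be checked without a BAM file by pulling it into a separate function that reads idxstats-style text (reference name, length, mapped reads, unmapped reads, tab-separated) from stdin. A minimal sketch; subsample_factor is a hypothetical name, and like the script it counts only mapped reads (column 3):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `samtools idxstats` output on stdin and print the
# fraction of reads to keep so that roughly TARGET mapped reads remain.
subsample_factor() {
  local target=$1
  awk -F '\t' -v COUNT="$target" '
    { total += $3 }                        # column 3 = mapped reads
    END {
      if (total == 0)    { print "no mapped reads" > "/dev/stderr"; exit 1 }
      if (COUNT > total) { print "requested reads exceed total" > "/dev/stderr"; exit 1 }
      printf "%.6f\n", COUNT / total       # fixed notation, never 1e-05 style
    }'
}
```

With 600 + 400 mapped reads and a target of 250, printf 'chr1\t1000\t600\t0\nchr2\t1000\t400\t5\n*\t0\t0\t10\n' | subsample_factor 250 prints 0.250000, i.e. 250 / 1000.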