Subseq bam file
19 months ago
shinyjj ▴ 50

Hi Biostars,

Does anyone know how to subsample reads from a BAM file? The command below gives the number of reads in this BAM file. I want to keep about 100,000,000 reads out of 122,441,229.

[Image not preserved: command and output showing the read count]

seqtk provides a similar function for FASTQ files, as in the command below. I was wondering whether there is an equivalent of seqtk for BAM files.

./seqtk sample -s101 /data/long_read/lr_consoritum/pcb/ENCFF563QZR.fastq 1844630 > ENCFF563QZR_sub.fq
bam subseq • 761 views

You can probably use BBTools for this as well (untested):

reformat.sh -Xmx4g in=your.bam out=read.fastq.gz reads=100000000 primaryonly=t

If you have paired-end data, then:

reformat.sh -Xmx4g in=your.bam out1=read1.fastq.gz out2=read2.fastq.gz reads=100000000 primaryonly=t
19 months ago
samtools view --subsample 0.816718 in.bam
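
The fraction in the command above appears to be the target read count divided by the total (100,000,000 / 122,441,229). A minimal sketch of that arithmetic follows. The variable names and the file names in.bam and sub.bam are placeholders, and the commented-out samtools line assumes a samtools release recent enough to support --subsample and --subsample-seed:

```shell
# Target and total read counts taken from the question.
target=100000000
total=122441229

# Fraction of reads to keep, rounded to six decimal places.
frac=$(awk -v t="$target" -v n="$total" 'BEGIN { printf "%.6f", t / n }')
echo "$frac"

# Assumed usage (placeholder file names; the seed value is arbitrary):
# samtools view -b --subsample "$frac" --subsample-seed 101 -o sub.bam in.bam
```

Because each read is kept with this probability, the output should contain roughly, not exactly, 100,000,000 reads. Fixing the seed makes the selection reproducible.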

AWESOME THANKS

