Question: Check if BAM is derived from pair-end or single-end reads?
2
gravatar for medranom38
23 months ago by
medranom3820
Cambridge, MA, USA
medranom3820 wrote:

I'm automating a pipeline and currently a user inputs as a parameter if the input BAM is from pair-end or single-end reads. I want to automatically check this. How can do I this?

single-end pair-end bam • 4.4k views
ADD COMMENTlink modified 19 months ago by John12k • written 23 months ago by medranom3820
8
gravatar for Pierre Lindenbaum
23 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum103k wrote:

use the sam flag and 'samtools view -c -f 1 in.bam ' to test if your bam contains paired reads. ( 1= "read paired" http://broadinstitute.github.io/picard/explain-flags.html )

ADD COMMENTlink written 23 months ago by Pierre Lindenbaum103k
1
gravatar for ATPoint
19 months ago by
ATPoint2.6k
Germany
ATPoint2.6k wrote:

You may also use samtools view -H. It will print you the header of the bam, including the command that was used for the alignment. Usually, paired-end data are separated into two fastq files, one for forward/reverse reads. Hence, if the alignment command shows two fastq files to be used, it was a paired-end alignment.

An example where BWA mem was used:

samtools view -H my.bam

@PG ID:bwa PN:bwa VN:0.7.12-r1044 CL:/opt/bwa-0.7.12/bwa mem -M -t 24 /Genomes/hg19/hg19.fasta in_1.fastq in_2.fastq

where the general BWA mem paired-end command is (similar in Bowtie etc.):

./bwa mem -M -t threads indexedRefGenome.fasta in_1.fastq in_2.fastq > out.sam

ADD COMMENTlink written 19 months ago by ATPoint2.6k
0
gravatar for John
19 months ago by
John12k
Germany
John12k wrote:

Both methods suggested by Pierre and Alexander will work great for 99.99% of cases, but things get a little messy if you want to detect single-end-reads in a mixed paired & single BAM file, or singletons in a previously paired-end BAM file.

Detecting singles in a mixed file is easy, just look for 'samtools view -c -F 1 in.bam'

Detecting singletons is a bit trickier though... since the filtering step might have just removed the read and left the mate totally unaffected. Perhaps you could use the read names (although people have a habit of putting junk like /1 and /2 on the end of the read names) Or maybe one of Pierre's tools can detect orphaned singletons :)

Alexander's method would solve this very quickly and easily if all BAM files coming in are totally under your pipeline's control (you mapped/filtered them yourself at some point). But external "mystery BAMs" may have issues.

ADD COMMENTlink modified 19 months ago • written 19 months ago by John12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 616 users visited in the last hour