Check if BAM is derived from pair-end or single-end reads?
3
8
Entering edit mode
8.7 years ago
medranom38 ▴ 80

I'm automating a pipeline and currently a user inputs as a parameter if the input BAM is from pair-end or single-end reads. I want to automatically check this. How can do I this?

BAM Pair-End Single-End • 26k views
ADD COMMENT
1
Entering edit mode

Thanks to Pierre's suggestion, I also got to understand this issue.

This link provides additional useful information, especially mentioned about flags. If in.bam is made of totally single end reads, following command probably returns 0. The option '-f 1' can be substituted for '-f 0x1' in hexadecimal.

samtools view -c -f 1 in.bam
ADD REPLY
0
Entering edit mode

Hello Hakubi and welcome to biostars,

your post is better placed as a comment. To do this please use the ADD COMMENT button below the post you like to reply to. Answers should only be used to respond to the original question at the top of this page and should answer the question as best as possible. Links to pages that contain further information are always welcome as a comment.

So I moved your post to a comment. As you can see this is not perfect, as your reply would be better match as a comment on Pierres answer.

Happy writing.

fin swimmer

ADD REPLY
1
Entering edit mode

Hello fin swimmer

Thank you for correcting my mistake. I make sense of good manners.

Hakubi

ADD REPLY
17
Entering edit mode
8.7 years ago

Use the sam flag and samtools view -c -f 1 in.bam to test if your bam contains paired reads. ( 1= "read paired" See here )

ADD COMMENT
0
Entering edit mode

Thanks for the answer! If speed is a concern, one may choose to count only the first 1000 reads. { samtools view -H in.bam ; samtools view in.bam | head -n1000; } | samtools view -c -f 1

ADD REPLY
0
Entering edit mode

Thanks. I expect the command samtools view -c -f 1 in.bam returns 0 (single-end) or 1 (pair-end), however, it returned "49423000" in my case (attached below). Is there something wrong?

samtools view -c -f 1 Sorted_1-D2-1.bam

49423000

ADD REPLY
1
Entering edit mode

The command you used counts (with the option -c) the reads that fit the SAM flag 1 (see -f option), so the paired reads. It tests this condition for each read, not for the sample as a whole. You ended up with 49423000, meaning that samtools counted 49423000 paired reads using the input bam file you supplied it with.

ADD REPLY
5
Entering edit mode
8.4 years ago
ATpoint 85k

You may also use samtools view -H. It will print you the header of the bam, including the command that was used for the alignment. Usually, paired-end data are separated into two fastq files, one for forward/reverse reads. Hence, if the alignment command shows two fastq files to be used, it was a paired-end alignment.

An example where BWA mem was used:

samtools view -H my.bam

@PG ID:bwa  PN:bwa  VN:0.7.12-r1044 CL:/opt/bwa-0.7.12/bwa mem -M -t 24 /Genomes/hg19/hg19.fasta in_1.fastq in_2.fastq
-----------------------------------------------------------------------------------------------------^---------^

where the general BWA mem paired-end command is (similar in Bowtie etc.):

./bwa mem -M -t *threads* indexedRefGenome.fasta in_1.fastq in_2.fastq > out.sam
-----------------------------------------------------^---------^
ADD COMMENT
2
Entering edit mode
8.4 years ago
John 13k

Both methods suggested by Pierre and Alexander will work great for 99.99% of cases, but things get a little messy if you want to detect single-end-reads in a mixed paired & single BAM file, or singletons in a previously paired-end BAM file.

Detecting singles in a mixed file is easy, just look for samtools view -c -F 1 in.bam

Detecting singletons is a bit trickier though... since the filtering step might have just removed the read and left the mate totally unaffected.

Perhaps you could use the read names (although people have a habit of putting junk like /1 and /2 on the end of the read names)

Or maybe one of Pierre's tools can detect orphaned singletons :)

Alexander's method would solve this very quickly and easily if all BAM files coming in are totally under your pipeline's control (you mapped/filtered them yourself at some point). But external "mystery BAMs" may have issues.

ADD COMMENT

Login before adding your answer.

Traffic: 831 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6