how to adjust BAM header after extracting only mapped reads from BAM files
1
1
Entering edit mode
6.8 years ago
Ric ▴ 430

Hi, How to adjust BAM header after extracting only mapped reads from BAM files to have only the reference ids which are present inside the BAM files?

Thank you in advance.

Michal

bam samtools • 4.2k views
ADD COMMENT
0
Entering edit mode

What do you mean by adjusting header? Do you mean adding a missing header?

ADD REPLY
0
Entering edit mode

removing the @SQ lines ? that's usually a bad idea

ADD REPLY
0
Entering edit mode

Yes, removing @SQ's which are not contained in the BAM file.

ADD REPLY
1
Entering edit mode

ok let's take a step back, why do you want to do this?

ADD REPLY
0
Entering edit mode

that's usually a bad idea

just curious, why would you want to do this ?

ADD REPLY
0
Entering edit mode

I need to find out which reference was hit i.e. reads have mapped.

ADD REPLY
0
Entering edit mode

I need to find out which reference was hit i.e. reads have mapped.

I still don't get it. If you need to extract the reads , say on chr22, why don't you just use

samtools view in.bam chr22
ADD REPLY
0
Entering edit mode

I need only reference Ids from only mapped reads.

ADD REPLY
4
Entering edit mode

I need only reference Ids from only mapped reads.

samtools view -F 4 in.bam |  cut -f 3 | sort | uniq
ADD REPLY
3
Entering edit mode
6.7 years ago
glihm ▴ 660

Hello Michal,

There is no samtools command to do what you want to achieve.

So, first of all, "adjusting" the headers to only mapped reads is not something required if you want to process the BAM file. I mean, if you have reads mapped on chr1 and chr2, even if chr3 and chr4 still in the headers they will not bother you.

Nevermind, if you still wanting "adjusting" the headers, you will need to write a little script for this. To do so, several options:

  1. Using HTSLIB, the bam_hdr_t structure will allow you to modify what you need. From this, you can easily write a C program to write a new BAM with ONLY mapped reads and their "adjusted" reference.
  2. If you are not familiar with C, you can use Python or an other language to parse the BAM and then check which one of the reference hasn't got reads (using Pysam for instance).
  3. Using BASH commands, using @Pierre Lindenbaum solution. From the command he provided, you will have a list of UNIQUE references with at least 1 mapped read. You can then removed from the header the references not mentioned in the results.

samtools view -F 4 in.bam | cut -f 3 | sort | uniq <- gives you the list of headers with at least 1 read mapped. samtools view -H in.bam > adj.sam <- writes headers in the sam format, you just have to removed bad refs. samtools view -F 4 in.bam >> adj.sam <- writes all the mapped reads in the sam format. samtools view -h -b -S adj.sam > adj.bam <- Get a bam.

This "hand-fix" solution is good if you only have 1 bam to process. Otherwise, I recommend writing a script / little program to do this in an automatized way. ;)

ADD COMMENT

Login before adding your answer.

Traffic: 2916 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6