Question: how to adjust BAM header after extracting only mapped reads from BAM files
0
gravatar for Ric
12 weeks ago by
Ric140
Australia
Ric140 wrote:

Hi, How to adjust BAM header after extracting only mapped reads from BAM files to have only the reference ids which are present inside the BAM files?

Thank you in advance.

Michal

samtools bam • 258 views
ADD COMMENTlink modified 7 weeks ago by glihm400 • written 12 weeks ago by Ric140

What do you mean by adjusting header? Do you mean adding a missing header?

ADD REPLYlink written 12 weeks ago by Vijay Lakhujani1.2k

removing the @SQ lines ? that's usually a bad idea

ADD REPLYlink written 12 weeks ago by Pierre Lindenbaum98k

Yes, removing @SQ's which are not contained in the BAM file.

ADD REPLYlink written 12 weeks ago by Ric140
1

ok let's take a step back, why do you want to do this?

ADD REPLYlink written 7 weeks ago by Gabriel R.2.0k

that's usually a bad idea

just curious, why would you want to do this ?

ADD REPLYlink written 12 weeks ago by Pierre Lindenbaum98k

I need to find out which reference was hit i.e. reads have mapped.

ADD REPLYlink written 12 weeks ago by Ric140

I need to find out which reference was hit i.e. reads have mapped.

I still don't get it. If you need to extract the reads , say on chr22, why don't you just use

samtools view in.bam chr22
ADD REPLYlink written 12 weeks ago by Pierre Lindenbaum98k

I need only reference Ids from only mapped reads.

ADD REPLYlink written 12 weeks ago by Ric140
4

I need only reference Ids from only mapped reads.

samtools view -F 4 in.bam |  cut -f 3 | sort | uniq
ADD REPLYlink written 12 weeks ago by Pierre Lindenbaum98k
2
gravatar for glihm
7 weeks ago by
glihm400
France
glihm400 wrote:

Hello Michal,

There is no samtools command to do what you want to achieve.

So, first of all, "adjusting" the headers to only mapped reads is not something required if you want to process the BAM file. I mean, if you have reads mapped on chr1 and chr2, even if chr3 and chr4 still in the headers they will not bother you.

Nevermind, if you still wanting "adjusting" the headers, you will need to write a little script for this. To do so, several options:

  1. Using HTSLIB, the bam_hdr_t structure will allow you to modify what you need. From this, you can easily write a C program to write a new BAM with ONLY mapped reads and their "adjusted" reference.
  2. If you are not familiar with C, you can use Python or an other language to parse the BAM and then check which one of the reference hasn't got reads (using Pysam for instance).
  3. Using BASH commands, using @Pierre Lindenbaum solution. From the command he provided, you will have a list of UNIQUE references with at least 1 mapped read. You can then removed from the header the references not mentioned in the results.

samtools view -F 4 in.bam | cut -f 3 | sort | uniq <- gives you the list of headers with at least 1 read mapped. samtools view -H in.bam > adj.sam <- writes headers in the sam format, you just have to removed bad refs. samtools view -F 4 in.bam >> adj.sam <- writes all the mapped reads in the sam format. samtools view -h -b -S adj.sam > adj.bam <- Get a bam.

This "hand-fix" solution is good if you only have 1 bam to process. Otherwise, I recommend writing a script / little program to do this in an automatized way. ;)

ADD COMMENTlink written 7 weeks ago by glihm400
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1474 users visited in the last hour