Question: Filtering nth length from mapped SAM/BAM file.
0
gravatar for unique379
3.4 years ago by
unique37950
Spain
unique37950 wrote:

Hi guys,

I am trying to filter reads less than 18nt from my mapped BAM file and then want to get back into SAM format but there is problem while converting back. To do so i did...

samtools view mapped.bam | awk '$6 >= 18' > filt.bam

samtools view -h -o out.sam filt.bam


[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "filt.bam".

???

i guess when i filter my reads, it becomes text instead BAM, right ??

help me please.

 

Note: My original SAM had header information.

 

bash rna-seq shell mirnaseq • 3.4k views
ADD COMMENTlink modified 3.4 years ago by Pierre Lindenbaum115k • written 3.4 years ago by unique37950
1

read length or alignment lenth ?

ADD REPLYlink written 3.4 years ago by Pierre Lindenbaum115k

As i am already using mapped.bam so its obvious. its aligned reads. 

ADD REPLYlink written 3.4 years ago by unique37950
1

Some aligners using soft-clipping, which let the read length untouched but the alignment length can be  shorter (see e.g. here)

ADD REPLYlink written 3.4 years ago by michael.ante2.8k
1
gravatar for Pierre Lindenbaum
3.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

using my tool samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

 

 java -jar distsamjs.jar -e '!record.readUnmappedFlag  && record.cigar.referenceLength  >= 18 ' input.bam | samtools view -Sb -o out.bam -
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Pierre Lindenbaum115k

Dear Pierre,

I have tried your tool to filter mappings with a minimal mapped length. I see that your filtering is based on the mapped length measured on the reference. I have encountered mapping with CIGAR looking like this 106S8M1D21M14S. Based on your command the mapping does not get filtered out when I set the value to >=30. This is correct since you measure by the mapping length calculated from the reference. The mapped read length measured on the reference is 30 (8M+1D+21M). But, if we would count the mapped length from the read itself we would get just 29 (8M+21M). I could imagine an extreme case with special mappings (which would be reported for whatever reason) with a lot of insertions/deletions "closed" between several mapped regions. I assume your filtering would keep such mappings because the mapped length measured by the reference would be still OK. However, I would not consider such mappings as "correct" which I would like to keep. FInaly for my question - Is there an option in your tool to filter by mapped length calculated from the read itself and not from the reference?

ADD REPLYlink written 21 months ago by opplatek30
1
record.getReadLength()>=18
ADD REPLYlink written 21 months ago by Pierre Lindenbaum115k
0
gravatar for michael.ante
3.4 years ago by
michael.ante2.8k
Austria/Vienna
michael.ante2.8k wrote:

It seems, you try to use the cigar field to get the read length. Just have a look at your data:

samtools view mapped.bam | cut -f 6 | head

In case you have a splice-aware mapper, you'll get something like :

46M
94M
10M100N84M
61M
94M
94M

In this case, you have to use a cigar parser, as mentioned e.g. here.

If you have no junctions and no soft-clipping you can go for

samtools view mapped.bam | awk 'length($10) >= 18' > filt.sam

Denote, that you need the header :

samtools view -H mapped.bam > header.txt
cat header.txt filt.sam | samtools view -bSh -o out.bam -

Cheers,

 

Michael

ADD COMMENTlink written 3.4 years ago by michael.ante2.8k

Thank Michael for your elegant answer, but do tell me, filtered read would have no affect in header ??? as you have concatenate the previous header from original SAM into filt.bam ???

ADD REPLYlink written 3.4 years ago by unique37950
1

With a simple samtools view command, you'll skip the header. You can force to print the header with samtools view -h ... . Then, you'll need to change the filtering to something like

samtools view -h mapped.bam | awk '/^@/ || length($10) >= 18' > filt.sam 
ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by michael.ante2.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1326 users visited in the last hour