Filtration of bam file but with header
2
3
Entering edit mode
9.9 years ago
filipzembol ▴ 170

Dear all, I have easy script in bash using awk language. I filtrate bam file by GC content (see at code), but I need to get on output bam file too with header. Could you help me with this? thank you My code:

#!/bin/bash

for i in 54321*.bam
do
  samtools view -h /home/filip/Desktop/Analyza\ NIFTY\ and\ CNVs/NIFTY\ pooling009/$i | awk '{ n=length($10); print gsub(/[GCCgcs]/,"",$10)/n,"\t",$0  }' | awk '($1 <= 0.6 && $1 >= 0.3){print $0}' | awk '!($1="")' | samtools -bS - > z$i;
done
header samtools BAM filtration • 5.2k views
ADD COMMENT
0
Entering edit mode

I think, I could use something like this (|| $1 ~/^@/), but I really do not know where do I incorporate it.

ADD REPLY
3
Entering edit mode
9.9 years ago

As long as you don't have any reads beginning with "@" (and you shouldn't), just add an if(substr($1,1,1) == "@") { print $0} else {...your current code...}. You just want to avoid all of the piping and have a single awk command (unless you want to modify each of the pipes, which would quickly become annoying).

BTW, this might be easier with pysam. Awk is great, but it can get a bit unwieldy with longer commands.

ADD COMMENT
0
Entering edit mode

Thank you, I will try to put it into one awk command, if If I can manage to do it and use your code.

ADD REPLY
0
Entering edit mode

conditioning your code to be executed only on reads can be written like this: if(/^@/){print}else{yourcode}

ADD REPLY
0
Entering edit mode

I am so apologize for disturbing, but if I am trying to put together your code with my code, I failed, could you please help me?

ADD REPLY
3
Entering edit mode
9.9 years ago

you could use your code without including the header in a first samtools run (and you wouldn't have to check whether each line is a header or a read, you'll only be dealing with reads), and then embed the header from the original bam in a subsequent samtools run. something like this:

for bamfile in *bam; do
    samtools view -H $bamfile > temp.sam
    samtools view $bamfile | yourcode >> temp.sam
    samtools view -bS temp.sam > mod_$bamfile
done
ADD COMMENT
0
Entering edit mode

I use it but I receive error log (input bam contain header, I check it):

[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.
ADD REPLY
0
Entering edit mode

that is due to the fact that samtools requires a header in sam format, or the -t option that would also require a tab file. it would be easier then to modify the code as I did in the answer: extract the header into a samfile, edit the reads and append them to the samfile, convert the samfile to bamfile and remove the temporal samfile. this definitely works, although there might be a lighter way that pipes all that information without having to generate that temporal samfile.

ADD REPLY
0
Entering edit mode

thank you for your time and for your help, I cannot finish it, but I try to rewrite code to get the result what I want.

ADD REPLY

Login before adding your answer.

Traffic: 1976 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6