Question: Filtration of bam file but with header
1
gravatar for filipzembol
4.9 years ago by
filipzembol100
filipzembol100 wrote:

Dear all, I have easy script in bash using awk language. I filtrate bam file by GC content (see at code), but I need to get on output bam file too with header. Could you help me with this? thank you My code:

 

#!/bin/bash

for i in 54321*.bam
do

samtools view -h /home/filip/Desktop/Analyza\ NIFTY\ and\ CNVs/NIFTY\ pooling009/$i | awk '{ n=length($10); print gsub(/[GCCgcs]/,"",$10)/n,"\t",$0  }' | awk '($1 <= 0.6 && $1 >= 0.3){print $0}' | awk '!($1="")' | samtools -bS - > z$i;

done

 

bam samtools header filtration • 2.3k views
ADD COMMENTlink modified 4.9 years ago by Jorge Amigo11k • written 4.9 years ago by filipzembol100

I think, i could use something like this (|| $1 ~/^@/)  , but I really do not know where do I incorporate it.

ADD REPLYlink written 4.9 years ago by filipzembol100
2
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

As long as you don't have any reads beginning with "@" (and you shouldn't), just add an if(substr($1,1,1) == "@") { print $0} else {...your current code...}. You just want to avoid all of the piping and have a single awk command (unless you want to modify each of the pipes, which would quickly become annoying).

BTW, this might be easier with pysam. Awk is great, but it can get a bit unwieldy with longer commands.

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Devon Ryan89k

Thank you, I will try to put it into one awk command, if If I can manage to do it and use your code.

ADD REPLYlink written 4.9 years ago by filipzembol100

conditioning your code to be executed only on reads can be written like this: if(/^@/){print}else{yourcode}

ADD REPLYlink written 4.9 years ago by Jorge Amigo11k

I am so apologize for disturbing, but if I am trying to put together your code with my code, I failed, could you please help me?

ADD REPLYlink written 4.8 years ago by filipzembol100
2
gravatar for Jorge Amigo
4.9 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

you could use your code without including the header in a first samtools run (and you wouldn't have to check whether each line is a header or a read, you'll only be dealing with reads), and then embed the header from the original bam in a subsequent samtools run. something like this:

for bamfile in *bam; do
samtools view -H $bamfile > temp.sam

samtools view $bamfile | yourcode >> temp.sam
samtools view -bS temp.sam > mod_$bamfile
done

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Jorge Amigo11k

I use it but i receive error log (input bam contain header, i chceck it) :

[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.

 

ADD REPLYlink written 4.9 years ago by filipzembol100

that is due to the fact that samtools requires a header in sam format, or the -t option that would also require a tab file. it would be easier then to modify the code as I did in the answer: extract the header into a samfile, edit the reads and append them to the samfile, convert the samfile to bamfile and remove the temporal samfile. this definitely works, although there might be a lighter way that pipes all that information without having to generate that temporal samfile.

ADD REPLYlink written 4.9 years ago by Jorge Amigo11k

thank you for your time and for your help, I cannot finish it, but I try to rewrite code to get the result what I want.

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by filipzembol100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1136 users visited in the last hour