Gc Content From Bam
2
3
Entering edit mode
10.2 years ago
filipzembol ▴ 180

Dear all, I have one problem, I would like to count the content of GC nucleotids from bam file for each read. I think I have to count it from $10, but I do not know how? Could you help me please? Thank you

gc bam awk perl • 7.0k views
ADD COMMENT
4
Entering edit mode
10.2 years ago
awk '{ n=length($10); print $10, gsub(/[GCCgcs]/,"",$10)/n;}' your.sam

ADD COMMENT
0
Entering edit mode

I am not sure if it is good, because every time I receive the error awk: (FILENAME=filip.sam FNR=1) fatal: division by zero attempted, than there is the problem with "n" or I do not know.

ADD REPLY
0
Entering edit mode

I tested my awk script without the sam header.

ADD REPLY
0
Entering edit mode

Oh ok, because I use to convert samtools view -h (with header). Without header is working, than thank you so much for your help.

ADD REPLY
2
Entering edit mode
10.2 years ago
hardingnj ▴ 210
samtools view [bam] | perl -lane '$N =()= $F[9] =~ /GC/gi; print $N;'

The odd looking =()= operator instantly converts the match from array to scalar context.

Perl one liner 'a' splits by tabs, 'l' prints a new line, 'n' operates on all lines and 'e' indicates expression.

ADD COMMENT
0
Entering edit mode

An alternate Perl part:

perl -p -lae '$_ =()= $F[9] =~ /GC/gi'

ADD REPLY
0
Entering edit mode

I am not sure, but I try the perl code and I recieve different number from the awk code and awk code has a right numbers of GC count (I did not use /n)..

ADD REPLY

Login before adding your answer.

Traffic: 3000 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6