Question: Count of GC in row, but not from N bases
Korsocius90 wrote:

Dear all,

I have a simple script for counting GC content per row, and I need to add just one condition to it:

   awk 'NR>1{n=length($1); gc=gsub("[gcGC]", "", $1); print gc/n}' $i

How can I count the length of each row without the N characters?

for example:

Input:  ACAGCTTGCNNNN   => length = 9   GC content = 5/9

The format of the output is not important, only how to count it.

Thanks a lot.

 

count gc bases • 1.3k views
ADD COMMENTlink modified 3.6 years ago by tomc80 • written 3.6 years ago by Korsocius90

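I think I could do it by first counting the Ns:

N_count=$(awk -F N '{print NF-1}' file)

and then using this result in:

 # only works if N_count is a single number, i.e. file holds one sequence line
 awk -v N_count="$N_count" 'NR>1{n=length($1); gc=gsub("[gcGC]", "", $1); print gc/(n-N_count)}' $i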
ADD REPLYlink written 3.6 years ago by Korsocius90
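
The idea in the comment above (subtract the number of Ns from the row length) can also be done per row inside a single awk call, because gsub returns the number of substitutions it made. This is only a minimal sketch: the function name gc_per_row is made up for illustration, it assumes the sequence is in the first field and that the first line is a header (as in the original NR>1 script), and it treats lowercase n as N.

```shell
# Hypothetical helper: GC fraction per row, ignoring N/n, in one awk pass.
gc_per_row() {
    awk 'NR>1 {
        seq = $1
        n  = length(seq)                 # full row length
        nn = gsub(/[Nn]/, "", seq)       # gsub returns how many Ns were removed
        gc = gsub(/[GCgc]/, "", seq)     # ... and how many G/C bases were removed
        if (n - nn > 0) print gc / (n - nn)
        else            print "NA"       # row is empty or consists only of N
    }' "$@"
}

# Example: a header line followed by the sequence from the question
printf 'header\nACAGCTTGCNNNN\n' | gc_per_row    # prints 0.555556
```

With a file name as argument (for example gc_per_row "$i") it reads that file instead of standard input.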
iraun3.5k wrote:

This command should work, but it's not in awk, it is in bash.

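while read -r p; do
       # length of the row with every N removed
       len=$(echo "$p" | sed 's/N//g' | tr -d '\n' | wc -c)
       # number of G/C bases, upper or lower case
       cnt=$(echo "$p" | grep -o '[GCgc]' | tr -d '\n' | wc -c)
       # GC fraction, rounded to two decimals
       gc=$(awk "BEGIN {printf \"%.2f\",${cnt}/${len}}")
       echo -e length:$len --- GC:$gc
done<file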

 

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by iraun3.5k

Bash is fine too; I solved it with awk and bash together. This result is convenient; the only limitation is that the value is rounded to hundredths. Thank you.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Korsocius90
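
The rounding to hundredths comes from the "%.2f" format in the awk call inside the loop, so more digits only need a different format string. A small sketch, assuming the count and length have already been computed as in the loop; gc_fraction and its third argument are hypothetical names introduced here, not part of the answer above.

```shell
# Hypothetical helper: print cnt/len with a chosen number of decimals.
gc_fraction() {
    # $1 = GC count, $2 = length without N, $3 = decimals (default 4)
    awk -v cnt="$1" -v len="$2" -v d="${3:-4}" 'BEGIN {
        if (len == 0) { print "NA"; exit }   # avoid division by zero
        fmt = "%." d "f\n"                   # e.g. "%.4f\n"
        printf fmt, cnt / len
    }'
}

gc_fraction 5 9      # prints 0.5556
gc_fraction 5 9 6    # prints 0.555556
```

Inside the loop this would replace the gc=... line, for example gc=$(gc_fraction "$cnt" "$len" 4).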

Glad to help :).

ADD REPLYlink written 3.6 years ago by iraun3.5k
tomc80 wrote:

Normalized GC content per row sans N

awk '{gsub("N","");t=length();gsub(/[GC]/,"");print int((t-length())/t*100)/100}'

or, assuming the sequence symbols are strictly ACTGN:

 
awk '{gsub("N","");t=length();gsub(/[AT]/,"");print int(length()/t*100)/100}'
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by tomc80
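
Two details of the one-liners above are worth keeping in mind: int(x*100)/100 truncates rather than rounds (5/9 prints as 0.55, where printf "%.2f" gives 0.56), and a row that is empty or all N makes t zero, so the division fails. The following is only a sketch of one way to handle both while also accepting lowercase bases; gc_sans_n is a hypothetical name, and printing the length next to the GC fraction is an illustrative choice, not something the answer specifies.

```shell
# Hypothetical variant of the one-liner above: case-insensitive and zero-safe.
gc_sans_n() {
    awk '{
        gsub(/[Nn]/, "")             # drop every N from the record
        t = length()                 # row length without N
        gsub(/[GCgc]/, "")           # drop G/C; what remains is non-GC
        if (t > 0) printf "%d\t%.2f\n", t, (t - length()) / t
        else       print "0\tNA"     # nothing left to divide by
    }' "$@"
}

printf 'ACAGCTTGCNNNN\nnnnn\nacgt\n' | gc_sans_n
```

For the three example rows this prints the length and GC fraction as 9 and 0.56, then 0 and NA, then 4 and 0.50, separated by a tab.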