8.4 years ago
nathaydania
Hi everyone,
I'm learning bioinformatics and I need some help with a program in C.
The program is meant to calculate a chi-squared statistic ("Khi 2") for a sequence, to compare the observed GC content with the expected GC content.
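The formula below assumes, as calcKhi2 does, that the expected number of G/C at each of the three phases is one third of the sequence's total G/C count. With that assumption, the statistic over all three phases would be as shown. The notation is introduced here for illustration: O_k is the observed G/C count at phase k, and N_GC is the total G/C count.

```latex
E = \frac{N_{GC}}{3}, \qquad
\chi^2 = \sum_{k=1}^{3} \frac{(O_k - E)^2}{E}
```

calcKhi2 as posted evaluates only the k = 1 term of this sum.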
Here are the functions I've made:
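// count how many G or C there are in the sequence
int countGCtotal(char seq[], int lg)
{
    int i;                 /* was declared as "I", but the loop uses "i" */
    int count_GCtotal = 0;
    /* a for loop (instead of do/while) never reads seq[0] when lg == 0 */
    for (i = 0; i < lg; i++)
    {
        if ((seq[i] == 'G') || (seq[i] == 'C'))
        {
            count_GCtotal += 1;
        }
    }
    return count_GCtotal;
}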
// count G and C every 3 nucleotides, starting at index 2 (positions 3, 6, 9, ...)
int countGC3by3(char seq[], int lgSeq)
{
    int i, count_GC3by3 = 0;   /* was declared as compteur_GC3en3 */
    for (i = 2; i < lgSeq; i += 3) /* phase 1: indices 2, 5, 8, ... */
    {
        if ((seq[i] == 'G') || (seq[i] == 'C'))
        {
            count_GC3by3 += 1;
        }
    }
    return count_GC3by3;
}
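You can reach phases 2 and 3 without modifying countGC3by3 by passing it a pointer shifted forward into the sequence. This sketch assumes the three phases are the three values of the 0-based index modulo 3. countGC3by3 itself covers residue 2, which is the post's "phase 1". The helper countGCphase and its offset parameter are hypothetical names. Shifting the pointer by offset + 1 makes countGC3by3 start at index offset + 3, so the block checks the first base of that phase separately. It also includes a copy of countGC3by3 so that it compiles on its own.

```c
/* Copy of countGC3by3 from the post (variable name fixed):
   counts G/C at indices 2, 5, 8, ... of seq[0..lgSeq-1]. */
int countGC3by3(char seq[], int lgSeq)
{
    int i, count_GC3by3 = 0;
    for (i = 2; i < lgSeq; i += 3)
    {
        if ((seq[i] == 'G') || (seq[i] == 'C'))
        {
            count_GC3by3 += 1;
        }
    }
    return count_GC3by3;
}

/* Hypothetical helper: counts G/C at indices offset, offset+3, offset+6, ...
   offset must be 0, 1 or 2. countGC3by3 is reused unchanged on a pointer
   shifted by offset + 1, which covers indices offset+3, offset+6, ...;
   the base at index `offset` is tested here directly. */
int countGCphase(char seq[], int lg, int offset)
{
    int shift = offset + 1;
    int count = 0;

    if (offset < lg && (seq[offset] == 'G' || seq[offset] == 'C'))
        count += 1;
    if (lg > shift) /* only shift while the pointer stays inside seq */
        count += countGC3by3(seq + shift, lg - shift);
    return count;
}
```

For "GCATGCGGA", this returns 2, 3 and 1 G/C for offsets 0, 1 and 2. Offset 2 gives the same result as countGC3by3(seq, lg). The three counts always add up to countGCtotal, which makes a convenient sanity check.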
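// calculate the chi-squared (Khi 2) for phase 1 only
float calcKhi2(char seq[], int lg)
{
    float khi2Calc = 0.0f, GC = 0.0f, diff = 0.0f;
    int numberGC3by3 = 0, numberGCtotal = 0;
    numberGC3by3 = countGC3by3(seq, lg);
    numberGCtotal = countGCtotal(seq, lg);
    GC = (float) numberGCtotal / 3;    /* expected G/C count per phase */
    if (GC == 0.0f)
        return 0.0f;                   /* no G/C at all: avoid dividing by zero */
    diff = (float) numberGC3by3 - GC;
    khi2Calc = diff * diff / GC;       /* squaring by hand avoids pow() and <math.h> */
    return khi2Calc;                   /* the original version never returned a value */
}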
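To include all three phases, you can extend calcKhi2 to add one term per phase. This sketch defines a hypothetical calcKhi2AllPhases that is not part of the original code. Like calcKhi2, it assumes the expected G/C count per phase is one third of the total. It counts the three phases in a single pass with i % 3 rather than calling countGC3by3 three times. Returning 0 when the sequence contains no G or C is an arbitrary convention chosen here to avoid division by zero.

```c
/* Hypothetical function (not from the post): chi-squared of the G/C counts
   at the three phases (index % 3 == 0, 1, 2), with the expected count per
   phase taken as one third of the total G/C count, as in calcKhi2. */
float calcKhi2AllPhases(const char seq[], int lg)
{
    int counts[3] = {0, 0, 0};   /* observed G/C per phase */
    int i, k, total = 0;
    float expected, diff, khi2 = 0.0f;

    for (i = 0; i < lg; i++)
    {
        if (seq[i] == 'G' || seq[i] == 'C')
        {
            counts[i % 3] += 1;
            total += 1;
        }
    }
    if (total == 0)
        return 0.0f;             /* assumption: nothing to test without G/C */

    expected = (float) total / 3.0f;
    for (k = 0; k < 3; k++)
    {
        diff = (float) counts[k] - expected;
        khi2 += diff * diff / expected;   /* one (O - E)^2 / E term per phase */
    }
    return khi2;
}
```

For "GCATGCGGA", the per-phase counts are 2, 3 and 1, with 2 expected in each phase. That gives 0 + 0.5 + 0.5 = 1.0. With three categories and the expected values fixed by the total, you would compare the statistic against a chi-squared distribution with 2 degrees of freedom.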
So the chi-squared is currently calculated only for the first position (phase 1).
My question is: how do I calculate the chi-squared including phases 2 and 3?
How can I count the G/C every 3 nucleotides in phases 2 and 3 using the function countGC3by3, without changing that function?
Thanks a lot for your help! :)