Hi all,
I need to calculate the percentage of blastp hits between several genomes, using the formula below:
[(C1 + C2)/(T1 + T2)] * 100
C1 and C2 are the numbers of blastp hits between two genomes in each direction: for example, genome A against genome B gives C1, and genome B against genome A gives C2. T1 and T2 are the total numbers of proteins in the two genomes being compared.
Now, I have a .txt file containing the total number of proteins for each of my genomes of interest:
A:1234 B:1234 C:1234...
I also have another .txt file containing the blastp hit counts for each pair of genomes, in each direction:
A_B 123, B_A 123, A_C 123, B_C 123..
Is there a command that can calculate the percentages for A_B, A_C and B_C using the formula above?
I apologise if my question is confusing; please tell me if any part of it is unclear.
A single (Linux) command line will be difficult, I guess. Are you familiar with any programming/scripting language? With a small script, this could be processed quickly.
Hi, I am not familiar with any programming/scripting language. I am actually fairly new to this field.
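Since you are new to scripting, here is a minimal Python sketch of the kind of small script mentioned above. It rests on several assumptions that are not confirmed in this thread: the totals file holds `name:count` entries separated by whitespace, the hits file holds `query_subject count` entries separated by commas, genome names contain no underscores, and a pair is only reported when hit counts exist in both directions (for example both A_C and C_A). The function names and file names are made up for illustration.

```python
import re
import sys
from itertools import combinations


def parse_totals(text):
    """Parse text like 'A:1234 B:1234' into {'A': 1234, 'B': 1234}."""
    return {name: int(count) for name, count in re.findall(r"(\w+):(\d+)", text)}


def parse_hits(text):
    """Parse text like 'A_B 123, B_A 123' into {('A', 'B'): 123, ('B', 'A'): 123}.

    Assumes genome names contain no underscore, so the single '_'
    separates the query genome from the subject genome.
    """
    hits = {}
    for query, subject, count in re.findall(r"([^\W_]+)_([^\W_]+)\s+(\d+)", text):
        hits[(query, subject)] = int(count)
    return hits


def hit_percentages(totals, hits):
    """Apply [(C1 + C2) / (T1 + T2)] * 100 to every pair of genomes.

    Pairs lacking a hit count in either direction are skipped
    (an assumption; you may prefer to report them as errors).
    """
    result = {}
    for g1, g2 in combinations(sorted(totals), 2):
        if (g1, g2) in hits and (g2, g1) in hits:
            c = hits[(g1, g2)] + hits[(g2, g1)]  # C1 + C2
            t = totals[g1] + totals[g2]          # T1 + T2
            result[(g1, g2)] = 100 * c / t
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python blast_percent.py totals.txt hits.txt
    with open(sys.argv[1]) as fh:
        totals = parse_totals(fh.read())
    with open(sys.argv[2]) as fh:
        hits = parse_hits(fh.read())
    for (g1, g2), pct in hit_percentages(totals, hits).items():
        print(f"{g1}_{g2}\t{pct:.2f}")
```

Saved as, say, `blast_percent.py`, it would be run as `python blast_percent.py totals.txt hits.txt` and print one tab-separated line per genome pair. If your real files are laid out differently from the examples in your post, the two regular expressions are the parts to adjust.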