Question: Help in Python and AWK: printing the absolute difference between values of rows that share the same column 1 value
tinkuhim007 wrote, 9 weeks ago:

I need help in Python and AWK: printing the absolute difference between the column 3 values of rows whose column 1 value is the same. Being a biologist, I am not able to do it myself. Your help is highly appreciated.

Input File: <tab delimited file>

Chr1     A     10
Chr1     B     13
Chr1     C     12
Chr2     D     12
Chr2     E     14
Chr2     F     11

Description: In column 1, rows A, B and C all share Chr1, so the pairs A-B, A-C and B-C are compared. Likewise, D, E and F share Chr2, so the same pairing is applied to them.

Output File 1: <tab delimited file>
A     B     3
A     C     2
B     C     1
D     E     2
D     F     1
E     F     3
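For the Python half of the question, here is a minimal sketch. It assumes whitespace-separated input (tabs or spaces) with chromosome, name and value columns, and that each name in column 2 appears only once per chromosome. It groups rows by column 1 and uses `itertools.combinations` to visit each unordered pair exactly once, in input order, which reproduces Output File 1. The names `SAMPLE`, `parse_rows` and `pairwise_abs_diffs` are hypothetical.

```python
import itertools
from collections import defaultdict

# The example input from the question (columns: chromosome, name, value)
SAMPLE = """\
Chr1\tA\t10
Chr1\tB\t13
Chr1\tC\t12
Chr2\tD\t12
Chr2\tE\t14
Chr2\tF\t11
"""


def parse_rows(lines):
    """Split each non-blank line on whitespace into (chrom, name, int value)."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            chrom, name, value = fields[:3]
            rows.append((chrom, name, int(value)))
    return rows


def pairwise_abs_diffs(rows):
    """(name1, name2, |value1 - value2|) for every pair of rows sharing column 1."""
    by_chrom = defaultdict(list)  # chromosome -> [(name, value), ...] in input order
    for chrom, name, value in rows:
        by_chrom[chrom].append((name, value))

    result = []
    for members in by_chrom.values():
        # combinations() yields each unordered pair exactly once: A-B, A-C, B-C
        for (n1, v1), (n2, v2) in itertools.combinations(members, 2):
            result.append((n1, n2, abs(v1 - v2)))
    return result


if __name__ == "__main__":
    for n1, n2, diff in pairwise_abs_diffs(parse_rows(SAMPLE.splitlines())):
        print(n1, n2, diff, sep="\t")
```

To process a real file, pass an open file handle (for example `open("input.tsv")`, a hypothetical file name) to `parse_rows` instead of `SAMPLE.splitlines()`.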

Your output assumes each member of column 2 is unique to the chromosome. I'm assuming column 2 has genes and column 3 has some sort of coordinates and you're looking to list intergenic distances of some sort?

Reply written 9 weeks ago by RamRS

Be careful what you wish for, because you may get it: the number of differences you want is given by the formula

n! / ( k! * ( n - k )! )

Where n is the number of elements (A, B, C, ...) per chromosome and k = 2. For 10 elements on a chromosome you get 45 differences; for 100 elements per chromosome you get 4950 differences, and so on. My cell phone calculator cannot display the result for 1000 elements, as the number is already too big for it.

Reply written 9 weeks ago by h.mon

It's 499500 combinations for 1000 elements [C(n,2)]. I think we have ourselves an XY problem here.

Reply written 9 weeks ago by RamRS
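With k = 2 the formula simplifies to C(n, 2) = n(n - 1)/2, so no large factorial ever has to be computed. A small Python check of the counts quoted above (`math.comb` needs Python 3.8 or later; the helper name `n_pairs` is hypothetical):

```python
import math


def n_pairs(n):
    """Number of unordered pairs among n elements: C(n, 2) = n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    # math.comb computes the same binomial coefficient directly
    assert n_pairs(n) == math.comb(n, 2)
    print(n, n_pairs(n))
```

The pair count grows quadratically, so per-chromosome output size rises quickly as the number of elements increases.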
Answer by Kevin Blighe (Republic of Ireland), 8 weeks ago:

Hello,

Here is an awk command that will do it for you per chromosome, tinkuhim007:

cat test
Chr1    A   10
Chr1    B   13
Chr1    C   12
Chr2    D   12
Chr2    E   14
Chr2    F   11

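awk '
{ arr[$1","$2] = $3 }                  # key "Chr1,A" -> value in column 3
END {
    # note: awk visits "for (k in arr)" keys in an unspecified order
    for (char1 in arr) {
        split(char1, charArr1, ",")     # charArr1[1] = chromosome, charArr1[2] = name
        for (char2 in arr) {
            split(char2, charArr2, ",")
            # different rows on the same chromosome: print value2 - value1
            if (char1 != char2 && charArr1[1] == charArr2[1])
                print charArr1[1] "\t" charArr1[2] "\t" charArr2[2] "\t" (arr[char2] - arr[char1])
        }
    }
}' test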

Chr1    A   B   3
Chr1    A   C   2
Chr1    B   A   -3
Chr1    B   C   -1
Chr1    C   A   -2
Chr1    C   B   1
Chr2    D   E   2
Chr2    D   F   -1
Chr2    E   D   -2
Chr2    E   F   -3
Chr2    F   D   1
Chr2    F   E   3

You didn't indicate a rule for the order of subtraction. If you don't want negative values anywhere in the output, you can add an extra if statement:

awk '
{ arr[$1","$2] = $3 }                  # key "Chr1,A" -> value in column 3
END {
    for (char1 in arr) {
        split(char1, charArr1, ",")     # charArr1[1] = chromosome, charArr1[2] = name
        for (char2 in arr) {
            split(char2, charArr2, ",")
            if (char1 != char2 && charArr1[1] == charArr2[1]) {
                result = arr[char2] - arr[char1]
                # keep only positive differences; pairs with equal values are not printed
                if (result > 0)
                    print charArr1[1] "\t" charArr1[2] "\t" charArr2[2] "\t" result
            }
        }
    }
}' test

Chr1    A   B   3
Chr1    A   C   2
Chr1    C   B   1
Chr2    D   E   2
Chr2    F   D   1
Chr2    F   E   3

If you want to understand how this works, let me know.

Kevin

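Since the question also asked for Python, here is a minimal sketch that mirrors the two awk commands above. It assumes whitespace-separated input and that names are unique per chromosome (awk would silently overwrite a duplicate key, whereas this version keeps both rows). Unlike awk's `for (key in arr)`, whose visiting order is unspecified, `itertools.permutations` walks the rows in input order, so the output order is reproducible. The function name `signed_diffs` and its `positive_only` flag are hypothetical.

```python
import itertools
from collections import defaultdict


def signed_diffs(lines, positive_only=False):
    """Mirror of the awk answer: for every ordered pair (X, Y) of different rows
    on the same chromosome, yield (chrom, X, Y, value_Y - value_X)."""
    by_chrom = defaultdict(list)  # like arr[$1","$2] = $3, but grouped by chromosome
    for line in lines:
        fields = line.split()
        if fields:
            chrom, name, value = fields[:3]
            by_chrom[chrom].append((name, int(value)))

    for chrom, members in by_chrom.items():
        # permutations() gives both A-B and B-A, like the nested awk loops
        for (n1, v1), (n2, v2) in itertools.permutations(members, 2):
            result = v2 - v1
            # second awk command: keep only strictly positive differences
            if positive_only and result <= 0:
                continue
            yield chrom, n1, n2, result


if __name__ == "__main__":
    sample = ["Chr1 A 10", "Chr1 B 13", "Chr1 C 12",
              "Chr2 D 12", "Chr2 E 14", "Chr2 F 11"]
    for row in signed_diffs(sample, positive_only=True):
        print(*row, sep="\t")
```

With `positive_only=True` this yields the same six rows as the second awk command; with the default it yields all twelve signed differences of the first.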

Dear Kevin,
Thanks a lot for helping me. I am very new to awk. It would be great if you could explain the code to me line by line. One more thing: could you kindly extend this code so that only the least distance value between two points appears in the second output file?

Output file: <tab delimited file>
Chr1    A     C     2
Chr1    B     C     1
Chr1    C     B     1
Chr2    D     E     2
Chr2    E     D     2
Chr2    F     E     3

Thanks a lot.

Reply written 6 weeks ago by tinkuhim007
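A minimal Python sketch of the "least distance" request, assuming it means: for each element, report the other element on the same chromosome whose value is closest (smallest absolute difference), keeping the first one in input order on ties. Under that reading the Chr2 rows come out as D F 1, E D 2 and F D 1, so two rows differ from the requested table above (D E 2 and F E 3), because |12 - 11| = 1 is smaller than the distances listed there. If a different rule was intended, only the `min` key needs to change. The function name `nearest_neighbours` is hypothetical.

```python
from collections import defaultdict


def nearest_neighbours(lines):
    """For each element, yield (chrom, name, nearest_name, distance), where
    nearest_name is the other element on the same chromosome with the
    smallest absolute difference in value (first in input order on ties)."""
    by_chrom = defaultdict(list)  # chromosome -> [(name, value), ...]
    for line in lines:
        fields = line.split()
        if fields:
            chrom, name, value = fields[:3]
            by_chrom[chrom].append((name, int(value)))

    for chrom, members in by_chrom.items():
        for i, (name, value) in enumerate(members):
            others = [m for j, m in enumerate(members) if j != i]
            if not others:
                continue  # a chromosome with a single element has no neighbour
            # min() returns the first minimum it meets, so ties keep input order
            best_name, best_value = min(others, key=lambda m: abs(m[1] - value))
            yield chrom, name, best_name, abs(best_value - value)


if __name__ == "__main__":
    sample = ["Chr1 A 10", "Chr1 B 13", "Chr1 C 12",
              "Chr2 D 12", "Chr2 E 14", "Chr2 F 11"]
    for row in nearest_neighbours(sample):
        print(*row, sep="\t")
```

This compares every element with every other element on its chromosome, so it is quadratic per chromosome; for very large inputs, sorting each chromosome by value and checking only the immediate neighbours in sorted order would be the cheaper design.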