Question: Help in Python and AWK: Printing absolute difference value between consecutive rows values, if column1 value is same
4 days ago, tinkuhim007 wrote:

Need help in Python and AWK: printing the absolute difference between the column 3 values of rows that share the same column 1 value. Being a biologist, I am not able to do it myself. Your help is highly appreciated.

Input File: <tab delimited file>

Chr1     A     10
Chr1     B     13
Chr1     C     12
Chr2     D     12
Chr2     E     14
Chr2     F     11

Description: In column 1, Chr1 == Chr1, so the pairs are A-B, A-C, and B-C. Likewise, Chr2 == Chr2, so the same iteration is followed for D, E, and F.

Output File 1: <tab delimited file>
A     B     3
A     C     2
B     C     1
D     E     2
D     F     1
E     F     3
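
One way to get this output is sketched below in Python. This is a minimal sketch, not a definitive solution: it assumes each line has exactly three whitespace-separated fields, that column 3 holds integers, and that the whole file fits in memory. The function name `pairwise_abs_diffs` and the `example` list are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_abs_diffs(lines):
    """Yield (label1, label2, |value1 - value2|) for rows sharing column 1."""
    # Assumption: three whitespace-separated fields per line, integer in column 3.
    groups = defaultdict(list)  # column 1 value -> [(label, value), ...]
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        key, label, value = fields
        groups[key].append((label, int(value)))
    # dicts keep insertion order, so groups come out in input order.
    for rows in groups.values():
        for (a, x), (b, y) in combinations(rows, 2):
            yield a, b, abs(x - y)


# The input from the question, one string per row.
example = [
    "Chr1\tA\t10",
    "Chr1\tB\t13",
    "Chr1\tC\t12",
    "Chr2\tD\t12",
    "Chr2\tE\t14",
    "Chr2\tF\t11",
]

for a, b, diff in pairwise_abs_diffs(example):
    print(a, b, diff, sep="\t")
```

To run it on a real file, pass an open file handle instead of `example`, since iterating over a file yields its lines.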

Your output assumes each member of column 2 is unique to the chromosome. I'm assuming column 2 has genes and column 3 has some sort of coordinates and you're looking to list intergenic distances of some sort?

written 4 days ago by RamRS

Be careful what you wish for, because you may get it: the number of differences you want is given by the formula

n! / ( k! * ( n - k )! )

where n is the number of elements (A, B, C, ...) per chromosome, and k = 2. For 10 elements on a chromosome you get 45 differences, for 100 elements per chromosome you get 4950 differences, and so on. My cell phone calculator cannot display the result for 1000 elements, as the number is already too big for it.
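
These counts can be checked with a few lines of Python. This is a small sketch assuming Python 3.8 or later, where `math.comb` is available; the helper name `pair_count` is made up for illustration. For k = 2 the formula reduces to n * (n - 1) / 2, so no large factorials have to be evaluated.

```python
from math import comb


def pair_count(n):
    """Number of unordered pairs among n elements, i.e. C(n, 2)."""
    return comb(n, 2)


for n in (10, 100, 1000):
    # The closed form n * (n - 1) // 2 gives the same value as comb(n, 2).
    assert pair_count(n) == n * (n - 1) // 2
    print(n, pair_count(n))
```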

written 4 days ago by h.mon

It's 499500 combinations for 1000 elements [C(n,2)]. I think we have ourselves an XY problem here.

written 4 days ago by RamRS