Determining length of numerical ranges in file by perl/awk
7.5 years ago
User 6777 ▴ 20

Hi all,

I am now learning awk but got stuck on the following problem.

I have a file (input.txt) in which each line holds an id followed by a colon and then a list of numerical ranges, such as:

NP_416485.4: 4-5, 113-114, 395-399, 657-666, 671-675, 844-880, 889-889, 891-895, 963-966, 970-970, 991-992, 1126-1235
NP_417679.2: 309-413, 418-421, 441-442, 444-445, 447-481, 939-941, 943-984
NP_418770.2: 264-265, 267-272, 276-277, 287-288
NP_415931.4: 32-33, 73-75, 387-388, 394-396, 531-634

Now I want to go through each id and print only those ranges whose length (end minus start) is >= 30. For the above input, the output should be:

NP_416485.4: 844-880, 1126-1235
NP_417679.2: 309-413, 447-481, 943-984
NP_415931.4: 531-634

I have tried this awk script:

awk -F"-" "{if(($2-$1)>=30) print $_}" input.txt

But it prints all the input ranges. Please suggest.

ps. I can also use any perl alternative

awk perl • 1.5k views
ADD COMMENT
7.5 years ago

If you append a trailing comma to each line (the $0 = $0"," below does this), every range ends in a comma and you can split the line on whitespace:

$ awk -F' ' '{ $0 = $0","; h = $1; f = ""; for (i = 2; i <= NF; i++) { r = substr($i, 1, length($i) - 1); split(r, re, "-"); if (re[2] - re[1] >= 30) { f = f" "$i; } } if (length(f) > 0) { print h" "substr(f, 2, length(f) - 2); } }' input.txt
NP_416485.4: 844-880, 1126-1235
NP_417679.2: 309-413, 447-481, 943-984
NP_415931.4: 531-634
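
As for why the attempt in the question misbehaves, as far as I can tell: -F"-" splits the whole line on hyphens, so $1 is the id plus the first start (which awk treats as 0 in arithmetic) and $2 begins with the first end. Also, $_ is Perl rather than awk; in awk it ends up meaning $0 because _ is an unset variable, so the whole line gets printed. Below is a minimal sketch of a variant that splits on ": " and ", " instead, so no trailing comma is needed and the output keeps its commas. The function name filter_ranges and the variable min are my own names, not from the question.

```shell
# Sample data from the question.
cat > input.txt <<'EOF'
NP_416485.4: 4-5, 113-114, 395-399, 657-666, 671-675, 844-880, 889-889, 891-895, 963-966, 970-970, 991-992, 1126-1235
NP_417679.2: 309-413, 418-421, 441-442, 444-445, 447-481, 939-941, 943-984
NP_418770.2: 264-265, 267-272, 276-277, 287-288
NP_415931.4: 32-33, 73-75, 387-388, 394-396, 531-634
EOF

# filter_ranges (hypothetical helper): keep ranges with end - start >= min.
filter_ranges() {
    awk -F': ' -v min=30 '{
        n = split($2, r, /, */)        # r[1..n] = individual "start-end" ranges
        out = ""
        for (i = 1; i <= n; i++) {
            split(r[i], b, "-")        # b[1] = start, b[2] = end
            if (b[2] - b[1] >= min)
                out = (out == "" ? r[i] : out ", " r[i])
        }
        if (out != "") print $1 ": " out   # skip ids with no qualifying range
    }' "$@"
}

filter_ranges input.txt
```

On the sample input this should print the same three lines as the output above.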
ADD COMMENT
7.5 years ago

a quick perl alternative:

perl -ane '$s = "";
foreach (@F) { /(\d+)-(\d+)/ and $2 - $1 >= 30 and $s .= " $1-$2" }
print "$F[0]$s\n" if $s' input.txt

the -a option splits each input line on whitespace into @F. The loop then goes through all fields looking for valid intervals and appends the qualifying ones to $s, which is printed at the end if it is not empty.

with the same loop rationale in mind, you can play around with variations such as:

perl -ne '$s = ""; /^(\S+:)/ and $S = $1;
while (/(\d+)-(\d+)/g) { $s .= " $1-$2" if $2 - $1 >= 30 }
print "$S$s\n" if $s' input.txt
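
The same scan-for-intervals idea can also be sketched in awk, in case that is easier to follow while learning it. This is only a sketch under my own naming (scan_ranges, min, rest are not from the thread). It uses the standard match() function with RSTART/RLENGTH to walk through the line, and joins the hits with ", " so the commas are kept.

```shell
# Sample data from the question.
cat > input.txt <<'EOF'
NP_416485.4: 4-5, 113-114, 395-399, 657-666, 671-675, 844-880, 889-889, 891-895, 963-966, 970-970, 991-992, 1126-1235
NP_417679.2: 309-413, 418-421, 441-442, 444-445, 447-481, 939-941, 943-984
NP_418770.2: 264-265, 267-272, 276-277, 287-288
NP_415931.4: 32-33, 73-75, 387-388, 394-396, 531-634
EOF

# scan_ranges (hypothetical helper): regex-scan each line for "digits-digits".
scan_ranges() {
    awk -v min=30 '{
        id = $1; rest = $0; out = ""
        while (match(rest, /[0-9]+-[0-9]+/)) {
            r = substr(rest, RSTART, RLENGTH)       # the matched interval
            split(r, b, "-")                        # b[1] = start, b[2] = end
            if (b[2] - b[1] >= min)
                out = (out == "" ? r : out ", " r)
            rest = substr(rest, RSTART + RLENGTH)   # continue after the match
        }
        if (out != "") print id " " out
    }' "$@"
}

scan_ranges input.txt
```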
ADD COMMENT

sorry for my ignorance, is this a perl one-liner? I'm on a Windows machine and it shows an error like:

>perl -ane '$s = "";
Can't find string terminator "'" anywhere before EOF at -e line 1.

C:\Users\cdutta\Desktop\x\New folder>foreach (@F) { /(\d+)-(\d+)/ and $2 - $1 >=
 30 and $s .= " $1-$2" }
'foreach' is not recognized as an internal or external command,
operable program or batch file.
ADD REPLY

yes. I'm not sure how command line perl works on Windows, but you could try wrapping it all into a single command line before pasting it into the Windows console.

perl -ane '$s = ""; foreach (@F) { /(\d+)-(\d+)/ and $2 - $1 >= 30 and $s .= " $1-$2" } print "$F[0]$s\n" if $s' input.txt
ADD REPLY

aaaam.. currently i'm trying it sir.. but certainly i'm missing something

ADD REPLY

thanks for your answer

ADD REPLY

this error:

Can't find string terminator '"' anywhere before EOF at -e line 1.

ADD REPLY

as I said, I don't know how command-line perl works on Windows. you could try saving the code as splitter.pl and then running it with perl splitter.pl

ADD REPLY
