Awk or Shell script in need
5.6 years ago
ThulasiS ▴ 70

Dear Forum Members, I have a job to finish. I know it can be done with an awk program, but I don't have much programming skill and I am still learning awk. The job is to extract some lines, in series, from a file. I have the following example input file (BLAST output):

NC_007622|123-456 NC_234 123 568
NC_007622|123-456 NC_546 126 563
NC_007622|123-456 NC_564 582 369
NC_007622|123-456 NC_985 548 367
NC_007622|123-456 NC_758 877 687
NC_007622|841-898 NC_234 456 785
NC_007622|841-898 NC_546 458 798


Required output

NC_007622|123-456
NC_234 123 568
NC_546 126 563
NC_564 582 369
NC_007622|841-898
NC_234 456 785


I need every 7th element of column 1, followed by each line of columns 2, 3 and 4, and so on until the end of the file.

Any help is badly needed. Thank you.

shell awk • 1.6k views

I am not giving you the exact answer. Instead I'm directing you to a resource. Just to let you know these problems can also be solved with google. Happy googling :)

How to print every nth line in a file in Linux?

or

extract every nth line from text file unix


I tried every way I could find by googling. Still, I couldn't write the exact script for my problem, so I posted here.

Thank you


The question is not clear, as you mixed the example with your explanation. Also, what do you mean by 7th element, and what does the actual file look like? Awk and cut can be used for column-wise extraction; @venu has already given you the route.


Before posting, my input and output looked normal, as in my file, but after posting they became unclear. Put simply, suppose the input looks like this: 1| 25| 368| 398 1| 26| 368| 375 1| 27| 367| 398 1|| 29| 398 347 2| 25 |754 982 The output I need is: 1| 25| 368| 398 26| 368 375 27 |367| 398 29| 398| 34 7 2| 25| 754| 982 and so on.

"|" represents a different row.


Why not just use any programming language, like Python or Perl?


It's good practice to show what you tried and what didn't work.


What I tried is something naive like this:

awk 'BEGIN {FS=OFS== " "} { 'NR%7==7{ print $1}'}' | awk 'NR%1==1{print$2,$3,$5}'

It prints all the required items from column 1, but that is not what I required.
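A side note on the conditions in that attempt: `NR % 7` only takes the values 0 to 6, so `NR%7==7` can never be true, and `NR % 1` is always 0, so `NR%1==1` never matches either. The sketch below counts matching lines on 14 dummy lines generated with `seq`; the helper name `count_matches` and the dummy input are assumptions made for illustration only.

```shell
# Hypothetical helper: count how many of 14 dummy input lines
# satisfy the awk condition passed as the first argument.
count_matches() {
    # "$1" is an awk condition such as 'NR % 7 == 0'
    seq 1 14 | awk "$1 { n++ } END { print n + 0 }"
}

count_matches 'NR % 7 == 7'   # remainder is never 7, so nothing matches
count_matches 'NR % 7 == 0'   # matches lines 7 and 14
count_matches 'NR % 7 == 1'   # matches lines 1 and 8
```

A test on the line number only helps when every group has exactly the same number of lines, which is not the case in the example input (5 lines, then 2).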


You can do it with basic cut and sed, something like the command below, where you replace the delimiters and the column placeholders "a" and "b". This is by no means meant to test you, but basic scripting questions can be checked at Stack Overflow.

cat <(cut -d'space' -fa file | sort -u) <(sed 's/space/tab/' file | cut -d'tab' -fb)

It would be nice to show us what you tried and what didn't work while posting the question.

5.6 years ago
nterhoeven ▴ 120

I would use the following perl one-liner for this:

perl -ane 'BEGIN{$id="";} if($F[0] ne $id){$id=shift(@F); print $id,"\n",join(" ",@F),"\n";}else{shift(@F); print join(" ",@F),"\n";}' filename.txt

Explanation:
• The file is read line-wise and each line is split at whitespace
• The first column is checked (is it the same as before?)
• if yes, the 2nd, 3rd and 4th columns are printed
• if no, the 1st column is printed and stored, then the rest is printed on a new line

a little bit simpler:

perl -lane '$h1 = shift @F; $h1 ne $h2 and print $h1; print "@F"; $h2 = $h1' filename.txt

even simpler:

perl -ape 's/ /\n/;$h and s/\Q$h\E\n//;$h = $F[0]' filename.txt

I just learnt that \Q and \E can be used to tell the regex to treat a variable as a literal string (the | present in the titles is a regex special character). Very convenient if you don't want to parse your variables when using them inside regex functions.

Thank you so much, nterhoeven. The job was done in a jiffy.

5.6 years ago

AWK has arrays for storing groups of related strings or numbers. Just use it this way:

awk '{tab[$1]=tab[$1]"\n"$2" "$3" "$4} END {for (i in tab) {print i " " tab[i]} }' test.txt

For each identifier in column $1, create an entry in the array (tab) if it is absent, or append columns 2 to 4 to its existing content. Recall that adding "\n" to the concatenated string helps write the output on different lines.

5.6 years ago

It is simple in awk:

awk '{print $1, $2, $3, $4, $6, $7, $8, $10, $11, $12}' input_file > output_file

The answer is valid only for data like that provided initially: NC_007622|123-456 NC_234 123 568 NC_007622|123-456 NC_546 126 563 NC_007622|123-456 NC_564 582 369. Then the output will be: NC_007622|123-456 NC_234 123 568 NC_546 126 563 NC_564 582 369

Based on the posts of other people here, I have the impression you are oversimplifying things and your code won't yield the desired result.

5.6 years ago
5heikki 10k

Something like this. Perhaps your field separator is something other than space though? Also the columns after the else..

awk 'BEGIN{FS=" "}{if(NR==1 || !(NR%7)){print $1}else{print $2,$3,$4}}' file.txt


For future reference: this command currently produces the following output using the example in the original post.

NC_007622|123-456
NC_546 126 563
NC_564 582 369
NC_985 548 367
NC_758 877 687
NC_234 456 785
NC_007622|841-898
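
For comparison, here is a minimal awk sketch of the "print column 1 only when it changes" idea used by the perl one-liners in this thread. It assumes whitespace-separated columns and that lines sharing an identifier are adjacent, as in the example from the original post. The function name `group_by_first_column` is hypothetical, and this is only one possible way to write it.

```shell
# Hypothetical helper: print column 1 once per run of identical values,
# then the remaining columns of every line in that run.
group_by_first_column() {
    awk '
        NF == 0 { next }                      # skip blank lines
        $1 != prev { print $1; prev = $1 }    # a new identifier starts: print it once
        {
            # drop the first field and the whitespace around it, keep the rest
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")
            print
        }
    ' "$@"
}
```

Called as `group_by_first_column blast_output.txt` (the file name is an assumption), it keys on the value of column 1 and not on the line number, so groups of different sizes are handled and no line of a group is dropped.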