Question

Awk or Shell script in need

0


7.5 years ago

ThulasiS ▴ 90

Dear Forum Members, I have a job to finish. I know it can be done with awk, but I don't have much programming experience and I am still learning awk. The job is to extract lines in a series from a file. For example, I have the following input file (BLAST output):

NC_007622|123-456 NC_234 123 568
NC_007622|123-456 NC_546 126 563
NC_007622|123-456 NC_564 582 369
NC_007622|123-456 NC_985 548 367
NC_007622|123-456 NC_758 877 687
NC_007622|841-898 NC_234 456 785
NC_007622|841-898 NC_546 458 798

Required output

NC_007622|123-456
NC_234 123 568
NC_546 126 563
NC_564 582 369
NC_007622|841-898
NC_234 456 785

I need every 7th element of column 1, followed by each line of columns 2, 3 and 4, and so on until the end of the file.

Any help is badly needed. Thank you.

shell awk • 2.5k views

ADD COMMENT • link updated 7.5 years ago by khalid.belkhir ▴ 40 • written 7.5 years ago by ThulasiS ▴ 90

1


I am not giving you the exact answer. Instead I'm directing you to a resource. Just to let you know these problems can also be solved with google. Happy googling :)

How to print every nth line in a file in Linux?

or

extract every nth line from text file unix

ADD REPLY • link 7.5 years ago by venu 7.1k

0


I tried every way I could find by googling. Still, I couldn't write the exact script for my problem, so I posted here.

Thank you

ADD REPLY • link 7.5 years ago by ThulasiS ▴ 90

0


The question is not clear, as you mixed the example with your explanation. Also, what do you mean by the 7th element, and what does the actual file look like? Awk and cut can be used for column-wise extraction; @venu has already pointed you in the right direction.

ADD REPLY • link 7.5 years ago by Rohit ★ 1.5k

0


Before posting, my input and output looked normal, as they do in my file, but after posting they became unclear. Simply put, suppose the input looks like this: 1| 25| 368| 398 1| 26| 368| 375 1| 27| 367| 398 1|| 29| 398 347 2| 25 |754 982. The output I need is: 1| 25| 368| 398 26| 368 375 27 |367| 398 29| 398| 34 7 2| 25| 754| 982, and so on.

"|" represents a new row.

ADD REPLY • link 7.5 years ago by ThulasiS ▴ 90

0


Why not just use any programming language, like Python or Perl?

ADD REPLY • link 7.5 years ago by Medhat 9.7k

0


I modified your question for readability.

It's good practice to show what you tried and what didn't work.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0


What I tried is something naive like this:

awk 'BEGIN {FS=OFS== " "} { 'NR%7==7{ print $1}'}' | awk 'NR%1==1{print $2,$3,$5}'

It prints all the required items from column 1, but that is not what I need.

ADD REPLY • link 7.5 years ago by ThulasiS ▴ 90

0


You can do it with basic cut and sed, something like the command below, where you replace the delimiters and the columns "a" and "b". This is by no means meant to test you, but basic scripting questions can be checked on Stack Overflow.

cat <(cut -d'space' -fa file | sort -u) <(sed 's/space/tab/' | cut -d'tab' -fb)

It would be nice to show us what you tried and what didn't work while posting the question.

ADD REPLY • link 7.5 years ago by Rohit ★ 1.5k

2


7.5 years ago

khalid.belkhir ▴ 40

AWK has arrays for storing groups of related strings or numbers. Just use them this way:

awk '{tab[$1]=tab[$1]"\n"$2" "$3" "$4} END {for (i in tab) {print i " " tab[i]} }' test.txt

For each identifier in column $1, create an entry in the array (tab) if it is absent; otherwise append columns 2 to 4 to its existing content. Note that adding "\n" to the concatenated string puts each row on its own line in the output.
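
One caveat with the array approach: awk does not define the order in which `for (i in tab)` visits keys, so groups may come out in a different order than in the input. A minimal sketch that keeps first-seen order, assuming whitespace-separated input with the identifier in column 1 (the function name `group_by_id` and the extra `order` array are illustrative, not part of the one-liner above):

```shell
# group_by_id: hypothetical helper; reads whitespace-separated lines on stdin.
group_by_id() {
  awk '
    # First time an identifier is seen, remember its position.
    !($1 in tab) { order[++n] = $1 }
    # Append columns 2 to 4 to the block for that identifier, one row per line.
    { tab[$1] = tab[$1] "\n" $2 " " $3 " " $4 }
    # Print identifiers in first-seen order, each followed by its rows.
    END { for (i = 1; i <= n; i++) print order[i] tab[order[i]] }
  '
}

# Example run on three lines from the question.
printf '%s\n' \
  'NC_007622|123-456 NC_234 123 568' \
  'NC_007622|123-456 NC_546 126 563' \
  'NC_007622|841-898 NC_234 456 785' | group_by_id
```

Because rows are collected per identifier, this variant also works when lines with the same identifier are not adjacent, at the cost of holding the whole file in memory.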

ADD COMMENT • link updated 7.5 years ago by GenoMax 141k • written 7.5 years ago by khalid.belkhir ▴ 40

1


7.5 years ago

abhishek.abhishekkumar ▴ 20

It is simple in awk:

awk '{print $1, $2, $3, $4, $6, $7, $8, $10, $11, $12}' input_file>output_file

The answer is valid only for data laid out as initially posted, like

NC_007622|123-456 NC_234 123 568 NC_007622|123-456 NC_546 126 563 NC_007622|123-456 NC_564 582 369

then the output will be:

NC_007622|123-456 NC_234 123 568 NC_546 126 563 NC_564 582 369

ADD COMMENT • link 7.5 years ago by abhishek.abhishekkumar ▴ 20

2


Based on the posts of other people here I have the impression you are oversimplifying things and your code won't yield the desired result.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0


7.5 years ago

5heikki 11k

Something like this. Perhaps your field separator is something other than a space, though? The same goes for the columns after the else.

awk 'BEGIN{FS=" "}{if(NR==1 || !(NR%7)){print $1}else{print $2,$3,$4}}' file.txt

ADD COMMENT • link 7.5 years ago by 5heikki 11k

0


For future reference: this command currently produces the following output using the example in the original post.

NC_007622|123-456
NC_546 126 563
NC_564 582 369
NC_985 548 367
NC_758 877 687
NC_234 456 785
NC_007622|841-898

ADD REPLY • link 7.5 years ago by GenoMax 141k

score 2 · Accepted Answer · 2016-11-03

2


7.5 years ago

nterhoeven ▴ 120

I would use the following perl one-liner for this:

perl -ane 'BEGIN{$id="";} if($F[0] ne $id){$id=shift(@F); print $id,"\n",join(" ",@F),"\n";}else{shift(@F); print join(" ",@F),"\n";}' filename.txt

Explanation:

The file is read line by line and each line is split at whitespace
The first column is checked (is it the same as before?)
if yes, the 2nd, 3rd and 4th columns are printed
if no, the 1st column is printed and stored, then the rest is printed on a new line
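
Since the question asked for awk, the steps above can also be sketched in awk. This is an illustrative equivalent, not the Perl answer itself; the function name `split_on_id_change` and the variable `prev` are made up here, and it assumes lines with the same identifier are adjacent, as in the example input:

```shell
# split_on_id_change: hypothetical helper; reads whitespace-separated lines on stdin.
split_on_id_change() {
  awk '
    # Identifier differs from the previous line: print it once and store it.
    $1 != prev { print $1; prev = $1 }
    # Every line: print columns 2 to 4.
    { print $2, $3, $4 }
  '
}

# Example run on three lines from the question.
printf '%s\n' \
  'NC_007622|123-456 NC_234 123 568' \
  'NC_007622|123-456 NC_546 126 563' \
  'NC_007622|841-898 NC_234 456 785' | split_on_id_change
```

This streams the file and keeps only the previous identifier in memory, so it suits large BLAST tables, but an identifier that reappears later in the file would be printed again.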

ADD COMMENT • link 7.5 years ago by nterhoeven ▴ 120

1


A little bit simpler:

perl -lane '$h1 = shift @F; $h1 ne $h2 and print $h1; print "@F"; $h2 = $h1' filename.txt

ADD REPLY • link 7.5 years ago by Jorge Amigo 14k

0


Even simpler:

perl -ape 's/ /\n/; $h and s/\Q$h\E\n//; $h = $F[0]' filename.txt

I just learnt that \Q and \E can be used to tell the regex engine to treat a variable as a literal string (the | present in the titles is a regex special character). This is very convenient if you don't want to escape your variables by hand when using them inside regex functions.
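
The same pitfall applies in awk, where a string used as a dynamic regex treats | as alternation. A small sketch of the difference, assuming you want to test whether an identifier occurs literally in a line (both function names are hypothetical; `index()` is the standard awk literal substring search):

```shell
# is_regex_match / is_literal_match: hypothetical helpers.
# Each prints 1 if "id" ($1) is found in "line" ($2), otherwise 0.
is_regex_match() {
  # The id is interpreted as a regex, so "|" means "either side".
  awk -v id="$1" -v line="$2" 'BEGIN { print (line ~ id) }'
}
is_literal_match() {
  # index() searches for the id as a plain substring.
  awk -v id="$1" -v line="$2" 'BEGIN { print (index(line, id) > 0) }'
}

# The line contains only the left half of the identifier.
is_regex_match   'NC_007622|123-456' 'NC_007622 other'
is_literal_match 'NC_007622|123-456' 'NC_007622 other'
```

The regex version reports a match because the alternative `NC_007622` is found on its own, while the literal version does not.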