Extract lines containing 's__' into a new file, together with the first two lines of the file
3.7 years ago
dpc ▴ 240

Hi community! I have a MetaPhlAn output file from which I want to extract the first two lines (starting with "ID" and "#SampleID") along with every line that contains "s__", and write them to a new file. Can anyone please tell me how I can do that? I have used grep 's__' <target.file> > extracted_file.txt, but it does not keep the first two lines in the output. Here is an example of the output:

command grep • 1.3k views

you can try awk 'NR < 3 || /s__/{print}' stackoverflow.tsv or sed -n '1,2p;/s__/p' stackoverflow.csv
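As a quick check of the two one-liners above, here is a minimal sketch that runs both on a tiny table. The file name profile.tsv, the sample columns and the abundance values are made up for illustration and do not come from the original post.

```shell
# Work in a throwaway directory so nothing real is overwritten.
tmpdir=$(mktemp -d)
infile="$tmpdir/profile.tsv"   # hypothetical file name

# Made-up MetaPhlAn-style table: two header lines, then clade rows.
{
  printf 'ID\tsample1\tsample2\n'
  printf '#SampleID\tMetaphlan_Analysis\tMetaphlan_Analysis\n'
  printf 'k__Bacteria\t100.0\t100.0\n'
  printf 'k__Bacteria|p__Firmicutes\t60.0\t55.0\n'
  printf 'k__Bacteria|p__Firmicutes|g__Streptococcus|s__Streptococcus_mitis\t12.5\t8.1\n'
  printf 'k__Bacteria|p__Bacteroidetes|g__Bacteroides|s__Bacteroides_fragilis\t20.0\t30.2\n'
} > "$infile"

# awk: print a line if it is one of the first two, or if it contains s__.
awk 'NR < 3 || /s__/' "$infile" > "$tmpdir/extracted_awk.txt"

# sed: -n turns off default printing; print lines 1-2 and any line matching s__.
sed -n '1,2p;/s__/p' "$infile" > "$tmpdir/extracted_sed.txt"

# Compare the two results.
awk_lines=$(wc -l < "$tmpdir/extracted_awk.txt" | tr -d ' ')
if cmp -s "$tmpdir/extracted_awk.txt" "$tmpdir/extracted_sed.txt"; then
  same=yes
else
  same=no
fi
echo "awk kept $awk_lines lines; identical to sed output: $same"
cat "$tmpdir/extracted_awk.txt"
# Remove the demo directory with: rm -r "$tmpdir"
```

One difference to keep in mind: if one of the first two lines itself contained s__, the sed version would print it twice, while the awk version prints each line at most once.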

3.7 years ago

Hi,

Try the following:

grep "#SampleID\|ID\|s__" <target.file> > extracted_file.txt

grep can match several alternative patterns at once. Here \| means "or", so the command keeps every line that contains #SampleID, ID, or s__.
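For comparison, here is a minimal sketch of the same "or" match written two ways. It assumes GNU grep for the \| form, since backslash-pipe alternation is a GNU extension to basic regular expressions. The sample file and its contents are made up for illustration.

```shell
# Throwaway directory and a made-up MetaPhlAn-style table.
tmpdir=$(mktemp -d)
infile="$tmpdir/profile.tsv"   # hypothetical file name
{
  printf 'ID\tsample1\tsample2\n'
  printf '#SampleID\tMetaphlan_Analysis\tMetaphlan_Analysis\n'
  printf 'k__Bacteria\t100.0\t100.0\n'
  printf 'k__Bacteria|p__Firmicutes|g__Streptococcus|s__Streptococcus_mitis\t12.5\t8.1\n'
  printf 'k__Bacteria|p__Bacteroidetes|g__Bacteroides|s__Bacteroides_fragilis\t20.0\t30.2\n'
} > "$infile"

# Form used in the answer: one pattern, with \| meaning "or" (GNU grep).
grep '#SampleID\|ID\|s__' "$infile" > "$tmpdir/extracted_alt.txt"

# Equivalent form: give each pattern its own -e option.
grep -e '#SampleID' -e 'ID' -e 's__' "$infile" > "$tmpdir/extracted_e.txt"

n_alt=$(wc -l < "$tmpdir/extracted_alt.txt" | tr -d ' ')
n_e=$(wc -l < "$tmpdir/extracted_e.txt" | tr -d ' ')
echo "alternation kept $n_alt lines, multiple -e kept $n_e lines"
```

Note that the ID pattern already matches the #SampleID line, so listing #SampleID separately is harmless but not strictly needed.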

I hope this helps,

António


I'd add to that and suggest grep "^#SampleID\|^ID\|s__" <target.file> > extracted_file.txt, so that the two header patterns only match at the beginning of the line (s__ can still match anywhere). You can also use the -P option so you don't have to escape the | operator.
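To see what the anchors change, here is a minimal sketch on a made-up table. The row containing g__unclassified_ID42 is invented purely to show an "ID" in the middle of a line. The example uses -E (extended regular expressions) rather than -P, because -E is available in any POSIX grep; with GNU grep built with PCRE support, -P should behave the same way for these patterns.

```shell
# Throwaway directory and a made-up table; the ID42 clade name is hypothetical.
tmpdir=$(mktemp -d)
infile="$tmpdir/profile.tsv"   # hypothetical file name
{
  printf 'ID\tsample1\tsample2\n'
  printf '#SampleID\tMetaphlan_Analysis\tMetaphlan_Analysis\n'
  printf 'k__Bacteria\t100.0\t100.0\n'
  printf 'k__Bacteria|p__Firmicutes|g__Streptococcus|s__Streptococcus_mitis\t12.5\t8.1\n'
  printf 'k__Viruses|g__unclassified_ID42\t1.0\t0.5\n'
} > "$infile"

# Unanchored: "ID" matches anywhere, so the g__unclassified_ID42 row is kept too.
n_loose=$(grep -c -E '#SampleID|ID|s__' "$infile")

# Anchored: ^ pins the two header patterns to the start of the line,
# while s__ is still allowed to match anywhere in the line.
n_anchored=$(grep -c -E '^#SampleID|^ID|s__' "$infile")

echo "unanchored: $n_loose lines, anchored: $n_anchored lines"
```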


Thank you @from the mountains!

That makes sense, though for the s__ match I believe that @dpc wants to identify that pattern in the whole line, not necessarily at the beginning.

I did not know about the -P option. Thanks for pointing it out.

António


i've edited my comment for clarity.


Thanks, but at the moment I want the match anywhere in the line. Should I also add -P to Antonio's command?


Hi @dpc,

You don't need to add the -P option. According to @from the mountains, -P only changes how the | has to be written (I have never tried it myself).

So, the idea is the following: since you want to match several words, i.e., #SampleID or ID or s__, you need the special character | that means "or", giving #SampleID|ID|s__. However, in grep's default (basic) regular expression syntax a plain | is just a literal character, and GNU grep expects it to be written with a backslash, \|, to act as "or". The shell pipe is not involved here, because the pattern is inside quotes.

What @from the mountains suggested (I did not know that) is that you don't need the backslash if you pass the -P option, which switches grep to Perl-style regular expressions.

So, the command-line should work well without the -P option.
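A minimal sketch to check how | behaves inside a quoted grep pattern, assuming GNU grep for the \| form. The file name and its five lines are made up for illustration. Because the pattern is quoted, the shell never treats the | as a pipe; the difference comes from grep's regular expression syntax.

```shell
# Five made-up lines, including one that literally contains "ID|s__".
tmpdir=$(mktemp -d)
infile="$tmpdir/lines.txt"   # hypothetical file name
printf '%s\n' 'ID' '#SampleID' 'k__Bacteria' 's__Some_species' 'ID|s__' > "$infile"

# Basic regex (grep's default): a bare | is an ordinary character,
# so this only finds lines containing the literal text "ID|s__".
n_bare=$(grep -c 'ID|s__' "$infile" || true)

# Extended regex (-E): a bare | means "or".
n_ere=$(grep -c -E 'ID|s__' "$infile" || true)

# Basic regex with \| : "or" in GNU grep (a GNU extension).
n_escaped=$(grep -c 'ID\|s__' "$infile" || true)

echo "bare |: $n_bare, -E with |: $n_ere, basic with \\|: $n_escaped"
```

The `|| true` is only there because grep exits with a non-zero status when nothing matches, which would otherwise stop a script running under `set -e`.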

If that solves your problem, please upvote the answer.

I hope this answers your question,

António


Thanks Antonio. It was really helpful.
