Question: how to remove rows based on certain characters
0
zqwu (0) wrote 11 months ago:

Dear all,

I have a tab-delimited file with over 30,000 rows, and I want to remove some rows based on the values in certain columns.

For example:

Name         Len   Name2  Order
KCNQ2_32937  2535  KCNQ2  32937
KCNQ2_32938  2733  KCNQ2  32938
KCNQ2_32939  2616  KCNQ2  32939
KCNQ2_32940  2544  KCNQ2  32940
KCNQ2_32941  1809  KCNQ2  32941

.
.
.

The filter works like this:

Among rows that share the same value in the Name2 column, I want to keep only the row with the largest value in the Len column:

Name         Len   Name2  Order
KCNQ2_32938  2733  KCNQ2  32938

...

How can I do this?

TJ

R • 295 views
modified 11 months ago by 5heikki (7.0k) • written 11 months ago by zqwu (0)
7
Pierre Lindenbaum (104k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 11 months ago:

Sort on column 3, then on column 2 in reverse numeric order, and follow that with a stable unique sort on column 3, which keeps only the first row of each group:

sort -t $'\t' -k3,3 -k2,2rn input.tsv | sort -t $'\t' -k3,3 -u --stable
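
As a rough illustration, here is a minimal sketch of the same sort-then-unique idea run on the sample rows from the question. It assumes a `sort` that supports `-s` (GNU and BSD both do). The file name `example_input.tsv`, the function `keep_longest`, and the two `GENEB` rows are hypothetical, added only for illustration. The sketch sets the header line aside so that it is not sorted in with the data, and it builds the tab delimiter with `printf` so that it also runs under plain `sh`.

```shell
#!/bin/sh
# Build a small tab-separated sample. The KCNQ2 rows come from the question;
# the two GENEB rows are hypothetical, added to show a second group.
TAB=$(printf '\t')
{
  printf 'Name\tLen\tName2\tOrder\n'
  printf 'KCNQ2_32937\t2535\tKCNQ2\t32937\n'
  printf 'KCNQ2_32938\t2733\tKCNQ2\t32938\n'
  printf 'KCNQ2_32939\t2616\tKCNQ2\t32939\n'
  printf 'KCNQ2_32940\t2544\tKCNQ2\t32940\n'
  printf 'KCNQ2_32941\t1809\tKCNQ2\t32941\n'
  printf 'GENEB_1\t100\tGENEB\t1\n'
  printf 'GENEB_2\t900\tGENEB\t2\n'
} > example_input.tsv

# keep_longest FILE: print the header, then one row per Name2 value,
# choosing the row with the largest Len.
keep_longest() {
  head -n 1 "$1"                        # header passes through unsorted
  tail -n +2 "$1" |
    sort -t "$TAB" -k3,3 -k2,2rn |      # group by Name2, largest Len first
    sort -t "$TAB" -k3,3 -u -s          # keep the first row of each group
}

keep_longest example_input.tsv
```

For the KCNQ2 group this keeps the `KCNQ2_32938` row (Len 2733), which matches the expected output in the question.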
modified 11 months ago • written 11 months ago by Pierre Lindenbaum (104k)

Thanks. It is fast and exactly what I need.

written 11 months ago by zqwu (0)

If this answer solved your problem, then go ahead and "accept" it (green check mark). @5heikki's answer, which appears to have been written at almost the same time, may also work and can be accepted in addition to @Pierre's.

modified 11 months ago • written 11 months ago by GenoMax (42k)
4
5heikki (7.0k), Finland, wrote 11 months ago:

sort -t $'\t' -k3,3 -k2,2gr file | sort -t $'\t' -u -k3,3

Also: man sort
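
For comparison, a single-pass alternative in `awk` is sketched below. This is not from either answer in the thread; it is a different technique that tracks the largest Len seen so far for each Name2 value and avoids sorting altogether. The function name `keep_longest_awk` and the file `example_input.tsv` (including its two `GENEB` rows) are hypothetical, and the sketch assumes the first line is a header and that Len is numeric.

```shell
#!/bin/sh
# Build a small tab-separated sample. The KCNQ2 rows come from the question;
# the two GENEB rows are hypothetical, added to show a second group.
{
  printf 'Name\tLen\tName2\tOrder\n'
  printf 'KCNQ2_32937\t2535\tKCNQ2\t32937\n'
  printf 'KCNQ2_32938\t2733\tKCNQ2\t32938\n'
  printf 'KCNQ2_32939\t2616\tKCNQ2\t32939\n'
  printf 'KCNQ2_32940\t2544\tKCNQ2\t32940\n'
  printf 'KCNQ2_32941\t1809\tKCNQ2\t32941\n'
  printf 'GENEB_1\t100\tGENEB\t1\n'
  printf 'GENEB_2\t900\tGENEB\t2\n'
} > example_input.tsv

# keep_longest_awk FILE: print the header, then for each Name2 value
# the row with the largest Len, found in one pass over the file.
keep_longest_awk() {
  awk -F '\t' '
    NR == 1 { print; next }                     # header passes through
    !($3 in best) || $2 + 0 > best[$3] {        # first row of a group, or a longer one
      best[$3] = $2 + 0
      row[$3] = $0
    }
    END { for (name in row) print row[name] }   # group order is unspecified
  ' "$1"
}

keep_longest_awk example_input.tsv
```

The groups come out in no particular order, so pipe the result through `sort` if the order matters. On ties, this sketch keeps the first row it encounters for that Len.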

written 11 months ago by 5heikki (7.0k)