Question: how to remove rows based on certain characters
zqwu wrote, 5 weeks ago:

Dear all,

I have a tab-separated file with over 30,000 rows, and I want to remove some rows based on the values in one column.

for example:

Name         Len   Name2  Order
KCNQ2_32937  2535  KCNQ2  32937
KCNQ2_32938  2733  KCNQ2  32938
KCNQ2_32939  2616  KCNQ2  32939
KCNQ2_32940  2544  KCNQ2  32940
KCNQ2_32941  1809  KCNQ2  32941
...

The filter should work like this: when several rows have the same value in the Name2 column, I want to keep only the row with the largest value in the Len column:

Name         Len   Name2  Order
KCNQ2_32938  2733  KCNQ2  32938
...

How can I do this?

TJ

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 5 weeks ago:

Sort on column 3 and then on column 2 (numeric, descending), followed by a stable sort with unique on column 3, which keeps the first row of each group:

sort -t $'\t' -k3,3 -k2,2rn input.tsv | sort -t $'\t' -k3,3 -u --stable
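One thing the one-liner does not handle is the header row, which would be sorted in among the data rows. A minimal sketch of a header-preserving variant follows, assuming the first line is the header. The two GENEX rows are made up for illustration, and `-s` is used as the short form of `--stable`.

```sh
# Build a small tab-separated sample in a temporary file.
# The KCNQ2 rows come from the question; the GENEX rows are hypothetical.
tmp=$(mktemp)
printf 'Name\tLen\tName2\tOrder\n'         >  "$tmp"
printf 'KCNQ2_32937\t2535\tKCNQ2\t32937\n' >> "$tmp"
printf 'KCNQ2_32938\t2733\tKCNQ2\t32938\n' >> "$tmp"
printf 'KCNQ2_32939\t2616\tKCNQ2\t32939\n' >> "$tmp"
printf 'KCNQ2_32940\t2544\tKCNQ2\t32940\n' >> "$tmp"
printf 'KCNQ2_32941\t1809\tKCNQ2\t32941\n' >> "$tmp"
printf 'GENEX_1\t100\tGENEX\t1\n'          >> "$tmp"
printf 'GENEX_2\t300\tGENEX\t2\n'          >> "$tmp"

tab=$(printf '\t')   # portable stand-in for bash's $'\t'

result=$(
  # Emit the header unchanged.
  head -n 1 "$tmp"
  # Sort the data rows by Name2, then by Len (numeric, descending);
  # the second sort keeps only the first row seen for each Name2.
  tail -n +2 "$tmp" \
    | LC_ALL=C sort -t "$tab" -k3,3 -k2,2rn \
    | LC_ALL=C sort -t "$tab" -k3,3 -u -s
)
rm -f "$tmp"

printf '%s\n' "$result"
```

On this sample the output is the header followed by the GENEX_2 and KCNQ2_32938 rows, one per Name2 value.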

Thanks. It is fast and exactly what I need.

(reply written 5 weeks ago by zqwu)

If this answer solved your problem, then go ahead and "accept" it (green check mark). @5heikki's answer, which appears to have been written at almost the same time, may also be fine and can be accepted in addition to @Pierre's.

(reply written 5 weeks ago by genomax)
5heikki (Finland) wrote, 5 weeks ago:

sort -t $'\t' -k3,3 -k2,2gr file | sort -t $'\t' -u -k3,3

Also: man sort
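For comparison, the same filter can be sketched as a single pass with awk instead of two sorts. This is a different technique from the answers above, not something either poster suggested. It assumes the first line is a header, keeps groups in first-seen order, and uses a sample file whose GENEX rows are made up for illustration.

```sh
# Small tab-separated sample; the GENEX rows are hypothetical.
tmp=$(mktemp)
printf 'Name\tLen\tName2\tOrder\n'         >  "$tmp"
printf 'KCNQ2_32937\t2535\tKCNQ2\t32937\n' >> "$tmp"
printf 'KCNQ2_32938\t2733\tKCNQ2\t32938\n' >> "$tmp"
printf 'KCNQ2_32941\t1809\tKCNQ2\t32941\n' >> "$tmp"
printf 'GENEX_1\t100\tGENEX\t1\n'          >> "$tmp"
printf 'GENEX_2\t300\tGENEX\t2\n'          >> "$tmp"

result=$(awk -F '\t' '
  NR == 1 { print; next }                      # pass the header through
  !($3 in best) { order[++n] = $3 }            # remember first-seen order of Name2
  !($3 in best) || $2 + 0 > len[$3] {          # new group, or a larger Len
    best[$3] = $0
    len[$3]  = $2 + 0
  }
  END { for (i = 1; i <= n; i++) print best[order[i]] }
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

This avoids sorting, at the cost of holding one row per Name2 value in memory, and on ties it keeps the first row encountered.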
