Question: how to choose the biggest from multiple lines
wu.zhiqiang.1020 (United States) wrote 4 weeks ago:

Hi all, I have gene isoform data like this. The columns are GeneName, Isoform, and Length:

Zm00001d000001  T001    438
Zm00001d000001  T002    1842
Zm00001d000001  T005    1842
Zm00001d000001  T006    1503
Zm00001d000002  T001    5025
Zm00001d000002  T002    5034
Zm00001d000002  T005    4551
Zm00001d000002  T007    3432

For each gene, I want to choose the longest isoform, for example:

Zm00001d000002  T002    5034

But some isoforms have the same length, for example:

Zm00001d000001  T002    1842
Zm00001d000001  T005    1842

In that case I would choose the one with the smallest value in the second column (or pick one at random):

Zm00001d000001  T002    1842

Is there a good way to do this?

Thanks.

Tag: gene
modified 4 weeks ago by genomax • written 4 weeks ago by wu.zhiqiang.1020

What have you tried? This can be done in a straightforward way with R or python, and in a more complicated way with awk. Please tell us what you've tried and the exact problem you're facing, and we can help you solve it. Without that, this is just asking us to do your work for you.

written 4 weeks ago by RamRS

The input is like this:

Zm00001d000001 T001 438
Zm00001d000001 T002 1842
Zm00001d000001 T005 1842
Zm00001d000002 T001 5025
Zm00001d000002 T002 5034
Zm00001d000002 T005 4551

The final result should look like this:

Zm00001d000001 T002 1842
Zm00001d000002 T002 5034

For each gene, I just want to choose the longest isoform. I hope that makes it clear.

modified 4 weeks ago by genomax • written 4 weeks ago by wu.zhiqiang.1020

Your requirements were clear. What was not clear was what you'd tried by yourself. That is not a point addressed in your question or your comment. Please be informed that it is good practice to try and solve something by yourself before asking for help.

written 4 weeks ago by RamRS

Thanks. I am not good at computing yet; I am just starting out.

written 4 weeks ago by wu.zhiqiang.1020

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 4 weeks ago:

Assuming a tab-delimited file:

sort -t $'\t' -k1,1 -k3,3rn  input.tsv |  sort -t $'\t' -k1,1  -u --stable 
Zm00001d000001  T002    1842
Zm00001d000002  T002    5034
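
A possible variant of the command above, as a minimal sketch rather than a definitive solution: it adds the isoform column as an explicit third sort key, so that ties on length resolve to the smallest isoform ID as the question requests, and it uses awk to keep the first row per gene. The function name `longest_isoform` and the file name `sample.tsv` are hypothetical, and the input is assumed to be tab-delimited with no header line.

```shell
#!/bin/sh
# longest_isoform is a hypothetical helper name used only for this sketch.
# Assumes a tab-delimited file with no header: gene, isoform, length.
longest_isoform() {
    tab=$(printf '\t')
    # Order rows by gene, then by length (numeric, descending), then by
    # isoform ID (ascending), so equal lengths resolve to the smallest ID.
    # awk then prints only the first row it sees for each gene.
    sort -t "$tab" -k1,1 -k3,3nr -k2,2 "$1" | awk -F '\t' '!seen[$1]++'
}

# Sample rows taken from the question, written to a hypothetical sample.tsv.
printf '%s\t%s\t%s\n' \
    Zm00001d000001 T001 438 \
    Zm00001d000001 T002 1842 \
    Zm00001d000001 T005 1842 \
    Zm00001d000001 T006 1503 \
    Zm00001d000002 T001 5025 \
    Zm00001d000002 T002 5034 \
    Zm00001d000002 T005 4551 \
    Zm00001d000002 T007 3432 \
    > sample.tsv

longest_isoform sample.tsv
```

Spelling out the `-k2,2` key means the tie-break no longer depends on how sort orders lines that compare equal on the listed keys. A header line, if present, would need to be removed first.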
written 4 weeks ago by Pierre Lindenbaum

Yes, this is exactly what I want: just keep the longest. Thanks.

written 4 weeks ago by wu.zhiqiang.1020

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.


written 4 weeks ago by RamRS