I have a list of start and end positions from a protein database. Some of the sequences overlap each other because of isoforms. I want to remove the overlapping sequences and keep only the longest one. How can I achieve this?
The data look like this:
KN150702.1  512 66743 
KN150702.1  4526    75660 
KN150702.1  51685   52551 
KN150702.1  75503   111816 
KN150702.1  126256  146772 
KN150702.1  155049  175903 
KN150702.1  177161  211884 
KN150703.1  4605    14526 
KN150703.1  16536   18921 
KN150703.1  16536   18879 
KN150703.1  23158   47525 
KN150703.1  36969   40261 
KN150703.1  42415   46815
The result should be:
KN150702.1  4526    75660
KN150702.1  126256  146772
KN150702.1  155049  175903
KN150702.1  177161  211884
KN150703.1  4605    14526
KN150703.1  16536   18921
KN150703.1  23158   47525
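
To make the intended behavior concrete, here is a minimal Python sketch of one way to get from the input to the result above. It is not a definitive solution and rests on several assumptions. The function names `parse_line` and `keep_longest` are made up for illustration. Coordinates are treated as inclusive, so an interval starting at the current group's end counts as overlapping. Overlaps are grouped transitively, which the expected output seems to imply, since 75503-111816 overlaps 4526-75660 but not 512-66743 and is still removed. Ties in length keep the interval that sorts first.

```python
from collections import defaultdict


def parse_line(line):
    """Split one whitespace-delimited line into (seq_id, start, end)."""
    seq_id, start, end = line.split()
    return seq_id, int(start), int(end)


def keep_longest(records):
    """Return the longest interval of each overlap group, per sequence ID.

    records: iterable of (seq_id, start, end) tuples.
    Assumption: intervals are grouped transitively, i.e. an interval joins
    the current group if it starts at or before the group's furthest end.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in records:
        by_seq[seq_id].append((start, end))

    kept = []
    for seq_id in sorted(by_seq):
        intervals = sorted(by_seq[seq_id])  # sort by start, then end
        best = intervals[0]                 # longest interval in the group so far
        group_end = best[1]                 # furthest end seen in the group
        for start, end in intervals[1:]:
            if start <= group_end:
                # Overlaps the current group: extend it, maybe update the longest.
                group_end = max(group_end, end)
                if end - start > best[1] - best[0]:
                    best = (start, end)
            else:
                # No overlap: close the current group and start a new one.
                kept.append((seq_id, best[0], best[1]))
                best = (start, end)
                group_end = end
        kept.append((seq_id, best[0], best[1]))
    return kept
```

Reading the file line by line with `parse_line` and passing the tuples to `keep_longest` gives one interval per overlap group. Unlike a merge, the kept interval is always one of the original records, never a combined span.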
Very minor nitpick: the pipe character can finish the line, that is, it does not need to be followed by an escaped newline.
Thanks for your answer, but I actually do not want to merge them. I only need to keep the longest one and remove the others.
I have updated the answer. Please accept it if it works, so that the question won't get bumped again in the future.