Off topic: Need help sorting on two columns
8.2 years ago
jon.brate ▴ 290

I linearized a FASTA file and counted the length of each sequence. Each line now consists of three tab-separated columns, but column one also has a space in it.

>TCONS_00000098 gene=XLOC_000037    TGTGAACTGTTTGGAATGCCTAGATCATGATGAAGATTTTGGCGGCAAATCACGAACTACCAGATG    66
>TCONS_00000097 gene=XLOC_000037    TGTGAACTGTTTGGAATGCCTAGATCATGATGAAGATTTTGGCGGCAAATCACGAACTACCAGGTTGTGT    70
>TCONS_00000099 gene=XLOC_000037    TGAAGATTTTGGCGGCAA    18
>TCONS_00000100 gene=XLOC_000037    CAGATCGTCAAAAGTTTTTGAAGTTCCTCAAAAGAT    36
>TCONS_00000052 gene=XLOC_000022    AGCATTCG    8
>TCONS_00000025 gene=XLOC_000008    ACCGGTTTGCGTACTGATTTGCGTACTGGTTCGTGTA    37
>TCONS_00000132 gene=XLOC_000046    GTTTTAGTTGTTAGGTCTAACA    22
>TCONS_00000133 gene=XLOC_000046    CTGAGCAGTAACGCGACGCAGATCACTAAAGATCTG    36

I want to extract the longest isoform (TCONS...) of each gene, so I tried to sort the lines first on column 1 and then by length, with the longest on top. I thought this command would work:

cat lengths.txt | sort -t '    ' -k1,1 -k3,3nr > sorted.txt

It seems to place TCONS_00000097 correctly, but that is probably because of its name, not its length.

Output:

>TCONS_00000025 gene=XLOC_000008    ACCGGTTTGCGTACTGATTTGCGTACTGGTTCGTGTA    37
>TCONS_00000052 gene=XLOC_000022    AGCATTCG    8
>TCONS_00000097 gene=XLOC_000037    TGTGAACTGTTTGGAATGCCTAGATCATGATGAAGATTTTGGCGGCAAATCACGAACTACCAGGTTGTGT    70
>TCONS_00000098 gene=XLOC_000037    TGTGAACTGTTTGGAATGCCTAGATCATGATGAAGATTTTGGCGGCAAATCACGAACTACCAGATG    66
>TCONS_00000099 gene=XLOC_000037    TGAAGATTTTGGCGGCAA    18
>TCONS_00000100 gene=XLOC_000037    CAGATCGTCAAAAGTTTTTGAAGTTCCTCAAAAGAT    36
>TCONS_00000132 gene=XLOC_000046    GTTTTAGTTGTTAGGTCTAACA    22
>TCONS_00000133 gene=XLOC_000046    CTGAGCAGTAACGCGACGCAGATCACTAAAGATCTG    36
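
Two things probably explain this output. First, `sort -t` accepts a single character, so a separator typed as several spaces is not treated as a tab (GNU sort rejects it as a multi-character tab). Second, column 1 starts with the transcript ID, which is unique on every line, so the length key never gets a chance to break a tie. A minimal sketch of one way around both, assuming the file really is tab-separated and the gene is always the second space-separated token of column 1; the function name `longest_isoforms` is made up for illustration:

```shell
# Hypothetical helper: print the longest isoform line for each gene.
# Assumes tab-separated input: ">TCONS_... gene=XLOC_...", sequence, length.
# Steps in the pipeline:
#   1. awk copies the "gene=..." token into a new first column
#   2. sort orders by gene, then by length (now column 4), longest first
#   3. awk keeps only the first (longest) line seen for each gene
#   4. cut drops the helper column again
longest_isoforms() {
    tab=$(printf '\t')   # a single literal tab character for sort -t
    awk -F '\t' -v OFS='\t' '{ split($1, parts, " "); print parts[2], $0 }' "$1" |
        LC_ALL=C sort -t "$tab" -k1,1 -k4,4nr |
        awk -F '\t' '!seen[$1]++' |
        cut -f 2-
}
```

It could be called as `longest_isoforms lengths.txt > longest.txt`. If two isoforms of a gene have the same length, only one of them is kept.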
sort • 960 views