Select rows in a tab-delimited file
21 months ago

Hi all, I hope you are doing well. I have a tab-delimited file (about 60,000 rows). The first column contains information about the transposable element subfamily (e.g. SINE, LINE, LTR or ERV). Any idea how I can sort the rows according to this information? I have attached an example to make it clearer.

I really appreciate any help you can provide.

Surar

PS: the file has three columns: one contains strings and the other two contain numbers only.

chr6|42126072|42126376|AluSg:Alu:SINE|79|-,-    75.27258521 -1.598304168
chr6|106863597|106863885|THE1B:ERVL-MaLR:LTR|102|+,-    5.261992063 -4.333613608
chr7|73829665|73829806|L2a:L2:LINE|331|+,+  10.93709267 3.390930038
chr2|60928170|60929829|L1PA2:L1:LINE|18|+,+ 777.2062642 -0.943366885
chr15|42079918|42080207|MER33:hAT-Charlie:DNA|174|-,-   21.23243208 -3.026516149
chr12|14620131|14620527|MER57C2:ERV1:LTR|199|+,+    72.34162087 -1.262584508
chr15|66887807|66888452|LTR8:ERV1:LTR|65|-,-    25.45794673 -4.407397479
chr11|64988838|64988985|MIRb:MIR:SINE|288|+,-   16.14047449 -2.830192579
linux text • 1.5k views
$ awk -F "[:|]" '{print > ($6 ".txt")}' test.txt

I'm not sure if OP changed their question after you added the comment, but this won't help sort anything.


Couldn't this just be piped into sort?
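
A pipe into sort can work as long as the whole row travels with the extracted class. The following is a minimal sketch of that idea (decorate, sort, undecorate), not something posted in this thread: the file name demo_pipe.txt is an assumption, LC_ALL=C is added for reproducible ordering, and the -s (stable) flag is widely available but not required by POSIX.

```shell
# Four rows from the question, written with real tabs between the columns.
printf '%s\t%s\t%s\n' \
  'chr6|42126072|42126376|AluSg:Alu:SINE|79|-,-' 75.27258521 -1.598304168 \
  'chr7|73829665|73829806|L2a:L2:LINE|331|+,+' 10.93709267 3.390930038 \
  'chr11|64988838|64988985|MIRb:MIR:SINE|288|+,-' 16.14047449 -2.830192579 \
  'chr15|42079918|42080207|MER33:hAT-Charlie:DNA|174|-,-' 21.23243208 -3.026516149 \
  > demo_pipe.txt

tab=$(printf '\t')

# 1. Decorate: prepend the class (field 6 when splitting on ':' or '|').
# 2. Sort on that first tab-separated key only, keeping input order for ties.
# 3. Undecorate: drop the helper column again.
awk -F '[:|]' -v OFS='\t' '{ print $6, $0 }' demo_pipe.txt \
  | LC_ALL=C sort -s -t "$tab" -k1,1 \
  | cut -f2- > demo_pipe_sorted.txt

cat demo_pipe_sorted.txt
```

Because the sort is stable and keyed on the class alone, the two SINE rows keep their original relative order (AluSg before MIRb).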


OP didn't change the post; I misunderstood the question. The code I posted writes the rows for each element (SINE, DNA, etc.) into individual files. The code you posted (@Ram) sorts the file by element.
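
As a minimal sketch of that partitioning behaviour, the snippet below runs the awk one-liner on three rows from the question. The input name demo.txt is an assumption for illustration, and the output files are written to the current directory, so it is best run somewhere empty.

```shell
# Three rows from the question, written with real tabs between the columns.
printf '%s\t%s\t%s\n' \
  'chr6|42126072|42126376|AluSg:Alu:SINE|79|-,-' 75.27258521 -1.598304168 \
  'chr7|73829665|73829806|L2a:L2:LINE|331|+,+' 10.93709267 3.390930038 \
  'chr11|64988838|64988985|MIRb:MIR:SINE|288|+,-' 16.14047449 -2.830192579 \
  > demo.txt

# Splitting on ':' or '|' makes the class (SINE, LINE, ...) field 6.
# Each whole row is written to a file named after its class.
awk -F '[:|]' '{ print > ($6 ".txt") }' demo.txt

# SINE.txt now holds the two SINE rows, LINE.txt the single LINE row.
cat SINE.txt
cat LINE.txt
```

The parentheses around the output file name avoid an ambiguity in how some awk implementations parse an unparenthesized concatenation after `>`.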


Not a mistake, really. "Sort" can have two meanings, and without examples from OP, our understanding of their requirement can be inexact.


If you don't mind using R, here's a quick solution. I didn't know what two of the columns were, so I just named them Unknown_1 and Unknown_2. I did the same for the Transposon column: I split it into three columns named One, Two and Three, with Three holding the information you want to sort by.

library(tidyverse)

read_lines("sample.txt") %>%
    map_dfr(~ str_split(.x, "\\|", simplify = TRUE)[1,] %>%
            as_tibble() %>%
            mutate(column = c("Chromosome", "Start", "End", "Transposon", "Unknown_1", "Unknown_2")) %>%
            pivot_wider(names_from = column, values_from = value)) %>%
    separate(Transposon, sep = ":", into = c("One", "Two",  "Three")) %>%
    arrange(Three)

Please mention that this is R 4.2+ - not everyone is using the latest R with the builtin pipe operator. Or, given that you're using tidyverse, use magrittr's %>% so your code becomes a little more backward compatible.


The native pipe was introduced in R 4.1, just FYI.


Ah good point - I've gotten so used to the native pipe. Changed it here and will for the other post I just made!


Thank you very much for all the answers, and sorry for the late reply. For some reason, I didn't get any notification. Using sort -t: -k3,3 test.txt solved my problem.

Best regards, Surar

21 months ago
Ram 43k

I don't think so, as the output would only contain the LTR/SINE text. We would need something like sort -t: -k3,3 test.txt (unless by "sort" OP meant not "to order" but "to classify/partition" into different files).
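
A minimal sketch of how that sort command behaves, using four rows from the question saved under the assumed name demo_sort.txt. With -t: each line is split on colons, so field 3 begins with the class name (for example SINE|79|-,- followed by the numeric columns), and -k3,3 restricts the comparison to that field. LC_ALL=C is an addition here to make the ordering byte-wise and reproducible across locales.

```shell
# Four rows from the question, written with real tabs between the columns.
printf '%s\t%s\t%s\n' \
  'chr6|42126072|42126376|AluSg:Alu:SINE|79|-,-' 75.27258521 -1.598304168 \
  'chr6|106863597|106863885|THE1B:ERVL-MaLR:LTR|102|+,-' 5.261992063 -4.333613608 \
  'chr7|73829665|73829806|L2a:L2:LINE|331|+,+' 10.93709267 3.390930038 \
  'chr15|42079918|42080207|MER33:hAT-Charlie:DNA|174|-,-' 21.23243208 -3.026516149 \
  > demo_sort.txt

# Sort whole rows by the third colon-separated field (class first).
LC_ALL=C sort -t: -k3,3 demo_sort.txt > demo_sorted.txt

# Show only the class of each sorted row.
awk -F '[:|]' '{ print $6 }' demo_sorted.txt
# prints DNA, LINE, LTR, SINE (one per line)
```

Note that field 3 also contains everything after the class, so rows within the same class are ordered by that remaining text compared as strings, not by the numeric columns.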


Thank you very much, using sort -t: -k3,3 test.txt has solved my problem.


Please accept @Ram's answer (green check mark) to provide closure to this thread.
