I have a problem: I need to parse the following dataframe:
cluster_name qseqid sseqid pident_x qstart qend sstar send
2 1 seq1_0035_0035 seq13_0042_0035 0.73 42 133 46 189
3 1 seq1_0035_0035 seq13_0042_0035 0.73 146 283 287 389
4 1 seq1_0035_0035 seq13_0042_0035 0.73 301 478 402 503
5 1 seq13_0042_0035 seq1_0035_0035 0.73 46 189 42 133
6 1 seq13_0042_0035 seq1_0035_0035 0.73 287 389 146 283
7 1 seq13_0042_0035 seq1_0035_0035 0.73 402 503 301 478
8 2 seq4_0042_0035 seq2_0035_0035 0.71 256 789 125 678
9 2 seq4_0042_0035 seq2_0035_0035 0.71 802 1056 706 985
10 2 seq4_0042_0035 seq7_0035_0042 0.83 123 745 156 723
12 4 seq11_0035_0035 seq14_0042_0035 0.89 145 647 236 921
13 4 seq11_0035_0035 seq17_0042_0042 0.97 148 623 241 1002
14 5 seq17_0035_0042 seq17_0042_0042 0.94 188 643 179 746
and only keep, within each cluster, the rows with the maximum pident_x. The issue is that, as you can see, I can have reversed pairs (rows 2, 3, 4 and rows 5, 6, 7 are the same hits with qseqid and sseqid swapped), and in that case I need to keep only one orientation, for example only rows 2, 3 and 4.
The expected output would then be:
cluster_name qseqid sseqid pident_x qstart qend sstar send
2 1 seq1_0035_0035 seq13_0042_0035 0.73 42 133 46 189
3 1 seq1_0035_0035 seq13_0042_0035 0.73 146 283 287 389
4 1 seq1_0035_0035 seq13_0042_0035 0.73 301 478 402 503
10 2 seq4_0042_0035 seq7_0035_0042 0.83 123 745 156 723
13 4 seq11_0035_0035 seq17_0042_0042 0.97 148 623 241 1002
14 5 seq17_0035_0042 seq17_0042_0042 0.94 188 643 179 746
To explain:
For cluster 1:
seq1_0035_0035 vs seq13_0042_0035
has its reversed counterpart, seq13_0042_0035 vs seq1_0035_0035,
but I only keep the first one.
For cluster 2:
seq4_0042_0035 vs seq7_0035_0042 (0.83)
has a better pident score than seq4_0042_0035 vs seq2_0035_0035 (0.71)
For cluster 4:
seq11_0035_0035 vs seq17_0042_0042 (0.97)
has a better pident score than seq11_0035_0035 vs seq14_0042_0035 (0.89)
For cluster 5:
There is only one pair, seq17_0035_0042 vs seq17_0042_0042 (0.94),
so I keep it.
I do not really know how to do this. Does anyone have an idea?
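
For reference, the rules described above could be sketched in pandas roughly as follows. This is only one possible reading of the requirement, not a definitive solution. The function name `keep_best_pairs` and the helper `pair_key` are hypothetical names introduced for illustration. The sketch assumes that "the first one" means the orientation that appears first in row order, that the index is unique, that the sequence names never contain the `|` character, and that if two different pairs tie for the maximum pident_x in a cluster, both are kept.

```python
import pandas as pd

# Rebuild the example dataframe from the question (first value = row label).
columns = ["cluster_name", "qseqid", "sseqid", "pident_x",
           "qstart", "qend", "sstar", "send"]
rows = [
    (2, 1, "seq1_0035_0035", "seq13_0042_0035", 0.73, 42, 133, 46, 189),
    (3, 1, "seq1_0035_0035", "seq13_0042_0035", 0.73, 146, 283, 287, 389),
    (4, 1, "seq1_0035_0035", "seq13_0042_0035", 0.73, 301, 478, 402, 503),
    (5, 1, "seq13_0042_0035", "seq1_0035_0035", 0.73, 46, 189, 42, 133),
    (6, 1, "seq13_0042_0035", "seq1_0035_0035", 0.73, 287, 389, 146, 283),
    (7, 1, "seq13_0042_0035", "seq1_0035_0035", 0.73, 402, 503, 301, 478),
    (8, 2, "seq4_0042_0035", "seq2_0035_0035", 0.71, 256, 789, 125, 678),
    (9, 2, "seq4_0042_0035", "seq2_0035_0035", 0.71, 802, 1056, 706, 985),
    (10, 2, "seq4_0042_0035", "seq7_0035_0042", 0.83, 123, 745, 156, 723),
    (12, 4, "seq11_0035_0035", "seq14_0042_0035", 0.89, 145, 647, 236, 921),
    (13, 4, "seq11_0035_0035", "seq17_0042_0042", 0.97, 148, 623, 241, 1002),
    (14, 5, "seq17_0035_0042", "seq17_0042_0042", 0.94, 188, 643, 179, 746),
]
df = pd.DataFrame([r[1:] for r in rows],
                  index=[r[0] for r in rows],
                  columns=columns)


def keep_best_pairs(df):
    """Hypothetical helper: best pident_x per cluster, one orientation per pair."""
    # Step 1: keep only the rows that reach the maximum pident_x of their cluster.
    cluster_max = df.groupby("cluster_name")["pident_x"].transform("max")
    best = df[df["pident_x"] == cluster_max]

    # Step 2: build an orientation-independent key, so that A vs B and
    # B vs A fall into the same group (assumes "|" is not used in the names).
    pair_key = pd.Series(
        ["|".join(sorted(pair)) for pair in zip(best["qseqid"], best["sseqid"])],
        index=best.index,
    )

    # Step 3: within each (cluster, pair), find the qseqid of the first row
    # and keep only the rows that share that orientation.
    first_qseqid = (
        best.groupby([best["cluster_name"], pair_key])["qseqid"].transform("first")
    )
    return best[best["qseqid"] == first_qseqid]


result = keep_best_pairs(df)
print(result)
```

On the example data this keeps rows 2, 3, 4, 10, 13 and 14. Comparing against the first qseqid of each pair, rather than dropping duplicates on the key, is what lets all three segments of the seq1_0035_0035 vs seq13_0042_0035 alignment survive while the three reversed rows are removed.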