How to print coordinates in a continuous manner
Lu K, 26 days ago

I would like to rearrange a BLAST tabular output file (-outfmt 7). The file has four columns:

Column 1: accession ID
Column 2: subject start
Column 3: subject end
Column 4: difference between columns 2 and 3 ($2 - $3)

Input file

ptg000001l      1714    4715 -3001
ptg000001l      3669    1932 1737
ptg000001l      4514    3725 789
ptg000001l      4839    5622 -783
ptg000001l      4840    5785 -945
ptg000001l      4840    5894 -1054
ptg000001l      4841    5751 -910
ptg000001l      4841    5785 -944
ptg000001l      4842    5542 -700
ptg000001l      4842    5784 -942
ptg000001l      4843    5409 -566
ptg000001l      4843    5659 -816
ptg000001l      4843    5665 -822
ptg000001l      4843    5776 -933
ptg000001l      4843    5784 -941
ptg000001l      4843    5894 -1051
ptg000001l      4843    6023 -1180
ptg000001l      4843    6333 -1490
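A minimal Python sketch for reading this file, assuming whitespace-separated columns in the order listed above. It skips blank lines and the `#` comment lines that -outfmt 7 writes, checks that column 4 really equals column 2 minus column 3, and counts rows where the subject start is larger than the subject end, which in BLAST tabular output generally indicates a hit on the minus strand of the subject. The names `Hit`, `parse_hits` and `SAMPLE` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    acc: str      # column 1: accession ID
    sstart: int   # column 2: subject start
    send: int     # column 3: subject end
    diff: int     # column 4: sstart - send


def parse_hits(text: str) -> list[Hit]:
    """Parse whitespace-separated rows, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # -outfmt 7 writes '#' header lines; skip them
        acc, sstart, send, diff = fields[:4]
        hits.append(Hit(acc, int(sstart), int(send), int(diff)))
    return hits


# Input rows copied from the post.
SAMPLE = """\
ptg000001l 1714 4715 -3001
ptg000001l 3669 1932 1737
ptg000001l 4514 3725 789
ptg000001l 4839 5622 -783
ptg000001l 4840 5785 -945
ptg000001l 4840 5894 -1054
ptg000001l 4841 5751 -910
ptg000001l 4841 5785 -944
ptg000001l 4842 5542 -700
ptg000001l 4842 5784 -942
ptg000001l 4843 5409 -566
ptg000001l 4843 5659 -816
ptg000001l 4843 5665 -822
ptg000001l 4843 5776 -933
ptg000001l 4843 5784 -941
ptg000001l 4843 5894 -1051
ptg000001l 4843 6023 -1180
ptg000001l 4843 6333 -1490
"""

hits = parse_hits(SAMPLE)
# True if column 4 is column 2 minus column 3 on every row.
consistent = all(h.diff == h.sstart - h.send for h in hits)
# Rows whose start is larger than their end (minus-strand subject hits).
minus = [h for h in hits if h.sstart > h.send]
print(len(hits), len(minus), consistent)  # 18 2 True
```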

For rows that share the same start ($2), I would like to keep only the row with the largest end ($3), so that the coordinates can be assembled in a continuous manner.

Expected output file:

ptg000001l      1714    4715 -3001
ptg000001l      4839    5622 -783
ptg000001l      4843    6333 -1490

Thank you, Luke


Could you please explain what you mean by "collect only those accession which has same number ($2) and whose $3 is of larger length to assemble coordinates in continuous manner"? Sorry, I was not able to understand the requirement.


Column 2 ($2) has repeated values: 4841 4841 4842 4842 4843 4843

and the corresponding column 3 values are 5751 5785 5542 5784 5409 6333

So, for example, for each column 2 value I would like to keep only the row whose column 3 value is the largest, i.e. 4843 6333.
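The per-start rule above (for each column 2 value, keep the row with the largest column 3 value) can be sketched in Python as follows. Skipping minus-strand rows (column 2 larger than column 3) is an assumption that the post does not state; the function name `longest_per_start` and the variables `SAMPLE`, `rows` and `result` are hypothetical.

```python
def longest_per_start(rows):
    """For each (accession, start) keep the row with the largest end.

    Assumption: minus-strand rows (start > end) are skipped.
    """
    best = {}
    for acc, start, end, diff in rows:
        if start > end:
            continue  # assumed: ignore minus-strand hits
        key = (acc, start)
        if key not in best or end > best[key][2]:
            best[key] = (acc, start, end, diff)
    # Report in accession/start order.
    return sorted(best.values(), key=lambda r: (r[0], r[1]))


# Input rows copied from the post.
SAMPLE = """\
ptg000001l 1714 4715 -3001
ptg000001l 3669 1932 1737
ptg000001l 4514 3725 789
ptg000001l 4839 5622 -783
ptg000001l 4840 5785 -945
ptg000001l 4840 5894 -1054
ptg000001l 4841 5751 -910
ptg000001l 4841 5785 -944
ptg000001l 4842 5542 -700
ptg000001l 4842 5784 -942
ptg000001l 4843 5409 -566
ptg000001l 4843 5659 -816
ptg000001l 4843 5665 -822
ptg000001l 4843 5776 -933
ptg000001l 4843 5784 -941
ptg000001l 4843 5894 -1051
ptg000001l 4843 6023 -1180
ptg000001l 4843 6333 -1490
"""

rows = [
    (acc, int(s), int(e), int(d))
    for acc, s, e, d in (line.split() for line in SAMPLE.strip().splitlines())
]
result = longest_per_start(rows)
for r in result:
    print(*r, sep="\t")
```

Taken literally, this rule keeps one row per distinct start, so 4840, 4841 and 4842 each get their own row in addition to 4839 and 4843.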


Sorry, I still don't get the logic. 5751 (column 3) is larger than 4841 (column 2), and 5785 is the larger column 3 value for 4841 (grouping on column 2). Based on the description above, your output should include 4841 5785 and 4842 5784 in addition to 4843 6333. Likewise, the record ptg000001l 4840 5894 -1054 from the original post satisfies your requirement, but it is not in the expected output. Based on the logic described above, the output for the data in the original post should be:

ptg000001l  1714    4715    -3001
ptg000001l  4839    5622    -783
ptg000001l  4840    5894    -1054
ptg000001l  4841    5785    -944
ptg000001l  4842    5784    -942
ptg000001l  4843    6333    -1490

not

ptg000001l      1714    4715 -3001
ptg000001l      4839    5622 -783
ptg000001l      4843    6333 -1490

Unless I have misunderstood the logic.


Basically, if the coordinates overlap, take the start from $2 and the end from $3. I would like to assemble the coordinates from $2 and $3 into continuous ranges and report the ones that are missing.

Output:

ptg000001l 1714 4715 -3001
ptg000001l 4839 -- -783
ptg000001l -- 6333 -1490
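One reading of this rule is ordinary interval merging: take every hit as a start/end range, merge ranges that overlap, and report each merged block's start and end. Under that reading the data collapses to 1714-4715 and 4839-6333, which matches the 4839 ... 6333 span in the output above. A minimal Python sketch under these assumptions follows. Minus-strand hits are normalised to start <= end before merging. Treating "missing" as the uncovered stretches between merged blocks is an interpretation. The names `merge_intervals`, `gaps`, `SAMPLE`, `rows` and `merged` are hypothetical.

```python
def merge_intervals(rows):
    """Merge overlapping hits per accession into continuous (start, end) blocks.

    Minus-strand hits are normalised so that start <= end. Ranges that only
    touch end-to-end (no shared position) are kept as separate blocks.
    """
    by_acc = {}
    for acc, start, end in rows:
        by_acc.setdefault(acc, []).append((min(start, end), max(start, end)))
    merged = []
    for acc in sorted(by_acc):
        blocks = []
        for lo, hi in sorted(by_acc[acc]):
            if blocks and lo <= blocks[-1][1]:  # overlaps the current block
                blocks[-1][1] = max(blocks[-1][1], hi)
            else:
                blocks.append([lo, hi])
        merged.extend((acc, lo, hi) for lo, hi in blocks)
    return merged


def gaps(merged):
    """Assumed meaning of 'missing': uncovered stretches between blocks."""
    out = []
    for (a1, _, end1), (a2, start2, _) in zip(merged, merged[1:]):
        if a1 == a2 and start2 > end1 + 1:
            out.append((a1, end1 + 1, start2 - 1))
    return out


# Input rows copied from the post (column 4 is not needed here).
SAMPLE = """\
ptg000001l 1714 4715 -3001
ptg000001l 3669 1932 1737
ptg000001l 4514 3725 789
ptg000001l 4839 5622 -783
ptg000001l 4840 5785 -945
ptg000001l 4840 5894 -1054
ptg000001l 4841 5751 -910
ptg000001l 4841 5785 -944
ptg000001l 4842 5542 -700
ptg000001l 4842 5784 -942
ptg000001l 4843 5409 -566
ptg000001l 4843 5659 -816
ptg000001l 4843 5665 -822
ptg000001l 4843 5776 -933
ptg000001l 4843 5784 -941
ptg000001l 4843 5894 -1051
ptg000001l 4843 6023 -1180
ptg000001l 4843 6333 -1490
"""

rows = [
    (acc, int(s), int(e))
    for acc, s, e, _diff in (line.split() for line in SAMPLE.strip().splitlines())
]
merged = merge_intervals(rows)
for acc, lo, hi in merged:
    print(acc, lo, hi, sep="\t")
for acc, lo, hi in gaps(merged):
    print(acc, lo, hi, "not covered", sep="\t")
```

Sorting by start and extending the current block keeps the merge to a single pass per accession after sorting. The two minus-strand hits fall inside 1714-4715, so here they do not change the result.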
