Hi everyone.
I have a protein dataset like this.
Column 1 has the protein IDs and column 2 has the domain:
ABCD_peg_0001 wzz
ABCD_peg_0002 no domain
ABCD_peg_0003 wza
ABCD_peg_0004 no domain
PQRS_peg_0012 no domain
PQRS_peg_0013 wca
PQRS_peg_0014 wzc
PQRS_peg_0015 no domain
Each ID starts with the organism name, followed by a peg number that is sequential within each organism. The second column is the domain type.
I want the output to look like this:
ABCD_peg_0001 wzz
ABCD_peg_0002 no domain
ABCD_peg_0003 wza
---
PQRS_peg_0013 wca
PQRS_peg_0014 wzc
That is, it should print everything between two known domains and remove the rest,
but only if the known domains fall within a range of +/-10 of each other.
If a known domain lies beyond that range, it should not be printed as part of the cluster. There should be a separator line between clusters.
If a domain is alone, with no other known domain within the +/-10 range, only that one domain should be printed.
Suppose somewhere the input looks like this:
XYZ_peg_0060 no domain
XYZ_peg_0061 no domain
XYZ_peg_0062 wzz
XYZ_peg_0063 no domain
The output will be:
XYZ_peg_0062 wzz
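One possible reading of the rules above, as a rough sketch in Python rather than shell. It assumes each row has already been split into a hypothetical (organism, peg number, domain) tuple, that rows are in file order with pegs ascending within each organism, and that "+/-10" means two consecutive known domains are at most 10 peg numbers apart. The function name and the `window` parameter are made up for illustration.

```python
# Sketch only: rows are assumed to be pre-split into (organism, peg, domain)
# tuples, in file order, with each organism's rows contiguous and pegs ascending.
SAMPLE = [
    ("ABCD", 1, "wzz"), ("ABCD", 2, "no domain"),
    ("ABCD", 3, "wza"), ("ABCD", 4, "no domain"),
    ("PQRS", 12, "no domain"), ("PQRS", 13, "wca"),
    ("PQRS", 14, "wzc"), ("PQRS", 15, "no domain"),
    ("XYZ", 60, "no domain"), ("XYZ", 61, "no domain"),
    ("XYZ", 62, "wzz"), ("XYZ", 63, "no domain"),
]


def cluster_domains(records, window=10, no_domain="no domain"):
    """Group known domains that sit within `window` pegs of the previous one."""
    spans = []  # [first_index, last_index] of each cluster within `records`
    for i, (organism, peg, domain) in enumerate(records):
        if domain == no_domain:
            continue  # unknown rows never start or extend a cluster themselves
        if spans:
            last_org, last_peg, _ = records[spans[-1][1]]
            if organism == last_org and peg - last_peg <= window:
                spans[-1][1] = i  # same organism, close enough: extend the cluster
                continue
        spans.append([i, i])  # otherwise start a new cluster at this row
    # Slicing keeps the "no domain" rows that lie between two known domains.
    return [records[first:last + 1] for first, last in spans]


clusters = cluster_domains(SAMPLE)
for n, cluster in enumerate(clusters):
    if n:
        print("---")  # separator line between clusters
    for organism, peg, domain in cluster:
        print(organism, peg, domain)
```

A domain with no neighbour in range simply becomes a cluster of one row, which covers the XYZ_peg_0062 case. If "+/-10" is meant as lines rather than peg numbers, the comparison would need to use row indices instead.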
Sometimes the protein IDs can look like this:
WXY_123_peg0012 wxa
WXY_123_peg0013 no domain
WXY_123_peg0014 wzz
In a few cases there will be a number before the peg part of the ID.
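Both ID styles could be handled with one regular expression. This is a minimal sketch in Python, assuming that everything before `_peg` identifies the organism (so `WXY_123` is treated as the organism key) and that the domain is whatever follows the first run of whitespace. The names `ID_PATTERN` and `parse_line` are hypothetical.

```python
import re

# Assumption: the organism is everything before "_peg"; the underscore
# between "peg" and the number is optional (peg_0001 or peg0012).
ID_PATTERN = re.compile(r"^(?P<organism>.+?)_peg_?(?P<peg>\d+)$")


def parse_line(line):
    """Split one input row into (organism, peg_number, domain)."""
    protein_id, domain = line.split(None, 1)  # ID first, rest is the domain
    match = ID_PATTERN.match(protein_id)
    if match is None:
        raise ValueError(f"unrecognised protein ID: {protein_id!r}")
    return match.group("organism"), int(match.group("peg")), domain.strip()


print(parse_line("ABCD_peg_0001 wzz"))
print(parse_line("WXY_123_peg0013 no domain"))
```

Converting the peg to an integer makes it possible to compare peg numbers directly, regardless of zero padding.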
I have tried shell scripting with grep -A 10 -B 10, but it did not work. Please suggest a solution.
Thank you
Please stop asking the same question again and again. Don't delete your posts; edit them or comment on them instead. Earlier posts: "linux shell script which can do this task", "Linux shell script which can do this task", "Cluster of neighboring genes by index (Looking for linux shell Script)" ....
Hi, I guess it would be polite and fair to note that you already have a (near) perfect solution, which should be easy to modify into what you want. Adding a line between different organisms should be trivial and, to be frank, you should at least put in that little effort and make an attempt first. Honestly, as I now notice that you might be mistaking Biostars for a site for free homework help or free freelance programming services: if you need more and want a custom-made script, you may contact me for support at my normal rate of 100 EUR/h. Cheers.