Question

any script that can do this task

0

2.1 years ago

Confused_human ▴ 20

Hi everyone.

I have a protein dataset like this:

Column 1 has the protein ID and column 2 has the domain:

ABCD_peg_0001 wzz
ABCD_peg_0002 no domain
ABCD_peg_0003 wza
ABCD_peg_0004 no domain
PQRS_peg_0012 no domain
PQRS_peg_0013 wca
PQRS_peg_0014 wzc
PQRS_peg_0015 no domain

Each ID starts with the organism name, followed by a peg number that is sequential within each organism. The second column is the domain type.

I want the output to look like this:

ABCD_peg_0001 wzz
ABCD_peg_0002 no domain
ABCD_peg_0003 wza
---
PQRS_peg_0013 wca
PQRS_peg_0014 wzc

That is, print everything between two known domains and remove the rest.

But only if the two known domains are within +/-10 peg numbers of each other.

If a known domain lies beyond that range, it should not be printed as part of the cluster. There should be a separator line between clusters.

If a domain is alone, with no other known domain within the +/-10 range, only that one domain should be printed.

Suppose somewhere the input looks like this:

XYZ_peg_0060 no domain
XYZ_peg_0061 no domain
XYZ_peg_0062 wzz
XYZ_peg_0063 no domain

The output will be:

XYZ_peg_0062 wzz

Sometimes the protein IDs look like this:

WXY_123_peg0012 wxa
WXY_123_peg0013 no domain
WXY_123_peg0014 wzz

In a few cases the ID contains a number before the peg part.
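To make the two ID layouts concrete, here is a minimal Python sketch of how an ID could be split into an organism part and a peg number. The pattern and the name split_peg_id are illustrative assumptions based only on the IDs shown in this post (organism, then "_peg", then an optional underscore, then digits). Real data may need a different pattern.

```python
import re

# Assumed layout: <organism>_peg<optional underscore><digits>
# e.g. ABCD_peg_0001 or WXY_123_peg0012 (both taken from the question).
PEG_ID = re.compile(r"^(?P<organism>.+)_peg_?(?P<number>\d+)$")


def split_peg_id(protein_id):
    """Return (organism, peg_number) for an ID in the assumed layout."""
    match = PEG_ID.match(protein_id)
    if match is None:
        raise ValueError(f"unrecognised protein ID: {protein_id!r}")
    return match.group("organism"), int(match.group("number"))
```

With this split, any digits that belong to the organism name (the 123 in WXY_123) stay in the organism part, and only the trailing digits are treated as the peg number.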

I tried grep -A 10 -B 10 in a shell script, but it did not work. Please suggest an approach.
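For reference, grep -A 10 -B 10 prints ten lines of context around every match. It does not trim the output to the span between matches, and it knows nothing about organism boundaries, so it cannot produce the output above on its own. Below is a minimal Python sketch of the rules as described, not a definitive implementation. It assumes the input is already sorted by organism and peg number (as in the examples), that unknown domains are written exactly as "no domain", and that clusters chain: each known domain joins the previous one if it is in the same organism and at most 10 pegs further on. The names cluster_lines, WINDOW and NO_DOMAIN are made up for illustration.

```python
import re

WINDOW = 10                # assumed meaning of the +/-10 range
NO_DOMAIN = "no domain"    # assumed exact label for unknown domains
PEG_ID = re.compile(r"^(.+)_peg_?(\d+)$")  # organism, then peg number


def cluster_lines(lines, window=WINDOW):
    """Return the kept lines, with '---' between clusters."""
    records = []  # (organism, peg_number, domain, original_line)
    for line in lines:
        if not line.strip():
            continue
        parts = line.split(None, 1)
        domain = parts[1].strip() if len(parts) > 1 else NO_DOMAIN
        match = PEG_ID.match(parts[0])
        if match is None:
            raise ValueError(f"unrecognised protein ID: {parts[0]!r}")
        records.append((match.group(1), int(match.group(2)), domain, line.rstrip()))

    clusters = []  # each cluster is [first_index, last_index] into records
    for i, (organism, number, domain, _) in enumerate(records):
        if domain == NO_DOMAIN:
            continue
        if clusters:
            prev_organism, prev_number = records[clusters[-1][1]][:2]
            if organism == prev_organism and 0 <= number - prev_number <= window:
                clusters[-1][1] = i  # extend the current cluster
                continue
        clusters.append([i, i])  # start a new cluster

    output = []
    for n, (first, last) in enumerate(clusters):
        if n:
            output.append("---")
        # Keep everything from the first to the last known domain, inclusive.
        output.extend(record[3] for record in records[first:last + 1])
    return output
```

It could be called on an open file, for example print("\n".join(cluster_lines(open("domains.txt")))), where domains.txt is a placeholder file name. If clusters should not chain (for instance, if every member must be within 10 pegs of the first one), the comparison against the previous known domain would need to change.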

Thank you

shell-script • 600 views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 2.1 years ago by Confused_human ▴ 20

5

Please stop asking the same question again and again. Don't delete your posts; edit them or comment on them instead. Earlier posts: "linux shell script which can do this task", "Linux shell script which can do this task", "Cluster of neighboring genes by index (Looking for linux shell Script)" ....

ADD REPLY • link 2.1 years ago by Pierre Lindenbaum 161k

5

Hi, I guess it would be polite and fair to note that you already have a (near) perfect solution, which should be easy to modify into what you want. Adding a line between different organisms should be trivial, and to be frank, you should at least put in that little effort and make an attempt first. Honestly, I now notice that you might be mistaking Biostars for a site for free homework help or free freelance programming services. If you need more, such as a custom-made script, you may contact me for support at my normal rate of 100 EUR/h. Cheers.

ADD REPLY • link 2.1 years ago by Michael 54k