Question

Amino acid position specific mutation

0

5.6 years ago

skjobs ▴ 190

Dear All,

I have a bunch of position-specific mutation files as well as sequence files. I want to mutate the sequences at the given positions, as in the example below.

Mutation Position

G10M (G at position 10 mutated to M). Similarly,
Y70K (Y at position 70 mutated to K)

>P00519
MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSE
NDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN
SLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTAS

Output

> P00519
MLEICLKLV`M`CKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSE
NDPNLFVAL`K`DFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN
SLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTAS

If anyone has code for this, please send it to me. Thanks.

software error sequence gene snp • 3.4k views

ADD COMMENT • link updated 5.6 years ago by Joe 21k • written 5.6 years ago by skjobs ▴ 190

3

This post is NOT A TUTORIAL. I have changed it AGAIN to a question.

It also is not a "software error". Please use logical tags. In this case you want to mutate positions in an amino acid fasta.

Please show us what you tried to solve this and what didn't work. We would rather point you in the right direction if you show some effort yourself.

ADD REPLY • link 5.6 years ago by WouterDeCoster 47k

1

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

Please show us what you tried to solve this and what didn't work. We would rather point you in the right direction if you show some effort yourself.

In addition, you selected "tutorial", while this is clearly a question. I changed the post type.

Finally, you used the "sequence" and "software error" tags for this question, although these are not accurate. Please use appropriate tags, so that experts can easily find your question and help you.

ADD REPLY • link 5.6 years ago by WouterDeCoster 47k

1

Do you have a mutation mapfile with sequence names? I assume you want to do this with more than one sequence at a time?

ADD REPLY • link 5.6 years ago by Joe 21k

0

Yes, I have multiple files and want to apply multiple mutations within a single file.

ADD REPLY • link 5.6 years ago by skjobs ▴ 190

1

Ok, to properly automate this you will really need a second input file which maps, to each sequence name (ideally), the positions and substitutions to be made.

Can you show us an example of such a file? A simple tab/comma separated file would do, something like:

>P00519,G10M
>P00519,Y70K
>PXXXX,X123Y
....
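
To illustrate, a file in that shape could be loaded into a dictionary along these lines. This is only a sketch: the function name `read_mutation_map` is made up for the example, and it assumes exactly two comma-separated fields per line with an optional leading `>` on the ID.

```python
import csv
import io
from collections import defaultdict


def read_mutation_map(handle):
    """Return {sequence_id: [mutation_code, ...]} from a comma-separated map."""
    mutations = defaultdict(list)
    for row in csv.reader(handle):
        # Skip blank lines and placeholder rows such as "....".
        if len(row) != 2:
            continue
        seq_id, code = (field.strip() for field in row)
        # Drop the optional leading ">" so IDs match bare FASTA identifiers.
        mutations[seq_id.lstrip(">")].append(code)
    return dict(mutations)


# In-memory stand-in for the map file, using the example lines above.
example = io.StringIO(">P00519,G10M\n>P00519,Y70K\n>PXXXX,X123Y\n....\n")
mutation_map = read_mutation_map(example)
print(mutation_map)
```

Grouping by ID keeps all substitutions for one sequence together, so several mutations can be applied to the same sequence in one pass.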

ADD REPLY • link 5.6 years ago by Joe 21k

0

Yes... Similar file I have which u written..

ADD REPLY • link 5.6 years ago by skjobs ▴ 190

2

Yes... Similar file I have which u written..

You really need to put more effort into your question. We cannot read what's on your screen or guess what your input files look like. We are a group of volunteers and we are helping you on our free Sunday. Show some respect and don't make this too hard for us.

ADD REPLY • link 5.6 years ago by WouterDeCoster 47k

0

Thanks for sharing some details - in this case, your solution is pretty trivial. Use any programming language you're familiar with and use this pseudocode as a template.

read input sequences into a hash or dictionary keyed by sequence ID
read mutation table into a hash or dictionary
 split each line by the delimiter, keyed by sequence ID
for each input sequence:
  apply each mutation as a string operation

I tend to recommend Perl for this sort of string operation, but Python is widely used now, too.
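
For what it's worth, the pseudocode could be fleshed out in plain Python roughly as follows. Treat it as a minimal sketch, not a tested tool: the helper names (`read_fasta`, `apply_mutation`, `mutate_all`) are invented for the example, mutation codes are assumed to look like `G10M` with 1-based positions, and no checking is done that the original residue matches.

```python
import io


def read_fasta(handle):
    """Return {sequence_id: sequence}; the ID is the first word of the header."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Tolerates headers written as ">P00519" or "> P00519".
            seq_id = (line[1:].split() or [""])[0]
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def apply_mutation(sequence, code):
    """Apply one substitution such as 'G10M' (assumed 1-based position)."""
    new_residue = code[-1]
    index = int(code[1:-1]) - 1  # convert 1-based position to 0-based index
    return sequence[:index] + new_residue + sequence[index + 1:]


def mutate_all(sequences, mutation_map):
    """Apply every mutation listed for each sequence ID."""
    mutated = {}
    for seq_id, sequence in sequences.items():
        for code in mutation_map.get(seq_id, []):
            sequence = apply_mutation(sequence, code)
        mutated[seq_id] = sequence
    return mutated


# First two lines of the sequence from the question, as an in-memory file.
fasta = io.StringIO(
    ">P00519\n"
    "MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSE\n"
    "NDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVN\n"
)
sequences = read_fasta(fasta)
mutated = mutate_all(sequences, {"P00519": ["G10M", "Y70K"]})

# Write the result back out as FASTA, wrapped at 60 residues per line.
for seq_id, sequence in mutated.items():
    print(f">{seq_id}")
    for start in range(0, len(sequence), 60):
        print(sequence[start:start + 60])
```

Because the sequence is joined into one string before mutating, positions beyond the first line (such as 70) need no special handling.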

ADD REPLY • link 5.6 years ago by Carambakaracho ★ 3.2k

0

So, show it to us...

ADD REPLY • link 5.6 years ago by Joe 21k

score 3 · Accepted Answer · 2018-09-12

Try the following script. It's not very well tested but I've written it to catch some of the more obvious errors.

Caveats:

1. You must have a mutation map file which looks like this:
```
>SeqID1,A123B
>SeqID2,X234Y
....
>SeqIDn,YNNNZ
```
Anything else, and it'll break. You should be able to drop the > without it mattering, but this brings me on to caveat number 2.
2. The IDs in your mapfile and your sequences must be identical, i.e. if you have a mutation that reads SeqID1,A123B the corresponding sequence must be called SeqID1. There are ways around this, but this will suffice for the time being I think, so test it and see how you get on.
3. Your mutation file should be put together with 1-based indexing (the script does the necessary conversion), i.e. the first base/residue should be number 1, not 0 (which is how Python indexes).
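
To make the indexing caveat concrete, here is a small illustrative sketch. It is not taken from Mutate.py: the names `parse_mutation` and `checked_substitution` are hypothetical, and the check that the original residue matches the sequence is an assumption about what a careful script might do.

```python
import re

# One original residue, a 1-based position, one replacement residue, e.g. "G10M".
MUTATION_PATTERN = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_mutation(code):
    """Split a code like 'G10M' into ('G', 10, 'M')."""
    match = MUTATION_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"Unrecognised mutation code: {code!r}")
    old, position, new = match.groups()
    return old.upper(), int(position), new.upper()


def checked_substitution(sequence, code):
    """Apply one substitution after validating position and original residue."""
    old, position, new = parse_mutation(code)
    index = position - 1  # the map file is 1-based, Python strings are 0-based
    if not 0 <= index < len(sequence):
        raise IndexError(f"Position {position} is outside the sequence")
    if sequence[index].upper() != old:
        raise ValueError(
            f"Expected {old} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


print(checked_substitution("MLEICLKLVGCKSKKG", "G10M"))  # MLEICLKLVMCKSKKG
```

Checking the original residue is a cheap way to catch a map file that was accidentally written with 0-based positions, since the expected residue would then usually not match.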

Some other disclaimers: I've only tested this with one mutation per sequence so far. It should handle multiple, but the terminal display may freak out, so you might need to ignore that. You'll need BioPython installed.

Invoke it like so:

```
$ python Mutate.py -v -o /path/to/output.fasta mutation_map_file.csv input.fasta
```

https://github.com/jrjhealey/bioinfo-tools/blob/master/Mutate.py