I use this shell script (I'm not the author) to reverse complement DNA:
alias revcomp="echo 'import sys; print \"\".join([dict(zip(\"ACGTacgt\",\"TGCAtgca\"))[c] for c in sys.argv[1][::-1]])' | python -"
In the terminal it works perfectly; I just write revcomp ATGC
which outputs GCAT.
And to just complement, I changed it to
TGCAtgc\",\"TGCAtgc.
I'm working with some files where, for reasons outside my control, some bases are written as a word. So let's say that one base is called (askJoeaboutthisone); how would I complement this sequence? I.e. so that AT(askJoeaboutthisone)GC returns TA(askJoeaboutthisone)CG.
In my script each letter is converted, but I want (askJoeaboutthisone) to be treated as a single letter.
Thanks!
What you're really doing is transliterating (and then reversing the sequence). The problem is that your script iterates over the string character by character, so it has no way of knowing that a word should count as one base. Could your example rely on mixed case, i.e. your long strings are all lower case but your normal bases are all upper case? Otherwise it will be tricky to delineate what is a word within the string. Are the parentheses actually part of your data, or are they just there for the benefit of the post/explanation?
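If the parentheses really are in the data, one possible approach is to split the sequence into tokens first, so that each parenthesised word counts as one "letter", and only then complement (and optionally reverse) the tokens. Below is a minimal sketch in Python 3, not a drop-in replacement for the alias. It assumes the words are always wrapped in a matching, non-nested pair of parentheses; the names `tokenize` and `complement` are made up for illustration.

```python
import re

# Translation table for the ordinary single-letter bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# A token is either a whole "(...)" group or any single character.
# Assumption: the parentheses are part of the data and are never nested.
TOKEN = re.compile(r"\([^()]*\)|.", re.DOTALL)


def tokenize(seq):
    """Split a sequence into tokens, keeping each "(word)" in one piece."""
    return TOKEN.findall(seq)


def complement(seq, reverse=False):
    """Complement seq token by token; reverse the token order if asked."""
    tokens = tokenize(seq)
    if reverse:
        tokens = tokens[::-1]
    # Parenthesised words pass through untouched; single bases are translated.
    return "".join(
        tok if tok.startswith("(") else tok.translate(COMPLEMENT)
        for tok in tokens
    )


if __name__ == "__main__":
    print(complement("AT(askJoeaboutthisone)GC"))
    print(complement("AT(askJoeaboutthisone)GC", reverse=True))
```

With this sketch, the plain complement of AT(askJoeaboutthisone)GC comes out as TA(askJoeaboutthisone)CG and the reverse complement as GC(askJoeaboutthisone)AT. If the words are not delimited in the data at all, this will not work as written and you would need some other rule (such as the upper/lower case convention) to tell words from bases.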