6.6 years ago
Jason
Use shell command or python
Suppose I have two files. The first file contains more than 100 sequences in FASTA format. The second file contains a list of motifs. I want to extract the position of each motif string within its sequence and save that in a third text file (file 3).
For example:
File 1:
>sp|P26140|3BHS2_MOUSE 3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase type 2 OS=Mus musculus GN=Hsd3b2 PE=1 SV=4
MPGWSCLVTGAGGFLGQRIIQLLVQEEDLEEIRVLDKVFRPETRKEFFNLETSIKVTVLE
GDILDTQYLRRACQGISVVIHTAAIIDVTGVIPRQTILDVNLKGTQNLLEACIQASVPAF
IFSSSVDVAGPNSYKEIVLNGHEEECHESTWSDPYPYSKKMAEKAVLAANGSMLKNGGTL
QTCALRPMCIYGERSPLISNIIIMALKHKGILRSFGKFNTANPVYVGNVAWAHILAARGL
RDPKKSPNIQGEFYYISDDTPHQSFDDISYTLSKEWGFCLDSSWSLPVPLLYWLAFLLET
VSFLLSPIYRYIPPFNRHLVTLSGSTFTFSYKKAQRDLGYEPLVSWEEAKQKTSEWIGTL
VEQHRETLDTKSQ
>sp|P35730|ODBB_RAT 2-oxoisovalerate dehydrogenase subunit beta, mitochondrial OS=Rattus norvegicus GN=Bckdhb PE=1 SV=3
MAAVAARAGGLLRLGAAGAERRRRGLRCAALVQGFLQPAVDDASQKRRVAHFTFQPDPES
LQYGQTQKMNLFQSITSALDNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNT
PLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLTI
RAPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKI
LYRAAVEQVPVEPYKIPLSQAEVIQEGSDVTLVAWGTQVHVIREVASMAQEKLGVSCEVI
DLRTIVPWDVDTVCKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCG
YDTPFPHIFEPFYIPDKWKCYDALRKMINY
File 2:
Motif
P26140 MPGWSC
P35730 AERRRRGLRCAAL
File 3
Result:
P26140 1,2,3,4,5
P35730 19,20,21,22,23,24,25,27,28,29,30,31
If this is not an assignment, then you can use fuzzpro from EMBOSS. Second example from above.
I want to search many patterns against a list of sequences. This software only reads one pattern per run.
Thank you.
If you want Python, look into regex.findall().
This may be a good time to learn regex, my friend. If this is for an assignment then I think you will learn the most that way: https://regexr.com/
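For anyone who wants a starting point, here is a minimal sketch of the regex approach, not a definitive solution. It uses `re.finditer` from Python's standard `re` module rather than `findall`, because `finditer` returns match objects that carry start and end offsets. The function names (`parse_fasta`, `motif_positions`, `find_all`) are hypothetical, and the code assumes UniProt-style headers (`>sp|ACCESSION|NAME ...`), literal motifs (no wildcards), and 1-based positions. Note that it reports every residue covered by a match, so for the question's example it would give 1 to 6 and 19 to 31, whereas the expected output as posted leaves out positions 6 and 26.

```python
import re


def parse_fasta(lines):
    """Return {accession: sequence} from FASTA lines (UniProt-style headers assumed)."""
    chunks, acc = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # ">sp|P26140|3BHS2_MOUSE ..." -> "P26140"; fall back to the first word
            parts = line[1:].split("|")
            acc = parts[1] if len(parts) > 1 else parts[0].split()[0]
            chunks[acc] = []
        elif acc is not None:
            chunks[acc].append(line)
    return {a: "".join(c) for a, c in chunks.items()}


def motif_positions(seq, motif):
    """1-based positions of every residue covered by a literal motif match."""
    positions = []
    # re.escape treats the motif as plain text, not as a regex pattern
    for m in re.finditer(re.escape(motif), seq):
        positions.extend(range(m.start() + 1, m.end() + 1))
    return positions


def find_all(fasta_lines, motif_lines):
    """Return one 'ACCESSION pos1,pos2,...' string per usable motif line."""
    seqs = parse_fasta(fasta_lines)
    results = []
    for line in motif_lines:
        fields = line.split()
        # Skip the "Motif" header line and accessions missing from the FASTA file
        if len(fields) != 2 or fields[0] not in seqs:
            continue
        acc, motif = fields
        pos = motif_positions(seqs[acc], motif)
        results.append(f"{acc} {','.join(map(str, pos))}")
    return results


# Small in-memory demo; with real files, pass open file handles instead of lists
fasta = [">sp|P26140|3BHS2_MOUSE example header", "MPGWSCLVTG", "AGGFLGQRII"]
motifs = ["Motif", "P26140 MPGWSC"]
for row in find_all(fasta, motifs):
    print(row)  # prints "P26140 1,2,3,4,5,6"
```

To produce file 3, the returned lines could be written out with something like `open("file3.txt", "w").write("\n".join(rows))`. Because every motif is looked up in one pass, this also covers the "many patterns per run" case that fuzzpro does not.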