3.9 years ago
MaheJaan
I have about 5000 .fasta files, each containing 2 transcript versions of the same gene, split into their domains, like so:
>zf-C4_1_ENST00000512784
RLCLVCGDIASGYHYGVASCEACKAFFKRTIQGNIEYSCPATNECEITKRRRKSCQACRF
MKCLKVGMLK
>Hormone_recep_1_ENST00000512784
IKALTTLCDLADRELVVIIGWAKHIPGFSSLSLGDQMSLLQSAWMEILILGIVYRSLPYD
DKLVYAEDYIMDEEHSRLAGLLELYRAILQLVRRYKKLKVEKEEFVTLKALALANSDSMY
IEDLEAVQKLQDLLHEALQDYELSQRHEEPWRTGKLLLTLPLLRQTAAKAVQHFYSVKLQ
>zf-C4_1_ENST00000644823
RLCLVCGDIASGYHYGVASCEACKAFFKRTIQGNIEYSCPATNECEITKRRRKSCQACRF
MKCLKVGMLK
>Hormone_recep_1_ENST00000644823
IKALTTLCDLADRELVVIIGWAKHIPGFSSLSLGDQMSLLQSAWMEILILGIVYRSLPYD
DKLVYAEDYIMDEEHSRLAGLLELYRAILQLVRRYKKLKVEKEEFVTLKALALANSDSMY
IEDLEAVQKLQDLLHEALQDYELSQRHEEPWRTGKLLLTLPLLRQTAAKAVQHFYSVKLQ
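I would like to run a pairwise comparison between the matching domains of the two transcripts (zf-C4_1 against zf-C4_1, Hormone_recep_1 against Hormone_recep_1).
When I run msa(mySequences), all 4 sequences are aligned together instead of each domain being compared only with its counterpart. Any help on how I can do this, and maybe on how to write a loop in R for the other 5000 files?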
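The grouping step can be sketched independently of the alignment tool. Below is a minimal sketch in Python (standard library only), not R, and not a replacement for msa. It assumes every header has the form `<domain>_<ENST id>` as in the sample above. The function names and the `fasta_dir` argument are hypothetical, and the exact-equality check is only a placeholder for wherever a real pairwise alignment would be called.

```python
import re
from collections import defaultdict
from itertools import combinations
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_domain(records):
    """Group sequences by domain name: the header minus its trailing _ENST id."""
    groups = defaultdict(dict)
    for header, seq in records:
        # Assumption: headers look like "zf-C4_1_ENST00000512784".
        match = re.fullmatch(r"(.+)_(ENST\d+)", header)
        if match is None:
            raise ValueError(f"unexpected header: {header}")
        domain, transcript = match.groups()
        groups[domain][transcript] = seq
    return groups


def domain_pairs(records):
    """Yield (domain, id_a, id_b, seq_a, seq_b) for each same-domain pair."""
    for domain, by_transcript in group_by_domain(records).items():
        for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(by_transcript.items()), 2):
            yield domain, id_a, id_b, seq_a, seq_b


def compare_folder(fasta_dir):
    """Loop over every .fasta file in a folder and compare matching domains."""
    results = {}
    for path in sorted(Path(fasta_dir).glob("*.fasta")):
        records = read_fasta(path.read_text())
        # Placeholder comparison: exact equality. A real alignment would go here.
        results[path.name] = [
            (domain, id_a, id_b, seq_a == seq_b)
            for domain, id_a, id_b, seq_a, seq_b in domain_pairs(records)
        ]
    return results
```

Calling `compare_folder("fasta_dir")` on a folder of such files would return, per file, one entry per domain saying whether the two transcript versions are identical. The key design choice is to derive the grouping key from the header, so only sequences sharing a domain name are ever paired.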