I have a large file with 105 protein sequences. To obtain this file I used the 'seqinr' package:
library(seqinr)
myfasta <- read.fasta(file = "mydata.fasta", seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
subsetlist <- read.table("mylist.txt", header = TRUE)
my_fasta_sub <- myfasta[names(myfasta) %in% subsetlist$ID]
write.fasta(sequences = my_fasta_sub, names = names(my_fasta_sub), nbchar = 80, file.out = "myresylt.fasta")
Next, I would like to search these 105 sequences for a set of peptide sequences and, for each match, find out which two amino acids come immediately before the first amino acid of the peptide. Does anyone have any idea how I can do this, please? I'm new to the R environment.
Example of peptide sequences:
DAIPAVEVFEGEPGNK
AVFQLLDSMGPSLPIAEYIASLDRPR
GFCFITFKEEEPVKK
HAFSGGRDTIEEHR
I would like to know which two amino acids (shown here as _ _) come before each peptide, for example:
_ _ DAIPAVEVFEGEPGNK
_ _ AVFQLLDSMGPSLPIAEYIASLDRPR
_ _ GFCFITFKEEEPVKK
_ _ HAFSGGRDTIEEHR
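To make the goal concrete, here is a minimal sketch of one way this could be approached in base R. It assumes the sequences are held as a named list of single character strings (which is what read.fasta returns with as.string = TRUE and set.attributes = FALSE), that only exact matches are wanted, and that only the first occurrence of a peptide in each protein matters. The helper name find_flanking and the two toy proteins are made up for illustration; they are not part of seqinr or of my data.

```r
# Hypothetical helper (not from seqinr): for each peptide, find its first exact
# occurrence in each protein and report the residues immediately before it.
find_flanking <- function(proteins, peptides, n_before = 2) {
  hits <- list()
  for (pep in peptides) {
    for (i in seq_along(proteins)) {
      prot <- as.character(proteins[[i]])
      # regexpr() returns the 1-based start of the first match, or -1 if none
      pos <- as.integer(regexpr(pep, prot, fixed = TRUE))
      if (pos > 0) {
        # Clamp at position 1, so a peptide at the very start of a protein
        # yields fewer than n_before residues (possibly an empty string)
        before <- substr(prot, max(1, pos - n_before), pos - 1)
        hits[[length(hits) + 1]] <- data.frame(
          peptide = pep,
          protein = names(proteins)[i],
          start = pos,
          before = before,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  # One row per (peptide, protein) hit; NULL when nothing matched
  do.call(rbind, hits)
}

# Toy data, invented for illustration only
proteins <- list(
  protA = "MKDAIPAVEVFEGEPGNKLL",
  protB = "GFCFITFKEEEPVKKAA"
)
peptides <- c("DAIPAVEVFEGEPGNK", "GFCFITFKEEEPVKK", "HAFSGGRDTIEEHR")

res <- find_flanking(proteins, peptides)
print(res)
```

With the real data, my_fasta_sub would take the place of the toy proteins list. Peptides that are not found in any protein simply produce no row, and a peptide that occurs more than once in the same protein would need gregexpr() instead of regexpr().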
Thanks in advance!