5.3 years ago
Peter
Hello,
I have two files:
sequences.fasta and intervals.txt
I used the "seqinR" package to obtain the protein sequences of interest (~3500 protein sequences) and saved them to "sequences.fasta":
> sp | O94827 | PKHG5_HUMAN
MDDQSPAEKKGLRCQNPACMDKGRAAKVCHHADCQQLHRRGPLNLCEACDSKFHSTMHYDGHVRFDLPPQG
SVLARNVSTRSCPPRTSPAVDLEEEEEESSVDGKGDRKSTGLKLSKKKARRRHTDDPSKECFTLKFDLNVDIETEIVPAMKKKSLGEVLLP
VFERKGIALGKVDIYLDQSNTPLSLTFEAYRFGGHYLRVKAPAKPGDEGKVEQGMKDSKSLSLPILRPAGTGPPAL
ERVDAQSRRESLDILAPGRRRKNMSEFLGEASIPGQEPPTPSSCSLPSGSSGSTNTGDSWKNRAASRFSGFFSS
GPSTSAFGREVDKMEQLEGKLHTYSLFGLPRLPRGLRF
> sp | P10515 | ODP2_HUMAN
MWRVCARRAQNVAPWAGLEARWTALQEVPGTPRVTSRSGPAPARRNSVTTGYGGVRALCGWTPSSGATPRNRLLLQLL
GSPGRRYYSLPPHQKVPLPSLSPTMQAGTIARWEKKEGDKINEGDLIAEVETDKATVGFESLEECYMAKILVAEGTRDVPIGA
IICITVGKPEDIEAFKNYTLDSSAAPTPQAAPAPTPAATASPPTPSAQ
My "intervals.txt" file has the protein ID, the starting and ending position of my peptide of interest:
ID start end
O94827 1 69
P10515 2 120
Is there a way to write a third .txt file containing each protein ID and the part of its sequence that corresponds to that interval? I would appreciate any help!
Thanks in advance!
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
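One possible way to approach the question, sketched in plain Python rather than seqinR. It rests on several assumptions that are not stated in the post: the coordinates in intervals.txt are 1-based and inclusive, the accession is the second `|`-separated field of each FASTA header, and intervals.txt is whitespace-delimited with one header line. The function names and the output file name `peptides.txt` are made up for illustration.

```python
def read_fasta(path):
    """Return {accession: sequence} from a UniProt-style FASTA file."""
    chunks = {}
    accession = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumed header layout: ">sp|O94827|PKHG5_HUMAN",
                # tolerating spaces around the "|" separators.
                fields = [f.strip() for f in line[1:].split("|")]
                accession = fields[1] if len(fields) >= 2 else fields[0]
                chunks[accession] = []
            elif accession is not None:
                # Sequence lines may be wrapped, so collect and join later.
                chunks[accession].append(line)
    return {acc: "".join(parts) for acc, parts in chunks.items()}


def extract_intervals(fasta_path, intervals_path, output_path):
    """Write ID, start, end and peptide for every interval.

    Returns the list of IDs that were not found in the FASTA file.
    """
    sequences = read_fasta(fasta_path)
    missing = []
    with open(intervals_path) as src, open(output_path, "w") as dst:
        dst.write("ID\tstart\tend\tpeptide\n")
        next(src, None)  # skip the "ID start end" header line
        for line in src:
            fields = line.split()
            if len(fields) < 3:
                continue  # ignore blank or incomplete lines
            protein_id, start, end = fields[0], int(fields[1]), int(fields[2])
            sequence = sequences.get(protein_id)
            if sequence is None:
                missing.append(protein_id)
                continue
            # Assumption: 1-based inclusive coordinates, hence start - 1.
            # An end beyond the sequence length is silently truncated.
            peptide = sequence[start - 1:end]
            dst.write(f"{protein_id}\t{start}\t{end}\t{peptide}\n")
    return missing
```

Calling `extract_intervals("sequences.fasta", "intervals.txt", "peptides.txt")` would then write one tab-separated line per interval and return any IDs it could not match. If your coordinates are 0-based or end-exclusive, the slice needs adjusting.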