get names from fasta file
7.6 years ago
zzzahiri • 0

How can I get the protein names from a FASTA file in R? For example, P13639 from:

>sp|P13639|EF2_HUMAN Elongation factor 2 OS=Homo sapiens GN=EEF2 PE=1 SV=4
MVNFTVDQIRAIMDKKANIRNMSVIAHVDHGKSTLTDSLVCKAGIIASARAGETRFTDTR
KDEQERCITIKSTAISLFYELSENDLNFIKQSKDGAGFLINLIDSPGHVDFSSEVTAALR
VTDGALVVVDCVSGVCVQTETVLRQAIAERIKPVLMMNKMDRALLELQLEPEELYQTFQR
IVENVNVIISTYGEGESGPMGNIMIDPVLGTVGFGSGLHGWAFTLKQFAEMYVAKFAAKG
EGQLGPAERAKKVEDMMKKLWGDRYFDPANGKFSKSATSPEGKKLPRTFCQLILDPIFKV
FDAIMNFKKEETAKLIEKLDIKLDSEDKDKEGKPLLKAVMRRWLPAGDALLQMITIHLPS
PVTAQKYRCELLYEGPPDDEAAMGIKSCDPKGPLMMYISKMVPTSDKGRFYAFGRVFSGL
VSTGLKVRIMGPNYTPGKKEDLYLKPIQRTILMMGRYVEPIEDVPCGNIVGLVGVDQFLV
KTGTITTFEHAHNMRVMKFSVSPVVRVAVEAKNPADLPKLVEGLKRLAKSDPMVQCIIEE
SGEHIIAGAGELHLEICLKDLEEDHACIPIKKSDPVVSYRETVSEESNVLCLSKSPNKHN
RLYMKARPFPDGLAEDIDKGEVSARQELKQRARYLAEKYEWDVAEARKIWCFGPDGTGPN
ILTDITKGVQYLNEIKDSVVAGFQWATKEGALCEENMRGVRFDVHDVTLHADAIHRGGGQ
IIPTARRCLYASVLTAQPRLMEPIYLVEIQCPEQVVGGIYGVLNRKRGHVFEESQVAGTP
MFVVKAYLPVNESFGFTADLRSNTGGQAFPQCVFDHWQILPGDPFDNSSRPSQVVAETRK
RKGLKEGIPALDNFLDKL

R • 3.6k views
7.6 years ago
Medhat 9.7k

You can use the Biostrings package from Bioconductor:

http://www.bioconductor.org/packages/2.13/bioc/html/Biostrings.html

For example:

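library("Biostrings")
# Read the protein sequences; each FASTA header line becomes a sequence name
myFastaFile <- readAAStringSet("my.fasta")
# Character vector of the header lines (without the leading ">")
seqName <- names(myFastaFile)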

If you have a large file, see:

http://stackoverflow.com/questions/23173215/how-to-subset-sequences-in-fasta-file-based-on-sequence-id-or-name


Thanks, but the result of this code is sp|P13639|EF2_HUMAN, and I want just P13639 :-(


Maybe use gsub?
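
A minimal sketch of the regex-substitution idea, assuming the names follow the `db|accession|entry` layout shown in the question. `headers` is a hypothetical stand-in for `names(myFastaFile)`, and the second header is made up for illustration. It uses `sub`, the single-replacement sibling of `gsub`; since the pattern is anchored to the whole string, either would do here.

```r
# Hypothetical example headers, standing in for names(myFastaFile)
headers <- c(
  "sp|P13639|EF2_HUMAN Elongation factor 2 OS=Homo sapiens GN=EEF2 PE=1 SV=4",
  "sp|P68871|HBB_HUMAN Hemoglobin subunit beta"
)

# Capture the text between the first and second "|" and keep only that part.
# "\\|" is a literal pipe; "[^|]*" matches any run of non-pipe characters.
accessions <- sub("^[^|]*\\|([^|]*)\\|.*$", "\\1", headers)
print(accessions)
```

Headers that do not contain two pipes would be returned unchanged by this pattern, so it may be worth checking the result if the file mixes header styles.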


What about using strsplit?

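# fixed = TRUE matches "|" literally; as a regex, "|" would split between every character
strsplit(seq, "|", fixed = TRUE)[[1]][2]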
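
To apply the split to every name at once, something like the following sketch might work. It assumes all headers have at least two `|`-separated fields; `headers` is a hypothetical stand-in for `names(myFastaFile)`, and the second header is made up for illustration. Note that `|` is a regular-expression metacharacter, so the split is done with `fixed = TRUE`.

```r
# Hypothetical example headers, standing in for names(myFastaFile)
headers <- c(
  "sp|P13639|EF2_HUMAN Elongation factor 2 OS=Homo sapiens GN=EEF2 PE=1 SV=4",
  "sp|P68871|HBB_HUMAN Hemoglobin subunit beta"
)

# Split each header on the literal "|" character
parts <- strsplit(headers, "|", fixed = TRUE)

# Take the second field of each header; vapply guarantees a character vector
accessions <- vapply(parts, function(p) p[2], character(1))
print(accessions)
```

A header with fewer than two fields would yield `NA` here rather than an error.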