grep command or other tool to select specific protein IDs with sequences from a big file...
7 weeks ago

Hi All.

From this file, I want to select some protein IDs together with their amino acid sequences. How can I do this on Linux, using the grep command or any other tool? Suggestions are welcome.

>CALMACT00000025247|CALMACG00000015281|Seq_id=6096_quiver|type=cds
HGHPCLLNAPYLNCRSKNRGTQNFELPHNGLLTLFLDDCHALYLL*
>CALMACT00000001382|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MANKPEKPTENTMEEKEEKEKKESFGLHLDLLMDSMEETRFKKKYSELIEKLVHETHFND
VEVESLLIIYYKLAKENMKTEKYKEKSVTKEQFRDFLHCALDMTDDTLMDRIFVSLDRMP
GSTITMETYAHALSLMLRGTLEEKINFCFRIGIF*
>CALMACT00000001383|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MGDGLLGRDAMFYLLRNSLVSLSGEDDAEESVKDMIEVITKKLDVDRDGKISFQDYKQTV
LKQPALLEVFGQCLPSRGAVYTFSTTFSSNPNLKM*
>CALMACT00000001384|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MANKPEKPTENTMEEKEEKEKKESFGLHLDLLMDSMEETRFKKKYSELIEKLVHETHFND
VEVESLLIIYYKLAKENMKTEKYKEKSVTKEQFRDFLHCALDMTDDTLMDRIFVSLDRMP
GSTITMETYAHALSLMLRGTLEEKINFCFRVNNAQIRVYDTMGDGLLGRDAMFYLLRNSL
VSLSGEDDAEESVKDMIEVITKKLDVDRDGKISFQDYKQTVLKQPALLEVFGQCLPSRGA
VYTFSTTFSSNPNLKM*
>CALMACT00000001385|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MENASVLFKLESVVYDTMGDGLLGRDAMFYLLRNSLVSLSGEDDAEESVKDMIEVITKKL
DVDRDGKISFQDYKQTVLKQPALLEVFGQCLPSRGAVYTFSTTFSSNPNLKM*
>CALMACT00000016141|CALMACG00000009873|Name=mns1|Seq_id=308_quiver|type=cds
MAERYTESEFEEIPEATGQHELDHGLLIEAIQGYPYLYDMTNQLYKNVKKKAEAWEIIAG
LLDSTVQDCMKAWKSLRDRYVKEKNRCASGSEAPESIWRYFDSMHFYAKFTKPRKTHTRP
MKTTQSGKSSESSRPSSTMSMWSPVDEVVCEDETLTNTPTRSEEGPGPSRRKRRLSLDTP
ISSGKSAKRVENSFLEVAKEIIDKIDKKQHQMNPNKTFCDYLFTELEKLPEAEAKEKRRQ
ILLYLLEKLAEDAKLDQLSEQKRRMKMLQLRRDTERMMIERRQKHAEEMQLLMKIEDEAQ
AELEIKRKIVEEERHRMLGEHVKNLIGYLPKGILNVDDLPHLDKEVVEKVSPGYTKN*
>CALMACT00000007948|CALMACG00000004977|Seq_id=308_quiver|type=cds
MKNLRMEGKEIHQTGKGITQKGKLKKAKTMKDVECKCHYNCNNSISKEARHK
>CALMACT00000025994|CALMACG00000015721|Seq_id=308_quiver|type=cds
MLWAGDGTRELSCFNYRGVDRLTPPNHCVFVLSASVLFTWS*
>CALMACT00000003155|CALMACG00000002001|Seq_id=308_quiver|type=cds
PTQKTQTSLFTKHKLKFPLNTEEDISKFEDVLKSEEEFNAACDELARFGGSNIYNFIKRT
LTALLSNEQAKNYSLKGRKGKKPFEKLLLARLLICAAEKSLNTTQKAVEDAICSWLKRAP
ERLQGYRKKFP*
>CALMACT00000032398|CALMACG00000019612|Seq_id=308_quiver|type=cds
MHSPYSVDTQNNQPREENSTLRTSIYCSSLIHERHLFLITATAENPASYKYDVANEIKIL
EAFEVRFGWILNR*
>CALMACT00000035072|CALMACG00000021205|Name=r3hcc1l|Seq_id=428_quiver|type=cds
MAHHLMMIMFAAQKAVSLTSPIIKARPMTAASRATLNTAQRHDLKPAMKRPQTNLQTARR
LITTHLGKKSRISQEQSAQERNDLRTAREQKRLVKKNEQDAWEGNLRSSLN*
>CALMACT00000006722|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MRAKKGRNFASVQSPAHQMSEDIKTITEYALKSKTLEEIDIDIASYNLKPCCANVVREIM
DLTAFDNAVLSAQYSDIWKQERQFRITGTRCYSVYTFAKDNWSTMTRNFFWPKPFTSRYT
DHGIKYEKEALIKYTRSNNYKVVELGLVICKQLPWIAYSPDGVVMADGAPTRLVEIKCPY
DGILPADNLKVLT*
>CALMACT00000006723|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MSDKSVFLTLSSVFKYLCANTDSRCITEGEEILNANHIILMGVVSDSEEYVELKGRNFAS
VQSPAHQMSEDIKTITEYALKSKTLEEIDIDIASYNLKPCCANVVREIMDLTAFDNAVLS
AQYSDIWKQERQFRITGTRCYSVYTFAKDNWSTMTRNFFWPKPFTSRYTDHGIKYEKEAL
IKYTRSNNYKVVELGLVICKQLPWIAYSPDGVVMADGAPTRLVEIKCPYDGILPADNLKV
LT*
>CALMACT00000006724|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MSDKSVFLTLSSVFKYLCANTDSRCITEGEEILNANHIILMGVVSDSEEYVELSSIASLE
ILSSTDVKCYWSYKKKAVEEQ*
>CALMACT00000029885|CALMACG00000018110|Seq_id=1904_quiver|type=cds
MGKPVTASMCVCSKHFRKEDYCAKDIQMNRPKLKRFAVPSLNLPKRKVYLEEHRHSSLGR
EERYAKRQKVQELQDANIVDAPLCDGSEEAVQLTVVPTPYGDLTEEEKVAVQNLLLLSSR
VAKGLVDKEVQVSSGDIIITFSSTIKEDRHLNSLTGIPSFTLLNKLALLVKKNYPDIKKH
KLSIIDRIVLVFMKLKMGLKFNVLSFLFKICSASCKIIFVEYVGKLANILKSCIVWPSYE
ECQQNIPSCFIDFKSVRTVLDCIEIPIEKPKCLCCRIRTYSHYKGRQTLKIMTGVSPAGL
ITFLSKSYGGRTSDKAIFEQSHLINKLQSRQDSLMVDKGFLIDKICKDHFIKLVRPHFLS
KKKQFSAEESKENISISRARVHIERVNQRMRIFNILNNPLPNALIPCIDNILVIICGMVN
LQTPILSAGKF*

What did you try? What did you find on this forum?


Very useful site for me.


seqkit is great for FASTA files. In particular, the seqkit grep subcommand can search either the FASTA headers or the sequences themselves, and it returns whole records (header plus sequence) rather than only the matching lines.
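
If seqkit is not available, the same idea can be sketched in plain Python. This is only a minimal illustration, not a replacement for seqkit: it assumes the wanted ID is the first "|"-separated field of each header (the CALMACT... part in the file above), and the function names read_fasta and select_records are made up for this example. The three records are copied from the question.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence lines of a multi-line record are joined together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines: Iterable[str], wanted_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first '|'-separated header field is in wanted_ids."""
    for header, seq in read_fasta(lines):
        # Assumption: the ID to match is the text before the first '|'.
        transcript_id = header.split("|", 1)[0]
        if transcript_id in wanted_ids:
            yield header, seq


# Three records taken from the file in the question.
example = [
    ">CALMACT00000025247|CALMACG00000015281|Seq_id=6096_quiver|type=cds",
    "HGHPCLLNAPYLNCRSKNRGTQNFELPHNGLLTLFLDDCHALYLL*",
    ">CALMACT00000001383|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds",
    "MGDGLLGRDAMFYLLRNSLVSLSGEDDAEESVKDMIEVITKKLDVDRDGKISFQDYKQTV",
    "LKQPALLEVFGQCLPSRGAVYTFSTTFSSNPNLKM*",
    ">CALMACT00000025994|CALMACG00000015721|Seq_id=308_quiver|type=cds",
    "MLWAGDGTRELSCFNYRGVDRLTPPNHCVFVLSASVLFTWS*",
]

wanted = {"CALMACT00000025247", "CALMACT00000001383"}
selected = list(select_records(example, wanted))

# Print the selected records in FASTA form, one sequence line per record.
for header, seq in selected:
    print(f">{header}\n{seq}")
```

For a real file, pass an open file handle instead of the example list, since iterating over a file yields its lines. Records are processed one at a time, so a big file does not have to fit in memory. Note that multi-line sequences are written back out as a single line.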
