How to extract multi-line protein sequences with IDs present in headers?
4.0 years ago

Hi all. Can anyone tell me how to retrieve multi-line protein sequences using the IDs present in their headers?

>gi|1706522686|gb|QDM68077.1| CraA [Acinetobacter baumannii]
MKNIQTTALNRTTLMFPLALVLFEFAVYIGNDLIQPAMLAITEDFGVSATWAPSSMSFYLLGGASVAWLL
GPLSDRLGRKKVLLSGVLFFALCCFLILLTRQIEHFLTLRFLQGIGLSVISAVGYAAIQENFAERDAIKV
MALM
>gi|1818457412|dbj|BCA98153.1| 1-acyl-sn-glycerol-3-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI

>gi|1818457412|dbj|BCA98158.1| 1-acyl-phosphate acyltransferase [Acinetobacter baumannii]
MTQTQSIVNSTLKKFSKIGLYGKKVTSATAAISEGFYLVYRHGLYKDPNNPVNTRYVQYFCRRLCQVFNL
EVQVHGTIPREPALWVSNHISWLDIAVLGSGARVFFLAKAEIEKWPILGNLAKGGGTLFIKRGSGDSIKI

and I have IDs like

QDM68077.1
BCA98153.1

Please let me know how to retrieve the sequences for these IDs. I would appreciate it if someone could tell me how to use seqkit for this. I have used seqkit like this:

seqkit grep -nrif remaining_except_core 307_DR_determinats.fasta

but I get nothing from this command.

Take a look at the Similar posts section on the right-hand side of the page for related questions and the corresponding solutions.

For example, "Retrieve multi-line fasta sequences using list of locus tag" asks a very similar question and has an accepted solution that you could try.

Also take a look at: https://bioinf.shenwei.me/seqkit/usage/#grep
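If a scripted fallback is useful, the selection itself can be sketched in plain Python. This is only a minimal sketch, not how seqkit works internally: the function names `read_fasta` and `select_by_accession` are made up for illustration, and it assumes the accession is one of the `|`-separated fields in the first token of each header, as in the example records. It also strips whitespace and carriage returns from the ID list, since stray `\r` characters in an ID file are one possible (unconfirmed) reason for a pattern file matching nothing.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_accession(handle, wanted):
    """Keep records whose first header token has a '|' field listed in wanted."""
    # Strip whitespace and '\r' so IDs from a Windows-edited file still match.
    wanted = {w.strip() for w in wanted if w.strip()}
    for header, seq in read_fasta(handle):
        tokens = header.split()
        fields = tokens[0].split("|") if tokens else []
        if wanted.intersection(fields):
            yield header, seq


# Toy input: headers from the question, sequences shortened for illustration.
example = io.StringIO(
    ">gi|1706522686|gb|QDM68077.1| CraA [Acinetobacter baumannii]\n"
    "MKNIQ\n"
    "TTALN\n"
    ">gi|1818457412|dbj|BCA98158.1| 1-acyl-phosphate acyltransferase\n"
    "MTQTQ\n"
)
for header, seq in select_by_accession(example, ["QDM68077.1", "BCA98153.1"]):
    print(f">{header}\n{seq}")
```

To use it on real files, open the FASTA file and the ID list with `open(...)` and pass the file objects in place of the toy input. Matching whole fields rather than substrings avoids picking up an accession that merely contains one of the wanted IDs.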

If a GUI application is an option, take a look at SEDA (https://www.sing-group.org/seda/).

You can apply the Pattern filtering operation (https://www.sing-group.org/seda/manual/operations.html#pattern-filtering) to the headers (check the Header radio button) using the sequence IDs you want. Note: you can use the Import patterns option to import these IDs from a TXT file instead of typing them manually into the GUI.

2.1 years ago
jena ▴ 290

Easy with seqmagick:

# if you have full IDs
seqmagick convert --include-from-file ids.txt input.fa output.fa

# if you have partial IDs, you can match patterns
seqmagick convert --pattern-include REGEX input.fa output.fa
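
For the pattern route, several IDs can be folded into a single REGEX. The snippet below is a rough sketch that assumes the REGEX follows Python `re` syntax (an assumption worth checking against the seqmagick documentation); the helper name `ids_to_regex` is hypothetical. It escapes the `.` in each accession so that it matches a literal dot rather than any character.

```python
import re


def ids_to_regex(ids):
    """Join IDs into one alternation, escaping metacharacters such as '.'."""
    cleaned = [i.strip() for i in ids if i.strip()]
    if not cleaned:
        # An empty pattern would match every header, so refuse to build one.
        raise ValueError("no IDs given")
    return "|".join(re.escape(i) for i in cleaned)


pattern = ids_to_regex(["QDM68077.1", "BCA98153.1"])
print(pattern)
```

The printed string can then be passed, quoted, as the REGEX argument. Note that an alternation like this still matches substrings, so an ID that is a prefix of another accession would select both.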