How to extract a protein sequence?
Asked 3.8 years ago by ioer0417

Hello.

How can I extract protein sequences by their IDs?

I have two files.

File 1: complete_protein.fasta

protein_1
DCXSTEISLFHEIWLF
protein_2
AJFOWIDJLSIDJFJ
protein_3
DJFLWIDJFLSKDJFL
protein_4
DKSJFLEISJDKJF

File 2: proteinID.fasta (protein IDs only)

protein_1
protein_4

I need the sequences from file 1 for the proteins listed in file 2.

I tried the diff command, but it did not give me the output I want.

How can I extract them?

Answer, 3.8 years ago

Hi,

If you are working on a Linux distribution, you can try the following command:

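grep -w -A1 -f target_protein.txt protein_file.txt | sed '/^--$/d' > retrieved_protein.txt

Here -f reads the IDs to search for from target_protein.txt, -w makes each ID match only as a whole word (so protein_1 does not also match protein_10), -A1 also prints the line after each match (the sequence), and sed deletes the "--" separator lines that grep inserts between non-adjacent matches.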

This takes protein_file.txt, which contains the same content you posted above:

protein_1
DCXSTEISLFHEIWLF
protein_2
AJFOWIDJLSIDJFJ
protein_3
DJFLWIDJFLSKDJFL
protein_4
DKSJFLEISJDKJF

It also takes target_protein.txt, which lists the names of the proteins you want to retrieve from that file:

protein_1
protein_4

The target protein sequences are saved to retrieved_protein.txt, which looks like this:

protein_1
DCXSTEISLFHEIWLF
protein_4
DKSJFLEISJDKJF
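To reproduce this end to end, here is a minimal sketch that recreates the two example files with printf and then runs the command on them. It assumes GNU or BSD grep (both accept -w, -A and -f) and that every sequence sits on a single line directly after its ID. The -w flag restricts each ID to whole-word matches, and the sed pattern '^--$' deletes only grep's group separator lines rather than any line that happens to contain "--".

```shell
#!/bin/sh
# Recreate the example input from this thread: one ID line, then one sequence line.
printf '%s\n' protein_1 DCXSTEISLFHEIWLF protein_2 AJFOWIDJLSIDJFJ \
    protein_3 DJFLWIDJFLSKDJFL protein_4 DKSJFLEISJDKJF > protein_file.txt
printf '%s\n' protein_1 protein_4 > target_protein.txt

# -f: read search patterns (the IDs) from target_protein.txt
# -w: match each ID only as a whole word (protein_1 will not hit protein_10)
# -A1: also print the line after each matching ID line (its sequence)
# sed: drop the "--" separator lines grep prints between non-adjacent groups
grep -w -A1 -f target_protein.txt protein_file.txt | sed '/^--$/d' > retrieved_protein.txt

cat retrieved_protein.txt
```

Running it prints the same four lines as the output listed above.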

I hope this helps,

António
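The grep -A1 approach relies on each sequence occupying exactly one line. Standard FASTA files have header lines starting with ">" and often wrap a sequence over several lines. For that case, one possible sketch uses awk instead of grep (a different technique from the answer above). It assumes each ID in target_protein.txt equals the first word of a header line minus the ">". The input file is hypothetical: the thread's example data rewritten as standard FASTA, with the protein_1 sequence wrapped over two lines.

```shell
#!/bin/sh
# Hypothetical input: the example data as FASTA with ">" headers,
# with protein_1's sequence split over two lines to mimic a wrapped record.
printf '%s\n' '>protein_1' DCXSTEISLF HEIWLF '>protein_2' AJFOWIDJLSIDJFJ \
    '>protein_3' DJFLWIDJFLSKDJFL '>protein_4' DKSJFLEISJDKJF > complete_protein.fasta
printf '%s\n' protein_1 protein_4 > target_protein.txt

# First file (ARGV[1]): store every wanted ID as an array key.
# Second file: on each header line, strip ">" from the first word and decide
# whether this record is wanted; the bare "keep" pattern then prints every
# line (header and all sequence lines) until the next header changes it.
awk 'FILENAME == ARGV[1] { want[$1]; next }
     /^>/ { id = substr($1, 2); keep = (id in want) }
     keep' target_protein.txt complete_protein.fasta > retrieved_protein.fasta

cat retrieved_protein.fasta
```

Because the IDs are compared as exact array keys, protein_1 cannot accidentally match protein_10, and records of any length are copied whole.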

Thank you so much!

Your answer helped me a lot.
