Question: How to extract a protein sequence?
1 day ago, ioer0417 wrote:

Hello.

How can I extract specific protein sequences from a file?

I have two files.

< file 1. complete_protein.fasta >

protein_1

DCXSTEISLFHEIWLF

protein_2

AJFOWIDJLSIDJFJ

protein_3

DJFLWIDJFLSKDJFL

protein_4

DKSJFLEISJDKJF

< file 2. only proteinID.fasta >

protein_1

protein_4

I need the sequences for the protein IDs listed in file 2. Those sequences are in file 1.

I tried the "diff" command, but it did not give me the result I want.

How can I extract them?


Answers here should work as well: Extract fasta sequences from a file using a list in another file.

Comment written 1 day ago by genomax.
1 day ago, antonioggsousa wrote:

Hi,

If you are working on a Linux distribution, you can try the following command line:

grep -w -F -f target_protein.txt -A1 protein_file.txt | sed '/^--$/d' > retrieved_protein.txt

This reads protein_file.txt, which contains the same content that you posted above:

protein_1
DCXSTEISLFHEIWLF
protein_2
AJFOWIDJLSIDJFJ
protein_3  
DJFLWIDJFLSKDJFL
protein_4
DKSJFLEISJDKJF
  

The file target_protein.txt contains the names of the target proteins that you want to retrieve from the file above:

protein_1
protein_4
  

The sequences of interest are saved in retrieved_protein.txt, which looks like this:

protein_1
DCXSTEISLFHEIWLF
protein_4
DKSJFLEISJDKJF
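
The grep approach above relies on every sequence occupying exactly one line after its name. For sequences wrapped over several lines, a minimal awk sketch could look like the following. It assumes standard FASTA, where each header line starts with ">" (the example in this thread shows the names without ">"), and one ID per line in the list file. The function name extract_fasta is hypothetical and only used for illustration.

```shell
# Sketch: print FASTA records whose ID appears in a list file.
# Assumptions: header lines start with ">", the ID is the first word
# after ">", and the list file holds one ID per line.
extract_fasta() {
    ids_file=$1
    fasta_file=$2
    awk '
        # First file (the ID list): remember every non-empty ID.
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # Header line: strip the leading ">" and check the ID against the list.
        /^>/ { id = substr($1, 2); keep = (id in wanted) }
        # Print the header and every following sequence line while keep is set.
        keep
    ' "$ids_file" "$fasta_file"
}
```

It could be called as: extract_fasta target_protein.txt protein_file.txt > retrieved_protein.txt. Because IDs are compared as whole strings, protein_1 would not also pull in a record named protein_10. With files that lack the ">" prefix, as in the listing above, this sketch prints nothing.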
  

I hope this helps,

António


Thank you so much!

Your answer helped me a lot.

Reply written 1 day ago by ioer0417.