How to extract a protein sequence?
Hello.
How to extract a protein sequence?
I have two files.
< file 1. complete_protein.fasta >
protein_1
DCXSTEISLFHEIWLF
protein_2
AJFOWIDJLSIDJFJ
protein_3
DJFLWIDJFLSKDJFL
protein_4
DKSJFLEISJDKJF
< file 2. only proteinID.fasta >
protein_1
protein_4
I need the sequences for the IDs listed in file 2. Those sequences are in file 1.
I tried the "diff" command, but the result is not the data I want.
How can I extract them?
sequence
genome
gene
protein
Hi,
If you are working on a Linux distribution, you can try the following command line:
grep -f target_protein.txt -A1 protein_file.txt | sed '/--/d' > retrieved_protein.txt
This takes the file protein_file.txt, which has the same content that you posted above:
protein_1
DCXSTEISLFHEIWLF
protein_2
AJFOWIDJLSIDJFJ
protein_3
DJFLWIDJFLSKDJFL
protein_4
DKSJFLEISJDKJF
And the file target_protein.txt, which contains the target protein names that you want to retrieve from the file above:
protein_1
protein_4
The output, the target protein sequences of interest, is saved in retrieved_protein.txt, which looks like:
protein_1
DCXSTEISLFHEIWLF
protein_4
DKSJFLEISJDKJF
I hope this helps,
António
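One caveat with the command above: grep matches substrings by default, so the ID protein_1 would also match a header such as protein_10 if the file had one. A minimal sketch of a stricter variant follows, assuming the same file names as in the answer and a grep that supports -A (GNU and BSD grep do; it is not in POSIX). The printf lines only recreate the example data from the question.

```shell
# Recreate the example files from the post (file names as used in the answer above).
printf '%s\n' protein_1 DCXSTEISLFHEIWLF protein_2 AJFOWIDJLSIDJFJ \
              protein_3 DJFLWIDJFLSKDJFL protein_4 DKSJFLEISJDKJF > protein_file.txt
printf '%s\n' protein_1 protein_4 > target_protein.txt

# -F treats each ID as a fixed string, not a regular expression.
# -x requires the ID to match the whole line, so protein_1 cannot match protein_10.
# -A1 prints the line after each match (the sequence); sed drops the "--"
# separator lines that grep puts between non-adjacent matches.
grep -x -F -f target_protein.txt -A1 protein_file.txt | sed '/^--$/d' > retrieved_protein.txt
cat retrieved_protein.txt
```

Like the original command, this assumes each sequence sits on exactly one line directly below its ID.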
Answers here should work as well: Extract fasta sequences from a file using a list in another file.
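For files in the usual FASTA layout, which differs from the example in the question by having ">" in front of each ID and possibly sequences wrapped over several lines, a line-based grep -A1 is not enough. The following is only a sketch under those assumptions, using awk; the file names ids.txt and subset.fasta and the sample records are made up for illustration.

```shell
# Hypothetical example data in standard FASTA layout (">" headers, wrapped sequence).
printf '%s\n' '>protein_1' DCXSTEIS LFHEIWLF '>protein_2' AJFOWIDJLSIDJFJ \
              '>protein_4' DKSJFLEISJDKJF > complete_protein.fasta
printf '%s\n' protein_1 protein_4 > ids.txt

# First file (ids.txt): remember every wanted ID.
# Second file: on each header line, strip the ">" and check whether the ID
# is wanted; then print every line for as long as the current record is wanted.
awk 'NR == FNR { want[$1]; next }
     /^>/      { id = substr($1, 2); keep = (id in want) }
     keep' ids.txt complete_protein.fasta > subset.fasta
cat subset.fasta
```

Because the ID is taken from the first whitespace-separated field of the header, any description after the ID on the header line does not affect the match.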