Extract the first N amino acids from a FASTA file
4.4 years ago

Hi, I want to extract the first N amino acids from each sequence in a FASTA file. My sequences look like this:

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVK
QGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHV
VVSKLGPEVIGLILDLIKAQPANRALLLAKLQAISNDGNPEASNFMGFVFEYGLFGAVKN

For example, I want to keep only the first 30 aa of this sequence:

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQ

Is there a program that can do this for all sequences from the Linux terminal? I hope you can help me. Thank you.


You could convert the file to tabular format with seqkit fx2tab and then use awk's substr function:

seqkit fx2tab file.fasta  | awk -v FS="\t" '{print ">"$1"\n"substr($2,1,30)}'
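Because fx2tab puts each whole record on one tab-separated line (name, then sequence), the substr in the awk step always sees the complete sequence, even if the original FASTA was wrapped over several lines. Here is a minimal Python sketch of just that awk step, assuming the input is already in this name<TAB>sequence form; tab_to_fasta is a hypothetical helper name, not part of seqkit.

```python
def tab_to_fasta(tab_lines, n=30):
    """Turn 'name<TAB>sequence' lines back into FASTA, keeping the first n residues.

    Mirrors the awk step: field 1 becomes the header and substr($2, 1, n)
    becomes sequence[:n]. Any further tab-separated columns are ignored.
    """
    for row in tab_lines:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip empty or malformed rows
        yield f">{fields[0]}\n{fields[1][:n]}"
```

Usage: for record in tab_to_fasta(open("file.tab")): print(record), where file.tab is a hypothetical file holding the tabular output.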

seqkit subseq -r 1:30 is enough; it keeps residues 1 to 30 of each sequence.


Be careful! This approach makes a lot of assumptions about the structure of the FASTA file.


Yes, it does. Sorry, I thought the input file was in tabular format. I have updated the comment.

4.4 years ago

Using seqkit:

$ seqkit subseq -r 1:30 input.fasta
4.4 years ago
Jianyu
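awk '/^>/ {if (seq != "") print substr(seq, 1, 30); print; seq = ""; next} {seq = seq $0} END {if (seq != "") print substr(seq, 1, 30)}' test.fa

This collects all sequence lines of a record before cutting, so it also works when sequences are wrapped over several lines, as in the question. A simple per-line substr($0, 1, 30) only gives the right result when each sequence is on a single line, as in test.fa below.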

test.fa

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVKQGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHVVVSKLGPEVIGLILDLIKAQPANRALLLAKLQAISNDGNPEASNFMGFVFEYGLFGAVKN

output

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQ
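A per-line substr cuts each line rather than each record, which matters for wrapped FASTA like the one in the question: every continuation line also gets printed, truncated to 30 characters. A minimal Python sketch contrasting the two behaviours, assuming the input is an iterable of FASTA lines; both function names are hypothetical and only illustrate the logic of the awk one-liners.

```python
def truncate_per_line(lines, n=30):
    """Naive approach: cut every non-header line to n characters (wrong for wrapped FASTA)."""
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else line[:n]


def truncate_records(lines, n=30):
    """Join each record's sequence lines first, then keep only the first n residues."""
    header, seq = None, ""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header
                yield seq[:n]
            header, seq = line, ""
        else:
            seq += line.strip()
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header
        yield seq[:n]
```

On the wrapped sequence from the question, truncate_per_line yields one extra 30-character line per continuation line, while truncate_records yields only the header and MVKIALFGRNITLPILIFIGFVFLHDASAQ.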
4.4 years ago
Joe

With biopython:

#Usage: python3 scriptname.py file.fasta
import sys
from Bio import SeqIO

for i in SeqIO.parse(sys.argv[1], "fasta"):
    print(f">{i.description}\n{i.seq[0:30]}")

Or as a one-liner:

$ python3 -c 'import sys; from Bio import SeqIO; [print(f">{i.description}\n{i.seq[0:30]}") for i in SeqIO.parse(sys.argv[1], "fasta")];' file.fasta

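Replace [0:30] with whatever range you like (it doesn't have to start at zero either). Python slices are 0-based and exclude the end position, so [0:30] gives residues 1 to 30, the same as seqkit subseq -r 1:30.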
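seqkit's -r option takes 1-based, inclusive positions, while Python slices are 0-based and end-exclusive, so translating a region between the two is an easy place for off-by-one errors. Below is a minimal sketch of a converter; region_to_slice is a hypothetical helper, and its handling of negative positions assumes seqkit's convention that -1 is the last residue (e.g. -12:-1 for the last 12 residues).

```python
def region_to_slice(start, end):
    """Convert a 1-based, inclusive region (seqkit-style start:end) to a Python slice.

    Negative positions count from the end of the sequence, with -1 meaning
    the last residue (assumed to follow seqkit's convention).
    """
    if start == 0 or end == 0:
        raise ValueError("positions are 1-based; 0 is not a valid position")
    # Positive starts shift down by one; negative starts already index from the end.
    py_start = start - 1 if start > 0 else start
    # Inclusive end -> exclusive end. An end of -1 means "through the last residue".
    if end > 0:
        py_end = end
    elif end == -1:
        py_end = None
    else:
        py_end = end + 1
    return slice(py_start, py_end)
```

For example, in the Biopython script one could write i.seq[region_to_slice(1, 30)], since slicing a Seq with a slice object works the same as writing [0:30].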
