Question: Extract N amino acids from fasta file
6 months ago, martha.chapa.mc18 wrote:

Hi, I want to extract the first N amino acids from each sequence in a FASTA file. I have sequences like this:

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVK
QGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHV
VVSKLGPEVIGLILDLIKAQPANRALLLAKLQAISNDGNPEASNFMGFVFEYGLFGAVKN

For example, I want this sequence cut down to its first 30 aa, like:

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQ

Is there a program that can do this for all sequences from the Linux terminal? I hope you can help me. Thank you.


You could convert to tabular format with seqkit and use the substring function from awk:

seqkit fx2tab file.fasta  | awk -v FS="\t" '{print ">"$1"\n"substr($2,1,30)}'
written 6 months ago by alex.zaccaron (modified 6 months ago)

seqkit subseq -r 1:30 is enough.

written 6 months ago by shenwei356

Be careful! This approach makes a lot of assumptions about the structure of the FASTA file.

written 6 months ago by Alex Reynolds

Yes, it does. Sorry, I thought the input file was in tabular format. I updated the comment.

written 6 months ago by alex.zaccaron (modified 6 months ago)
6 months ago, finswimmer (Germany) wrote:

Using seqkit:

$ seqkit subseq -r 1:30 input.fasta

6 months ago, yztxwd (Southern Medical University) wrote:

# Assumes each sequence sits on a single line (unwrapped FASTA)
awk '{ if (/^>/) { print } else { print substr($0, 1, 30) } }' test.fa

test.fa

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVKQGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHVVVSKLGPEVIGLILDLIKAQPANRALLLAKLQAISNDGNPEASNFMGFVFEYGLFGAVKN

output

>a47619p2-
MVKIALFGRNITLPILIFIGFVFLHDASAQ
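
A note on wrapped FASTA: the awk command above works line by line, so it trims every non-header line to 30 characters. With a sequence wrapped over several lines, as in the question, that would keep 30 residues from each line rather than the first 30 of the record. Below is a minimal Python sketch, standard library only, that joins the wrapped lines before cutting. The names `read_fasta` and `first_n` are made up for this illustration and do not come from any library.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over lines."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_n(handle: TextIO, n: int) -> str:
    """Return FASTA text with every sequence cut to its first n residues."""
    return "".join(f">{h}\n{s[:n]}\n" for h, s in read_fasta(handle))


if __name__ == "__main__":
    # Two wrapped lines of the record from the question.
    example = io.StringIO(
        ">a47619p2-\n"
        "MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVK\n"
        "QGVVHEPIDAGVMAGNVPAEQRNAASIVQSIDGSKLSQISDRLPKFIKQGSDEVVYGKHV\n"
    )
    print(first_n(example, 30), end="")
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` example, e.g. `first_n(open("test.fa"), 30)`.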

6 months ago, Joe (United Kingdom) wrote:

With biopython:

#Usage: python3 scriptname.py file.fasta
import sys
from Bio import SeqIO

for i in SeqIO.parse(sys.argv[1], "fasta"):
    print(f">{i.description}\n{i.seq[0:30]}")

Or as a one-liner:

$ python3 -c 'import sys; from Bio import SeqIO; [print(f">{i.description}\n{i.seq[0:30]}") for i in SeqIO.parse(sys.argv[1], "fasta")];' file.fasta

Replace [0:30] with whatever range you like (it doesn't have to start at zero either).
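
On ranges: `[0:30]` follows Python slicing, which counts from zero and excludes the end position, whereas `seqkit subseq -r 1:30` uses 1-based, inclusive coordinates. Both select the same 30 residues. Here is a small sketch of the conversion, where `region_to_slice` is a hypothetical helper written for this example and a plain `str` stands in for a Biopython `Seq` (which slices the same way):

```python
def region_to_slice(start: int, end: int) -> slice:
    """Convert a 1-based, inclusive region (start:end) to a Python slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift the start down by one; the inclusive end equals the exclusive stop.
    return slice(start - 1, end)


seq = "MVKIALFGRNITLPILIFIGFVFLHDASAQTATVIDWDQIREASQTQRRQAAAIANAPVK"

print(seq[0:30])                     # first 30 residues
print(seq[region_to_slice(1, 30)])   # same residues, given as region 1:30
print(seq[region_to_slice(11, 20)])  # residues 11 through 20, i.e. seq[10:20]
```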
