How To Have Only Id And Sequence Printed In Output After Parse Sequences From A Fasta File
12.3 years ago
KJ Lim

I used this piece of Biopython code written by Eric Normandeau to parse several sequences from a FASTA file, and it worked nicely. I would like to change my parsed output from this format:

>DR179241 similar to UniRef100_A5HIY3 Cluster: Thaumatin-like protein; n=1; ......
ATAATCTTTAGATCAGTCATCAATCTCAACAGTATCGCTTTCAATTCTCTTTCATATTGC
ATGGAAGTGTGTAAATACAATTAGGGCATTCATTGAGTTGACTTCATTTAAGCGCT......

to exactly this format:

>DR179241
ATAATCTTTAGATCAGTCATCAATCTCAACAGTATCGCTTTCAATTCTCTTTCATATTGC
ATGGAAGTGTGTAAATACAATTAGGGCATTCATTGAGTTGACTTCATTTAAGCGCT......

Could anyone kindly show me, for example, how to modify this piece of Biopython code written by Eric Normandeau to suit my purpose?

Thank you very much and have a nice day.

12.3 years ago

You could do this in R, using seqinR:

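library(seqinr)
# read.fasta() names each record by the first word of its header line;
# forceDNAtolower = FALSE keeps the original upper-case bases
x <- read.fasta("yourfile.fasta", forceDNAtolower = FALSE)
# write the records back out using only those short names
write.fasta(x, names = names(x), file.out = "newfile.fasta")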

Oohh, good to know. Thanks Leonor.

12.3 years ago
Peter

How about this (untested):

#!/usr/bin/env python
import sys
from Bio import SeqIO

fasta_file = sys.argv[1]   # Input FASTA file
result_file = sys.argv[2]  # Output FASTA file

def modified(records):
    """Yield each record with its description cleared, keeping only the ID."""
    for record in records:
        # Clear the description so the header is written as just ">ID"
        record.description = ""
        yield record

records = modified(SeqIO.parse(fasta_file, "fasta"))
count = SeqIO.write(records, result_file, "fasta")
print("Converted %i records" % count)

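This uses a generator function to modify the records one at a time, which is memory efficient: you should avoid loading everything into memory at once. The key point for your request is setting the description to an empty string; with no description, Biopython's FASTA writer puts only the record ID on the header line.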
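If Biopython is not available, the same generator pattern can be sketched with the standard library alone. This is not Biopython's parser: `parse_fasta`, `id_only` and `write_fasta` are hypothetical helper names, and the sketch assumes well-formed FASTA in which the ID is the first whitespace-delimited word of the header (which is also how `SeqIO` sets `record.id`). Each stage is a generator, so only one record is held in memory at a time, and sequences are re-wrapped at 60 columns.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header text without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def id_only(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the first word of each header (the analogue of clearing record.description)."""
    for header, seq in records:
        words = header.split(None, 1)
        yield (words[0] if words else ""), seq


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> int:
    """Write '>ID' headers with sequences wrapped at `width` columns; return the record count."""
    count = 0
    for rec_id, seq in records:
        handle.write(">%s\n" % rec_id)
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Small in-memory demo; for real files, pass open file handles instead.
    demo = io.StringIO(">DR179241 similar to UniRef100_A5HIY3\nATAATCTTTAG\nATGGAAGTG\n")
    out = io.StringIO()
    n = write_fasta(id_only(parse_fasta(demo)), out)
    print("Converted %i records" % n)
    print(out.getvalue(), end="")
```

For real files, `write_fasta(id_only(parse_fasta(open("in.fasta"))), open("out.fasta", "w"))` chains the three stages exactly like `SeqIO.write(modified(SeqIO.parse(...)), ...)` above.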


Thanks Peter. Thanks for the suggestion.

10.2 years ago

If you are not bound to (Bio)Python, I would just use sed:

sed 's/\s.*//' seq.fa

Explanation: on each line, remove the first whitespace character and everything after it. (\s is a GNU sed extension; other versions of sed may need [[:space:]] instead.)

Sequence lines remain unchanged because they should not contain spaces (if they do, they get truncated!).
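The sed command edits every line, so a sequence line containing a space would be truncated too. Restricting the substitution to header lines avoids that; in sed itself that is the address form `sed '/^>/s/\s.*//' seq.fa`. A minimal Python sketch of the same header-only filter follows; `strip_headers` is a hypothetical helper name, and it assumes every header starts with `>` in the first column.

```python
import io
import re
import sys
from typing import Iterable, Iterator

# Matches the first whitespace character and everything after it,
# the same pattern as the sed expression s/\s.*//
TAIL = re.compile(r"\s.*")


def strip_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each header cut at its first whitespace.

    Only lines starting with '>' are edited, so a sequence line that
    contains a space is passed through unchanged instead of truncated.
    """
    for line in lines:
        if line.startswith(">"):
            yield TAIL.sub("", line.rstrip("\n")) + "\n"
        else:
            yield line


if __name__ == "__main__":
    # In-memory demo; as a real filter you could pass sys.stdin instead.
    demo = io.StringIO(">DR179241 similar to UniRef100_A5HIY3\nATAATCTTTAG\n")
    sys.stdout.writelines(strip_headers(demo))
```

Because it works line by line as a generator, it streams the file like sed does; `sys.stdout.writelines(strip_headers(sys.stdin))` turns it into a drop-in filter.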
