Question

How To Have Only ID And Sequence Printed In Output After Parsing Sequences From A Fasta File

1

12.4 years ago

KJ Lim ▴ 140

I used this piece of Biopython code written by Eric Normandeau to parse several sequences from a FASTA file. The code worked nicely. I would like to convert my parsed output from this format:

>DR179241 similar to UniRef100_A5HIY3 Cluster: Thaumatin-like protein; n=1; ......
ATAATCTTTAGATCAGTCATCAATCTCAACAGTATCGCTTTCAATTCTCTTTCATATTGC
ATGGAAGTGTGTAAATACAATTAGGGCATTCATTGAGTTGACTTCATTTAAGCGCT......

to exactly this format:

>DR179241
ATAATCTTTAGATCAGTCATCAATCTCAACAGTATCGCTTTCAATTCTCTTTCATATTGC
ATGGAAGTGTGTAAATACAATTAGGGCATTCATTGAGTTGACTTCATTTAAGCGCT......

Could anyone please show me how to modify this Biopython code written by Eric Normandeau to suit my purpose?

Thank you very much and have a nice day.

biopython parsing fasta • 4.3k views

updated 10.3 years ago by dariober 15k • written 12.4 years ago by KJ Lim ▴ 140

0

10.3 years ago

dariober 15k

If you are not bound to (bio)python I would just use sed:

sed 's/\s.*//' seq.fa

Explanation: from each line, remove the first whitespace character and everything after it. (\s is a GNU sed extension; with other sed implementations, [[:space:]] can be used instead.)

Sequence lines remain unchanged as they should not contain spaces (they get truncated otherwise!)
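For anyone who prefers to stay in Python without Biopython, the same idea can be sketched with the standard library alone. This is only an illustrative sketch: the function name strip_descriptions and the shortened example record are assumptions, not part of the answer above. Unlike the sed one-liner, it only touches header lines, so a sequence line that happens to contain a space is passed through unchanged.

```python
import io


def strip_descriptions(lines):
    """Yield FASTA lines, keeping only the ID on each header line."""
    for line in lines:
        if line.startswith(">"):
            # Keep the first whitespace-delimited token (">ID") and drop the rest.
            yield line.split(None, 1)[0] + "\n"
        else:
            # Sequence lines are passed through untouched.
            yield line


# Hypothetical, shortened example record held in memory instead of a file.
example = io.StringIO(
    ">DR179241 similar to UniRef100_A5HIY3 Cluster: Thaumatin-like protein\n"
    "ATAATCTTTAGATCAGTCATCAATC\n"
)
cleaned = "".join(strip_descriptions(example))
print(cleaned, end="")
```

To process a real file, an open file handle can be passed in place of the StringIO object, since both are iterated line by line.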

updated 4.9 years ago by Ram 44k • written 10.3 years ago by dariober 15k

Ram · Accepted Answer · 2012-06-12

3

12.4 years ago

Leonor Palmeira 3.9k

You could do this in R, using seqinR:

library(seqinr)
x=read.fasta("yourfile.fasta")
write.fasta(x, names=names(x), file.out="newfile.fasta")

updated 4.9 years ago by Ram 44k • written 12.4 years ago by Leonor Palmeira 3.9k

0

Oohh, good to know. Thanks Leonor.

updated 4.9 years ago by Ram 44k • written 12.4 years ago by KJ Lim ▴ 140

3

12.4 years ago

Peter 6.0k

How about this (untested):

#!/usr/bin/env python
import sys
from Bio import SeqIO

fasta_file = sys.argv[1]  # Input fasta file
result_file = sys.argv[2] # Output fasta file

def modified(records):
    for record in records:
        #Clear the description
        record.description=""
        yield record

records = modified(SeqIO.parse(fasta_file, "fasta"))
count = SeqIO.write(records, result_file, "fasta")
print("Converted %i records" % count)

This uses a generator function to modify the records one at a time, which is memory efficient (you should avoid loading everything into memory). The key point for your request is to set the description to an empty string, so that only the ID is written to the header line.
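To illustrate why the generator pattern is memory efficient, here is a small stand-alone sketch that does not depend on Biopython. The Record tuple, the source() function and the seen list are hypothetical stand-ins for SeqRecord objects and the parser; they are only there to show that records are pulled from the input one at a time, not all at once.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a parsed FASTA record."""
    id: str
    description: str
    seq: str


def modified(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record with its description cleared, one at a time."""
    for record in records:
        yield record._replace(description="")


seen = []  # indices of records that have actually been read from the source


def source() -> Iterator[Record]:
    """Hypothetical lazy parser that notes each record as it is produced."""
    for i in range(3):
        seen.append(i)
        yield Record(f"seq{i}", f"seq{i} some description", "ACGT")


stream = modified(source())
first = next(stream)
# At this point only the first record has been read; the others are untouched.
print(first, seen)
```

Because nothing is read until the consumer asks for it, a writer that iterates over the stream only ever needs one record in memory at a time.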

updated 4.9 years ago by Ram 44k • written 12.4 years ago by Peter 6.0k

0

Thanks Peter. Thanks for the suggestion.

written 12.4 years ago by KJ Lim ▴ 140