Question: Biopython: SeqIO.write() function to write dictionary object to FASTA file
16 months ago, WeepingMeadow wrote:

Hello,

I am trying to write a dictionary object to a FASTA file, but I am having trouble doing so. I have not managed it either with plain Python or with the library (Biopython).

I tried converting my dictionary to a list using "dict.items()" and then writing it with SeqIO, and the error is:

"AttributeError: 'tuple' object has no attribute 'id'"

I would appreciate any kind of help. Thanks in advance!

WM

biopython sequence • 3.2k views

16 months ago, Eric Lim (Stoke Therapeutics, Inc) wrote:

The error is clear. dict.items() yields plain (key, value) tuples, and a tuple has no id attribute, which SeqIO.write requires of every record it writes. Typically, you'd pass it SeqRecord objects; each one wraps a Seq object and carries attributes like id and description. You can definitely turn your dictionary into a list of SeqRecord objects, and everything else should work.
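
A minimal sketch of that conversion, assuming the dictionary maps sequence names to plain sequence strings (the `sequences` data here is hypothetical). It writes to an in-memory handle so the result can be inspected directly; passing a filename to SeqIO.write works the same way.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical example data: sequence names mapped to plain strings.
sequences = {"myseq1": "acgt", "myseq2": "gctc"}

# Wrap each (name, sequence) pair in a SeqRecord, which carries the `id`
# attribute that SeqIO.write looks for. An empty description keeps the
# FASTA header down to just the id.
records = [
    SeqRecord(Seq(seq), id=name, description="")
    for name, seq in sequences.items()
]

# SeqIO.write returns the number of records it wrote.
handle = StringIO()
count = SeqIO.write(records, handle, "fasta")
fasta_text = handle.getvalue()
print(fasta_text)
```

The key point is that SeqIO.write receives SeqRecord objects rather than the tuples produced by dict.items().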

There are quite a few other ways to convert a dictionary to the many formats SeqIO supports. The easiest (and the one requiring the least programming experience) is to simply write your dictionary into a tab-delimited file and use SeqIO.convert.

See below for an example.

from Bio import SeqIO

a = {'myseq1':'acgt', 'myseq2': 'gctc'}

# try writing your own code to turn this dictionary into a tab-delimited file (seq.tab),
# i.e. one record per line, with a single tab character between the name and the sequence:
# myseq1	acgt
# myseq2	gctc

SeqIO.convert('seq.tab', 'tab', 'seq.fa', 'fasta')
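
For readers who want to see the whole round trip, here is one possible sketch of the step left as an exercise above. It assumes the same small dictionary and writes both files into a temporary directory; the `workdir`, `tab_path`, and `fasta_path` names are illustrative only.

```python
import os
import tempfile

from Bio import SeqIO

a = {"myseq1": "acgt", "myseq2": "gctc"}

# Work in a throwaway directory so the example leaves no files behind
# in the current folder.
workdir = tempfile.mkdtemp()
tab_path = os.path.join(workdir, "seq.tab")
fasta_path = os.path.join(workdir, "seq.fa")

# One record per line: name, a tab character, then the sequence.
with open(tab_path, "w") as handle:
    for name, seq in a.items():
        handle.write(f"{name}\t{seq}\n")

# SeqIO.convert reads the "tab" format and writes FASTA, returning the
# number of records converted.
n_converted = SeqIO.convert(tab_path, "tab", fasta_path, "fasta")

with open(fasta_path) as handle:
    converted = handle.read()
print(converted)
```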

16 months ago, Joe (United Kingdom) wrote:

All you need to do is tell it to write the dictionary values in the SeqIO.write() call, i.e.:

Input file (seqs.fa):

>RandomSequence_3c0u91QYYaQ6aKHbB3SnPOAeQhQnk8xn
ATGACGACGTCTGCACCTCTTCAGCGAGGGTATGACCACGTTGGTCAGCCGGACCGAGCC
AATCGAGCTTGGTGGAACAA
>RandomSequence_CjetvXyAxJ5P1lQVcArbgNTvHpJHRvvv
CGATAGCAGCACACGGCGGGCCACCCATCATAGACTCCGGCGTTCAGGGCCGTATCAATT
GAGTCGAAGCTGAAACGTCA
>RandomSequence_vUqogYyedda55EPajRhHdQNHncrPmzc5
TTGAACTGGTGGACTATGCCGCCGAGGACGCCCGCGTAGAAATACCGTTCAACCTTTGCA
TCAATAAGAGTCAAATGTTA

Run through the following code:

from Bio import SeqIO

record_dict = SeqIO.to_dict(SeqIO.parse('seqs.fa', 'fasta'))

with open('output_fasta.fa', 'w') as handle:
    SeqIO.write(record_dict.values(), handle, 'fasta')

Yields this output file (output_fasta.fa). Spoiler alert: it's the same as the input file (duh!) :)

>RandomSequence_3c0u91QYYaQ6aKHbB3SnPOAeQhQnk8xn
ATGACGACGTCTGCACCTCTTCAGCGAGGGTATGACCACGTTGGTCAGCCGGACCGAGCC
AATCGAGCTTGGTGGAACAA
>RandomSequence_CjetvXyAxJ5P1lQVcArbgNTvHpJHRvvv
CGATAGCAGCACACGGCGGGCCACCCATCATAGACTCCGGCGTTCAGGGCCGTATCAATT
GAGTCGAAGCTGAAACGTCA
>RandomSequence_vUqogYyedda55EPajRhHdQNHncrPmzc5
TTGAACTGGTGGACTATGCCGCCGAGGACGCCCGCGTAGAAATACCGTTCAACCTTTGCA
TCAATAAGAGTCAAATGTTA

This assumes that the OP obtained the dictionary directly from SeqIO, in which case the dictionary's values() are already SeqRecord objects. The AttributeError suggests that the dictionary likely comes from outside the Biopython ecosystem. Still, your example is extremely educational: by looking at the output of print(record_dict), someone can learn how to construct SeqRecord objects from any key-value data structure.
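
As a rough illustration of that kind of inspection, the sketch below parses a small made-up FASTA string (not the thread's input file) and prints the attributes each SeqRecord in the dictionary carries.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text used only for this illustration.
fasta_text = ">seqA demo record\nACGTACGT\n>seqB\nGGCC\n"

# SeqIO.to_dict keys each SeqRecord by its id.
record_dict = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Each value is a SeqRecord; these are the attributes a hand-built
# record would also need to provide.
for key, record in record_dict.items():
    print(key, type(record).__name__, record.id,
          repr(record.description), str(record.seq))
```

A dictionary of plain strings has none of these attributes, which is why its items cannot be passed to SeqIO.write directly.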

Reply written 16 months ago by Eric Lim

16 months ago, WeepingMeadow wrote:

Dear Eric Lim and jrj.healey,

Thank you for your answers. However, the problem seems to be a little more complicated to solve; I've been trying just to get the FASTA output of my file for 2 days. I'm new to Python, but it took far longer to solve this single problem than to write the rest of the code. I'm sorry if I could not be more specific and clear. To start, a piece of my code is below:

for seq_record in SeqIO.parse("/home/june/Desktop/snp/ssr/transcriptome.fasta", "fasta"):
    if seq_record.id in str(dicti.keys()).strip(">"):
        dicti[">" + seq_record.id] = dicti[">" + seq_record.id] + str(seq_record.seq)

In the end I get my output in the format of:

Transcript345: Motif: GC Total length: 20 CAAGGTCAGGCCTTCTTTATGCATGATAAGCACTGTGAGGACCCAGGGCAGCTTCAGTGATCATCAGGTGAGTTTAAGGTGGGGGGGGGGGGGCT

However I need it in the FASTA format.

And my dictionary is in the format of:

'>Transcript345': 'Motif: GC Total length: 20 CAAGGTCAGGCCTTCTTTATGCATG... ' #transcript id as the key and the rest as the value

Please ask me questions if I am still not clear enough.

Thanks in advance! WM


You should add a reply to your original question instead of replying as an answer.

And no, I'm not sure I follow what exactly you're trying to do, and I'm quite certain str(dicti.keys()).strip(">") isn't doing what you think it's doing.

Your code hints that you're trying to output a subset of transcriptome.fasta whose record ids match your dictionary keys, but it's probably more helpful if you can post an example of your dictionary, a line or two of your input, and your desired output.
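
A small sketch of what that expression evaluates to, using a hypothetical two-entry dictionary shaped like the one in the question (the values and the second key are made up):

```python
# Hypothetical dictionary shaped like the one described in the question.
dicti = {
    ">Transcript345": "Motif: GC Total length: 20 ",
    ">Transcript12": "Motif: AT Total length: 12 ",
}

# str() turns the whole keys view into one string, and strip(">") only
# removes ">" at the very start or end of that string, so nothing changes.
keys_as_text = str(dicti.keys()).strip(">")
print(keys_as_text)

# `in` on a string is a substring test, so a partial id also "matches".
false_positive = "Transcript34" in keys_as_text

# Looking the key up in the dictionary itself is an exact match.
exact_missing = (">" + "Transcript34") in dicti
exact_present = (">" + "Transcript345") in dicti
print(false_positive, exact_missing, exact_present)
```

Testing membership against the dictionary itself avoids both the string conversion and the accidental partial matches.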

Reply written 16 months ago by Eric Lim

This should not be an answer.

Your post formatting is a little screwed, but if the format is what I think it is, how is it not in fasta format?

If you've got too many > characters, which it looks like you might have, just remove them from the string concatenation you're doing...

Please post more of the code and your input so we can see the issue more clearly.
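
Pending more detail from the OP, here is a hedged sketch of one way to get FASTA text out of a dictionary shaped like the one shown, without Biopython. It assumes (this is a guess from the example entry) that each value is an annotation followed by a space and then the sequence, and that the key may or may not already start with ">". The `to_fasta` helper and its `width` parameter are hypothetical names.

```python
# Hypothetical dictionary in the shape shown in the question
# (sequence shortened for the example).
dicti = {
    ">Transcript345": "Motif: GC Total length: 20 "
                      "CAAGGTCAGGCCTTCTTTATGCATGATAAGCACTGTGAGG",
}


def to_fasta(entries, width=60):
    """Return FASTA text for a {header: 'annotation SEQUENCE'} mapping."""
    lines = []
    for key, value in entries.items():
        # Assumption: the sequence is everything after the last space.
        annotation, _, sequence = value.rpartition(" ")
        # Emit exactly one ">" no matter how the key was stored.
        header = ">" + key.lstrip(">")
        if annotation:
            header += " " + annotation
        lines.append(header)
        # Wrap the sequence into fixed-width lines.
        lines.extend(
            sequence[i:i + width] for i in range(0, len(sequence), width)
        )
    return "\n".join(lines) + "\n"


print(to_fasta(dicti))
```

The resulting text can be written to a file with an ordinary open(..., "w") call. If the values are laid out differently from this assumption, the rpartition step is the part to change.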

Reply written 16 months ago by Joe