Convert all fasta files in one folder
2.0 years ago

Dear colleagues,

Could you please help me finish my code?

I want to take each .fna file in a given folder and write a .json file with the same base name into the same folder.

For example, this file:

GCF_000783815.1_ASM78381v1_genomic.fna

should become:

GCF_000783815.1_ASM78381v1_genomic.json

from Bio import SeqIO
import json
my_dict = {}
import os
_file = os.listdir("data/")
print(_file)
for EL in _file:
    with open(EL, 'r') as new_fasta:
        for x in SeqIO.parse(new_fasta, 'fasta'):
            my_dict = {
                "dataset": x.id,
                "sequence": str(x.seq)
            }
    with open('my_dict????.json', 'w') as f:
        json.dump(my_dict???, f)
fasta json • 1.5k views

Check this out: How to rename multiple files on Linux. It would be something like:

   for i in *.fna; do mv -- "$i" "${i%.fna}.json"; done

@ Buffo It is not as simple as that. Judging from the Python code, the OP's intention seems to be to build a dictionary from each record's id and sequence and then dump that dictionary as JSON. The only part that has a real problem is the last two lines of the Python code. The way items are added to the dictionary is also confusing, because my_dict is overwritten on every iteration.

This is an XY problem. The OP needs to clarify the objective of the Python code and the expected output.


Thanks, but I'm working on Windows. Could you please advise on this part, if you know:

with open('my_dict????.json', 'w') as f:
    json.dump(my_dict???, f)
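
For those last two lines, one possible approach is to derive the output name from the input name and dump whatever was collected for that file. The sketch below is only an illustration: the helper names json_path_for and write_records are made up for this example, and it assumes the records were gathered into a list of {"dataset", "sequence"} dicts rather than into a single my_dict that gets overwritten.

```python
import json
import os


def json_path_for(fasta_path):
    """Return fasta_path with its last extension replaced by .json."""
    # splitext only strips the final extension, so the dots inside
    # names like GCF_000783815.1_ASM78381v1_genomic.fna are kept.
    root, _ext = os.path.splitext(fasta_path)
    return root + ".json"


def write_records(records, fasta_path):
    """Dump a list of record dicts next to the FASTA file (hypothetical helper)."""
    out_path = json_path_for(fasta_path)
    with open(out_path, "w") as handle:
        json.dump(records, handle)
    return out_path
```

Note that os.listdir("data/") returns bare file names, so inside the loop the path would need to be built with something like os.path.join("data", EL) before opening the file or passing it to these helpers.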

2.0 years ago

Something like this should create a JSON file for each .fna file, with the same base name. Each JSON file will contain one JSON object per line:

import json
from pathlib import Path

from Bio import SeqIO

data_dir = Path("data")

# Write one .json file next to each .fna file, keeping the base name.
for fasta in data_dir.glob("*.fna"):
    with (data_dir / f"{fasta.stem}.json").open(mode="w") as f:
        for record in SeqIO.parse(fasta, "fasta"):
            data = {
                "dataset": record.id,
                "sequence": str(record.seq)
            }
            # One JSON object per line; the newline keeps the records separate.
            f.write(json.dumps(data) + "\n")
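
To read such a file back, a minimal sketch could look like the following. It assumes every JSON object sits on its own line, terminated by a newline, and the function name read_json_lines is just a placeholder for this example.

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    with Path(path).open() as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

A plain json.load on the whole file would not work here, because a sequence of objects on separate lines is not a single JSON document.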

thank you)


I would suggest the built-in SeqIO.to_dict function (https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec%3Aseqio_todict):

from Bio import SeqIO

Seq_dict = SeqIO.to_dict(SeqIO.parse("input.fasta", "fasta"))
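
One caveat: the values in that dictionary are SeqRecord objects, which json.dump cannot serialize directly. A rough sketch of a conversion step is below; records_to_jsonable is a name invented for this example, and it only assumes that each value has a .seq attribute, as SeqRecord does.

```python
import json


def records_to_jsonable(seq_dict):
    """Map each record id to its sequence as a plain string."""
    # str(record.seq) turns the Seq object into text that json can handle.
    return {key: str(record.seq) for key, record in seq_dict.items()}
```

With the dictionary from above, json.dumps(records_to_jsonable(Seq_dict)) should then give a single JSON object keyed by record id.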

@ biology.may20 you have already posted exactly the same issue here: Fasta to json via python dictionary, and @ Carambakaracho posted a solution to it.


Thanks for the advice. The previous post was about making JSON from FASTA via a dict; in this one the problem was creating a .json file for each .fna file in the same folder.
