Question

Converting FASTA files to GenBank or EMBL format

9.3 years ago

samuel.medi

I have 7000 genes and their proteins, as well as the genome of a bacterium I am working on. I want to convert these files into either GenBank or EMBL format. My problem is that I don't have any scripting skills. I tried an online tool (http://genome.nci.nih.gov/tools/reformat.html), but its output appears to lack some information.

Does anyone know a tool or a way I can convert these sequences?

Tags: format, fasta, genbank

written 9.3 years ago by samuel.medi • updated 2.1 years ago by Ram

I can develop this script for you. Give me the original file format and the final format you want.

written 9.3 years ago by YOT

These are the files...

And this is how I want them to look (NB: just the format).

written 9.2 years ago by mwanerhi erfgtr • updated 4.6 years ago by Ram

The final format should be similar to this: http://www.pseudomonas.com/downloads/pseudomonas/genbank/NC_002516.gbk

The original files look like this (these are just examples of the formats I have; the one I want to get is above):

Protein FASTA: http://www.pseudomonas.com/downloads/pseudomonas/fasta/Pseudomonas_aeruginosa_2192_uid54357.faa

Chromosome FASTA: http://www.pseudomonas.com/downloads/pseudomonas/fasta/NC_002516.fna

Thank you for the quick response.

written 9.3 years ago by samuel.medi • updated 2.1 years ago by Ram

OK, let me see if I understand what you need.

You have a file, but I didn't understand whether it contains protein, DNA, or RNA.

You have your file in this format (protein):

### Amino acid sequences for Pseudomonas aeruginosa 2192 proteins.
### Last updated on 2011-04-11.

>PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192]
MASPAFMRFLPRCGAAAAFGTLLGLAGCQSWLDDRYAD ....

>PA2G_00002|hy .....

And you want to convert it to this format (DNA):

>PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192]
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTC
CAACCAGCGATGACGTAATAGATAGATACAAGGAAGTCATTTTTCTTTTA
AAGGATAGAAACGGTTAATGCTCTTGGGACGGCGCTTTTCT

So, which do you mean?

Do you want to convert protein to DNA?

Or

do you want to remove all lines starting with a hash sign (#),
remove the blank lines between sequence lines,
and wrap the sequences at 70 columns?

written 9.3 years ago by YOT • updated 4.6 years ago by Ram
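
If the second reading is the one that matters, the cleanup YOT lists is small enough to sketch with the Python standard library alone. This is a minimal sketch, not YOT's script: it assumes FASTA headers start with `>`, drops every `#` line and every blank line, and rewraps the sequence to a `width` parameter (70 by default) that I introduce for illustration.

```python
def clean_fasta(text: str, width: int = 70) -> str:
    """Drop '#' comment lines and blank lines, then rewrap sequences to `width` columns.

    Assumes FASTA headers start with '>'; every other non-comment line is sequence.
    """
    out: list = []
    seq: list = []

    def flush() -> None:
        # Join the buffered sequence lines and cut them into fixed-width chunks.
        joined = "".join(seq)
        out.extend(joined[i:i + width] for i in range(0, len(joined), width))
        seq.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            out.append(line)
        else:
            seq.append("".join(line.split()))  # remove any internal whitespace
    flush()
    return "\n".join(out) + "\n"
```

For example, `clean_fasta(open("proteins.faa").read())` (hypothetical file name) returns the cleaned text, ready to write to a new file.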

Please don't add a new answer; update your first answer instead.

written 9.3 years ago by Pierre Lindenbaum

I think you didn't get the question: Samuel needs to map the DNA/protein data against a whole genome (using, e.g., BLAST) and generate a GenBank file from the output.

written 9.3 years ago by Pierre Lindenbaum
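
Pierre's point is that a GenBank file carries more than sequence: it also holds features, such as one CDS per gene with its coordinates on the genome. The mapping step (BLAST or another aligner) is not shown here. Once gene coordinates are known, writing them into a GenBank record can be sketched with Biopython. This is a minimal sketch, assuming Biopython 1.78 or later. The `hits` tuple layout and the helper names `annotate_genome` and `write_genbank` are my own, and the coordinates in any real run would come from your alignment output.

```python
# Sketch: attach CDS features (e.g. derived from BLAST hits) to a genome record
# and write it as GenBank. Assumes Biopython >= 1.78; the hit tuples are hypothetical.
from Bio import SeqIO
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def annotate_genome(genome: SeqRecord, hits) -> SeqRecord:
    """Add one CDS feature per (locus_tag, product, start, end, strand) tuple.

    start/end are 0-based and end-exclusive (Python slice convention);
    strand is +1 or -1.
    """
    for locus_tag, product, start, end, strand in hits:
        genome.features.append(
            SeqFeature(
                FeatureLocation(start, end, strand=strand),
                type="CDS",
                qualifiers={"locus_tag": [locus_tag], "product": [product]},
            )
        )
    # The GenBank writer in Biopython >= 1.78 requires a molecule type.
    genome.annotations["molecule_type"] = "DNA"
    return genome


def write_genbank(genome: SeqRecord, path: str) -> None:
    """Write a single annotated record to `path` in GenBank format."""
    SeqIO.write(genome, path, "genbank")
```

A typical flow would read the chromosome with `SeqIO.read(path, "fasta")`, call `annotate_genome` with the mapped coordinates, then `write_genbank`. A 0-based `start` of 0 and `end` of 15 appears in the GenBank output as `1..15`.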

"please, don't add a new answer but update your 1st answer."

OK, Pierre. I think Samuel needs something like this: http://genome.nci.nih.gov/cgi-bin/gau/reformat, i.e., only a format conversion. If the goal is a comparison, do you think a pairwise alignment tool such as GeneWise could solve it? http://www.ebi.ac.uk/Tools/psa/genewise/

written 9.3 years ago by YOT • updated 2.1 years ago by Ram

Ram · Accepted Answer · 2015-01-12

No need to develop tools to do this. Many are publicly available for such a common task.

The standard for many years has been EMBOSS's seqret. An online version is available, but I would consider installing the suite if this is something you need to do often. The command-line version is as simple as seqret <in> <out>. Wrap it in a loop.

Biopython's SeqIO module can also do this, albeit with a bit more (basic) programming. I'm sure there are equivalents in BioPerl, BioRuby, and via Bioconductor for R.
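
One way to "wrap it in a loop" from Python: build one seqret command per FASTA file in a folder, then run them in turn. This sketch assumes EMBOSS is installed and seqret is on your PATH. As I understand seqret's help, `-sequence` and `-outseq` name the input and output and `-osformat2` sets the output format, but check `seqret -help -verbose` on your install. The suffix list and the output extensions are placeholders of my own. Plain seqret only reformats what the input contains, so a FASTA file becomes a GenBank or EMBL record with sequence but no features.

```python
# Sketch of "wrap it in a loop": one EMBOSS seqret call per FASTA file.
# Assumes seqret is on PATH; verify the qualifiers with `seqret -help -verbose`.
from __future__ import annotations

import subprocess
from pathlib import Path

FASTA_SUFFIXES = (".fna", ".faa", ".fasta", ".fa")
EXTENSIONS = {"genbank": ".gbk", "embl": ".embl"}  # hypothetical naming scheme


def seqret_commands(folder: str, out_format: str = "genbank") -> list[list[str]]:
    """Build one seqret command (as an argument list) per FASTA file in `folder`."""
    ext = EXTENSIONS.get(out_format, "." + out_format)
    commands = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix in FASTA_SUFFIXES:
            commands.append([
                "seqret",
                "-sequence", str(path),
                "-outseq", str(path.with_suffix(ext)),
                "-osformat2", out_format,
            ])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; check=True raises if seqret exits with an error."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage would be `run_all(seqret_commands("my_fasta_folder", "embl"))`, where the folder name is hypothetical. Building the commands separately from running them makes it easy to print and inspect them first. Note that a `.fna` and a `.faa` file with the same stem map to the same output name.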
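
For the Biopython route, `Bio.SeqIO.convert` handles the whole conversion in one call. This is a minimal sketch, assuming Biopython 1.78 or later, where the GenBank and EMBL writers require a `molecule_type`. The wrapper name `fasta_to` is my own. As with seqret, the output contains only the sequence and header information from the FASTA file, with no features.

```python
# Sketch: FASTA -> GenBank/EMBL with Biopython's SeqIO.convert.
# Assumes Biopython >= 1.78 (pip install biopython).
from Bio import SeqIO


def fasta_to(in_path: str, out_path: str, out_format: str = "genbank",
             molecule_type: str = "DNA") -> int:
    """Convert every FASTA record in `in_path`; return the number of records written.

    Use molecule_type="protein" for protein FASTA such as a .faa file.
    """
    return SeqIO.convert(in_path, "fasta", out_path, out_format,
                         molecule_type=molecule_type)
```

For example, `fasta_to("genome.fna", "genome.embl", "embl")` or `fasta_to("proteins.faa", "proteins.gbk", molecule_type="protein")` (hypothetical file names). Call it inside a loop for many files.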

This kind of task is day-one bioinformatics, and the skills required are easy to learn and very straightforward. It's as simple as navigating to a folder and running a program, possibly within the simplest of loops, depending on how your data is organized. You are already converting 7000 sequences, so it would make sense to learn about the plethora of resources developed over the past couple of decades. You'll also save a ton of time in the future!