Question: Converting FASTA files to GenBank or EMBL format
1
4.7 years ago by
samuel.medi10
United States
samuel.medi10 wrote:

I have 7000 genes and their proteins, as well as the genome of a bacterium I am working on. I want to convert these files into either GenBank or EMBL format. My problem is that I don't have any scripting skills. I tried an online tool (http://genome.nci.nih.gov/tools/reformat.html), but its output appears to lack some information.

Does anyone know a tool or a way I can convert these sequences?

genbank format fasta • 5.4k views
ADD COMMENTlink modified 4.7 years ago by Brice Sarver2.9k • written 4.7 years ago by samuel.medi10
1

I can develop this script for you. Give me the original file format and the desired final format.

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by YOT20
1

These are the files...

and this is how I want them to look (NB: just the format).

ADD REPLYlink modified 13 days ago by RamRS24k • written 4.6 years ago by mwanerhi erfgtr30

The final format should be similar to this: http://www.pseudomonas.com/downloads/pseudomonas/genbank/NC_002516.gbk

The original files look like the ones below. These are just examples of the formats I have; the format I want is the one above.

Protein FASTA: http://www.pseudomonas.com/downloads/pseudomonas/fasta/Pseudomonas_aeruginosa_2192_uid54357.faa

Chromosome FASTA: http://www.pseudomonas.com/downloads/pseudomonas/fasta/NC_002516.fna

Thank you for the quick response.

ADD REPLYlink written 4.7 years ago by samuel.medi10

OK, so let me see if I understood your need.

You have a file, but I did not understand whether it contains protein, DNA, or RNA.

You have your file in this format (protein):

### Amino acid sequences for Pseudomonas aeruginosa 2192 proteins. ### Last updated on 2011-04-11.

PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192] MASPAFMRFLPRCGAAAAFGTLLGLAGCQSWLDDRYAD ....

PA2G_00002|hy .....

And you want to convert it to this format (DNA):

>PA2G_00002|hypothetical protein[Pseudomonas aeruginosa 2192]
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTC
CAACCAGCGATGACGTAATAGATAGATACAAGGAAGTCATTTTTCTTTTA
AAGGATAGAAACGGTTAATGCTCTTGGGACGGCGCTTTTCT

That means:

Do you want to convert protein to DNA? Is that it?

Or

  1. You want to cut all lines that start with a hash sign (#)
  2. Remove the blank lines between lines
  3. And wrap the sequences at 70 columns
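
If it is only the three clean-up steps listed above, a short Python sketch could look like this. It is an illustration under assumptions, not a tested pipeline: the function name `tidy_fasta` is made up, and it assumes every record header starts with `>` (the pasted protein sample does not show that character).

```python
def tidy_fasta(lines, width=70):
    """Drop '#' comment lines and blank lines, then re-wrap sequences.

    Assumes FASTA headers start with '>'. Sequence lines that appear
    before the first header are ignored.
    """
    records = []  # each entry is [header, list_of_sequence_pieces]
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # steps 1 and 2: skip comments and blank lines
        if line.startswith(">"):
            records.append([line, []])
        elif records:
            records[-1][1].append(line)

    out = []
    for header, pieces in records:
        sequence = "".join(pieces)
        out.append(header)
        # step 3: wrap the joined sequence at a fixed column width
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return out
```

Typical use would be to read the file with `open(path)`, pass the file object to `tidy_fasta`, and write the returned lines joined with newlines. This only tidies the FASTA file; it does not produce GenBank or EMBL output.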
ADD REPLYlink modified 13 days ago by RamRS24k • written 4.7 years ago by YOT20
1

Please don't add a new answer; update your first answer instead.

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum122k

I think you didn't get the question: Samuel needs to map the DNA/protein data against a whole genome (e.g. using BLAST) and generate a GenBank file from the output.

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum122k

"please, don't add a new answer but update your 1st answer." OK, Pierre. I think Samuel needs something like this: http://genome.nci.nih.gov/cgi-bin/gau/reformat. Conversion only. If the need is to compare sequences, do you think a pairwise alignment could solve it? http://www.ebi.ac.uk/Tools/psa/genewise/

ADD REPLYlink written 4.7 years ago by YOT20
2
4.7 years ago by
Brice Sarver2.9k
United States
Brice Sarver2.9k wrote:

No need to develop tools to do this. Many are publicly available for such a common task.

The standard for many years has been EMBOSS's seqret. An online version is here, but I would consider installing the suite if this is something you need to do often. The command-line version is as simple as seqret <in> <out>. Wrap it in a loop.
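
As a rough sketch of the "wrap it in a loop" step, the Python below builds one seqret call per FASTA file in a folder. It assumes EMBOSS is installed and `seqret` is on the PATH; the helper names, the `*.fasta` pattern, and the `.gbk` suffix are illustrative assumptions. The `format::file` notation is EMBOSS's Uniform Sequence Address syntax for stating the format explicitly.

```python
import shutil
import subprocess
from pathlib import Path


def seqret_command(fasta_path, out_format="genbank", suffix=".gbk"):
    """Build the seqret argument list for one FASTA file (hypothetical helper)."""
    fasta_path = Path(fasta_path)
    out_path = fasta_path.with_suffix(suffix)
    return [
        "seqret",
        "-sequence", f"fasta::{fasta_path}",
        "-outseq", f"{out_format}::{out_path}",
    ]


def convert_directory(directory, pattern="*.fasta"):
    """Run seqret once per matching file; stops at the first failure."""
    if shutil.which("seqret") is None:
        raise RuntimeError("seqret not found on PATH; install EMBOSS first")
    for fasta_path in sorted(Path(directory).glob(pattern)):
        subprocess.run(seqret_command(fasta_path), check=True)
```

For EMBL output, `seqret_command(path, out_format="embl", suffix=".embl")` would build the matching call. A plain shell `for` loop does the same job; Python is used here only to keep the examples in one language.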

Biopython's SeqIO module can also do this, albeit with a bit more (basic) programming. I'm sure there are equivalents in BioPerl, BioRuby, and via Bioconductor for R.
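
For reference, a minimal Biopython sketch of that conversion could look like the following. It assumes Biopython is installed; the function name `fasta_to_genbank` and the default `molecule_type` are illustrative choices. Recent Biopython releases need a `molecule_type` annotation on each record before they will write GenBank. Note that a file converted this way carries only the sequence and its header line; no gene (CDS) features are added.

```python
from io import StringIO


def fasta_to_genbank(fasta_handle, genbank_handle, molecule_type="DNA"):
    """Convert FASTA records to GenBank; returns the number of records written.

    Handles may be open files or file paths. Use molecule_type="protein"
    for amino acid sequences.
    """
    from Bio import SeqIO  # third-party: pip install biopython

    def annotated_records():
        for record in SeqIO.parse(fasta_handle, "fasta"):
            # The GenBank writer needs to know what kind of molecule this is.
            record.annotations["molecule_type"] = molecule_type
            yield record

    return SeqIO.write(annotated_records(), genbank_handle, "genbank")


if __name__ == "__main__":
    demo_out = StringIO()
    fasta_to_genbank(StringIO(">seq1 demo record\nATGCATGCATGC\n"), demo_out)
    print(demo_out.getvalue())
```

With real data the call would be something like `fasta_to_genbank("NC_002516.fna", "NC_002516.gbk")`. Long record identifiers may trigger a Biopython warning about the LOCUS line, so shortening them first can be worthwhile.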

This kind of task is day-one bioinformatics, and the skills required are easy to learn and very straightforward. It's as simple as navigating to a folder and running a program, possibly within the simplest of loops depending on how your data is organized. You are already converting 7000 sequences; it would make sense to learn about the plethora of resources developed over the past couple of decades that are available to you. You'll also save a ton of time in the future!

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Brice Sarver2.9k

Sounds like the way to go: get dirty with Biopython/Python. I tried EMBOSS's seqret too, but its output was rejected when I tried to identify genomic islands using http://www.pathogenomics.sfu.ca/islandviewer/genome_submit.php.

ADD REPLYlink written 4.7 years ago by samuel.medi10