ptt and rnt files created from an assembly rather than a complete genome
7 months ago
langziv ▴ 50

Hi

To use Rockhopper, I created .ptt and .rnt files from the GenBank file of the bacterial strain I work with. Because this strain has a draft assembly (many scaffolds) rather than a complete genome, its .ptt and .rnt files are generated differently from those made from a complete genome and have a different format.

When I run Rockhopper with these files, it prints error messages such as:

Error - expecting 9 columns of gene information but found less than 9:  278..1100       +       822     -       AFK73_25870     AFK73_25870     -       hypothetical protein
Error - expecting 9 columns of gene information but found less than 9:  1494..1980      -       486     -       AFK73_25875     AFK73_25875     -       lytic transglycosylase
Error - expecting 9 columns of gene information but found less than 9:  2413..2806      +       393     -       AFK73_25880     AFK73_25880     -       conjugal transfer protein TraM
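
Counting the tab-separated fields in one of these rows shows what the message refers to: each row has 8 fields, while the column header names 9 (Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product), so one column (COG) is absent. A minimal Python sketch that lists every such row, assuming the .ptt file is tab-delimited like NCBI's and that gene rows start with a `start..end` location; `short_rows` is a hypothetical helper name, not part of Rockhopper:

```python
from __future__ import annotations

import re

# Assumption: gene rows start with a "start..end" location; title/header lines do not.
GENE_ROW = re.compile(r"^\d+\.\.\d+$")
EXPECTED_COLUMNS = 9  # Location Strand Length PID Gene Synonym Code COG Product


def short_rows(text: str) -> list[tuple[int, int]]:
    """Return (line number, column count) for gene rows with fewer than 9 tab-separated fields."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        if GENE_ROW.match(fields[0]) and len(fields) < EXPECTED_COLUMNS:
            found.append((lineno, len(fields)))
    return found
```

Passing the contents of a .ptt file to `short_rows` returns one `(line, 8)` pair for each row that would trigger the error above.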

I tried creating the files both with the script that's in this post and by following the EDGE-pro documentation. Has anyone faced this problem and managed to solve it?

7 months ago
shelkmike ★ 1.2k

You need to add an extra column containing "-" before the last (Product) column, i.e. an empty COG column, so that every gene row has 9 columns. For example, here are the first 10 lines of a .ptt file that I used successfully with Rockhopper:

NS - 1..5637360
5427 proteins
Location    Strand  Length  PID Gene    Synonym Code    COG Product
2816432..2818867    +   812 -   -   KR76_13805  -   -   Membrane alanine aminopeptidase N
1408840..1410318    -   493 -   -   KR76_06980  -   -   hypothetical protein
704477..705019  +   181 -   -   KR76_03405  -   -   putative multidomain membrane protein
1..1623 +   541 -   -   KR76_00005  -   -   Chromosomal replication initiator protein DnaA
2108..3166  +   353 -   -   KR76_00010  -   -   DNA polymerase III beta subunit
3246..4127  +   294 -   -   KR76_00015  -   -   6-phosphogluconate dehydrogenase,decarboxylating
4128..4541  -   138 -   -   KR76_00020  -   -   hypothetical protein
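
A minimal Python sketch of that fix, assuming a tab-delimited .ptt whose gene rows have exactly 8 columns with COG missing. Rows that already have 9 columns, as well as the title and header lines, are left unchanged. `pad_ptt_row`, `fix_ptt` and `fix_ptt_file` are hypothetical names, not part of Rockhopper or EDGE-pro:

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: gene rows start with a "start..end" location; title/header lines do not.
GENE_ROW = re.compile(r"^\d+\.\.\d+$")
EXPECTED_COLUMNS = 9  # Location Strand Length PID Gene Synonym Code COG Product


def pad_ptt_row(row: str) -> str:
    """Insert a '-' COG placeholder before Product in an 8-column gene row."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) == EXPECTED_COLUMNS - 1:
        fields.insert(len(fields) - 1, "-")
    return "\t".join(fields)


def fix_ptt(text: str) -> str:
    """Return the PTT text with every 8-column gene row padded to 9 columns."""
    fixed = []
    for line in text.splitlines():
        if GENE_ROW.match(line.split("\t")[0]):
            line = pad_ptt_row(line)
        fixed.append(line)
    return "\n".join(fixed) + "\n"


def fix_ptt_file(src: Path, dst: Path) -> None:
    """Read a .ptt file, pad its gene rows, and write the result to dst."""
    dst.write_text(fix_ptt(src.read_text()))
```

For example, `fix_ptt_file(Path("original.ptt"), Path("fixed.ptt"))` (hypothetical file names) writes a padded copy. It only adds the missing column; it does not change how many records the file contains.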

Thanks, I'll try. I'm skeptical, though, because of the files' structure. For instance:

Klebsiella pneumoniae strain B199 plasmid unnamed 4 scaffold_39, whole genome shotgun sequence - 0..10952
10 proteins
Location    Strand  Length  PID Gene    Synonym Code    COG Product
0..194  -   194 -   AFK73_28005 AFK73_28005 -   transposase
472..1132   +   660 -   AFK73_28010 AFK73_28010 -   chloramphenicol acetyltransferase
1332..1710  -   378 -   AFK73_28015 AFK73_28015 -   acetyltransferase
2020..3025  -   1005    -   AFK73_28020 AFK73_28020 -   transposase
3103..6070  -   2967    -   AFK73_28025 AFK73_28025 -   transposase
6072..6633  -   561 -   AFK73_28030 AFK73_28030 -   hypothetical protein
6758..7037  -   279 -   AFK73_28035 AFK73_28035 -   transposase
7173..7353  +   180 -   AFK73_28040 AFK73_28040 -   transcriptional regulator
7311..8292  -   981 -   AFK73_28045 AFK73_28045 -   integrase
8689..10039 -   1350    -   AFK73_28055 AFK73_28055 -   DNA polymerase
Klebsiella pneumoniae strain B199 plasmid unnamed 5 scaffold_45, whole genome shotgun sequence - 0..5834
9 proteins
Location    Strand  Length  PID Gene    Synonym Code    COG Product
717..1089   -   372 -   AFK73_28270 AFK73_28270 -   hypothetical protein
1483..1828  +   345 -   AFK73_28275 AFK73_28275 -   mobilization protein
2009..2255  -   246 -   AFK73_28280 AFK73_28280 -   hypothetical protein
2505..3015  +   510 -   AFK73_28285 AFK73_28285 -   mobilization protein
3021..3234  +   213 -   AFK73_28290 AFK73_28290 -   mobilization protein
4108..4375  +   267 -   AFK73_28295 AFK73_28295 -   addiction module toxin RelE
4444..4795  -   351 -   AFK73_28300 AFK73_28300 -   hypothetical protein
4860..5406  -   546 -   AFK73_28305 AFK73_28305 -   hypothetical protein
5431..5758  -   327 -   AFK73_28310 AFK73_28310 -   hypothetical protein

As you can see, the file contains multiple entries, one per scaffold, instead of one long table, because the reference is an assembly rather than a complete genome.
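
One way to deal with this is to split the concatenated file into one .ptt per sequence. This rests on an assumption worth checking against the Rockhopper documentation: that Rockhopper reads one .ptt (with matching .fna and .rnt files sharing its base name) per replicon in the genome directory. The sketch below also assumes each record's title fits on a single line and that the file is tab-delimited. The output names `replicon_1.ptt`, `replicon_2.ptt`, ... and the helper names are hypothetical. The .fna and .rnt files would need to be split the same way, and the missing COG column still has to be added as described in the answer above.

```python
from __future__ import annotations

import re
from pathlib import Path

# Each record is: title line, "<N> proteins" line, column header, gene rows.
PROTEINS_LINE = re.compile(r"^\d+ proteins$")


def split_ptt_records(text: str) -> list[list[str]]:
    """Split a concatenated multi-record PTT into one list of lines per record."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    starts = [i for i, ln in enumerate(lines) if PROTEINS_LINE.match(ln)]
    records = []
    for n, k in enumerate(starts):
        # A record's rows end just before the next record's title line (or at EOF).
        end = starts[n + 1] - 1 if n + 1 < len(starts) else len(lines)
        records.append(lines[max(k - 1, 0):end])
    return records


def write_split_ptt(text: str, out_dir: Path, prefix: str = "replicon") -> list[Path]:
    """Write each record to <out_dir>/<prefix>_<n>.ptt (hypothetical naming scheme)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, record in enumerate(split_ptt_records(text), start=1):
        path = out_dir / f"{prefix}_{n}.ptt"
        path.write_text("\n".join(record) + "\n")
        paths.append(path)
    return paths
```

Each output file then has the single-table layout of the example .ptt in the answer above.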
