Help creating .sqn file using tbl2asn to submit multiple sequences to Genbank
8.2 years ago
jolespin ▴ 130

I think this post is very relevant to many bioinformaticians who are submitting to Genbank using tbl2asn

I've been following the guidelines here

I've successfully installed tbl2asn on my Mac and have been running it from the terminal.

The directions say to create 3 files: template.sbt, table.tbl, and fasta.fsa

My fasta format headers look like:

>TCONS_00001810 [organism=Mus musculus] [strain=C57BL/6J] [chromosome=1] olfactory receptor 1415 (Olfr1415) mRNA, complete cds


The corresponding data in the table file looks like:

>Feature TCONS_00001810
1    3422    mRNA
1    186    5'UTR
187    1122    CDS
1123    3422    3'UTR
1    176    exon
177    2079    exon
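
Since each `>Feature` line in the table has to refer to a sequence in the FASTA file, a quick consistency check can rule out mismatched IDs. The following is only a minimal sketch: it assumes the sequence ID is the first whitespace-delimited token after `>` in the FASTA header and the second token on a `>Feature` line, and the function names are hypothetical helpers, not part of tbl2asn.

```python
def fasta_ids(fasta_text):
    """Return the sequence IDs (first token after '>') found in FASTA text."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def table_ids(table_text):
    """Return the IDs named on '>Feature <id>' lines of a feature table."""
    ids = []
    for line in table_text.splitlines():
        if line.startswith(">Feature"):
            tokens = line.split()
            if len(tokens) > 1:
                ids.append(tokens[1])
    return ids


def unmatched_table_ids(fasta_text, table_text):
    """Return table IDs that have no matching FASTA record."""
    known = set(fasta_ids(fasta_text))
    return [seq_id for seq_id in table_ids(table_text) if seq_id not in known]


# Illustrative inputs modelled on the records above (sequence shortened).
fasta_example = (
    ">TCONS_00001810 [organism=Mus musculus] [strain=C57BL/6J] [chromosome=1]\n"
    "ACGTACGT\n"
)
table_example = ">Feature TCONS_00001810\n1\t3422\tmRNA\n187\t1122\tCDS\n"

print(unmatched_table_ids(fasta_example, table_example))  # []
```

An empty list means every feature block points at a sequence that exists; anything else is an ID worth checking by hand.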


The template file isn't a text file so I can't provide an example . . .

In terminal, I've opened up tbl2asn and I know it's working because when I do the command:

tbl2asn -


it gives me all of the different commands that I can use.

When I run the command below, it finishes without errors and creates a file called errorsummary.val in the directory that holds template.sbt, table.tbl, and fasta.fsa. However, this file is empty (zero bytes), and no .sqn file appears. I expected a .sqn file that combines the three input files I described earlier.

tbl2asn -t template.sbt -p . -j "[organism=Mus musculus] [strain=C57BL:6J]" -V vb -a s


The documentation explains -t, -p, -j, -V, and -a

-p specifies the path for the table and sequence files [required]
-t specifies the template file (including the path) [required]
-j allows the addition of source qualifiers that will be the same for each submission
Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]"
-V is a verification command when used in conjunction with v (strongly suggested), which will tell the computer to run a validation step to ensure that there are no errors in your submission.

This validation step will generate a report (with suffix .val) for each .fsa file and place it in the same directory that houses the data files and tables used in the submission.

If you add a b command (optional) following the v command, the computer will generate a GenBank flat file (.gbf) of your submission and deposit it in the same directory that houses the data files and tables used in the submission. Note that .gbf files are not suitable for submission. They are only to view the file in GenBank flatfile format.

The -a command used in conjunction with the s command instructs tbl2asn to read multiple FASTA components in one file as a set of unrelated sequences. This creates a single file of multiple submissions.
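
To keep the flags described above straight, the command line can be assembled programmatically. This is a rough sketch rather than an official wrapper: `build_tbl2asn_command` is a hypothetical helper, it only uses the `-t`, `-p`, `-j`, `-V`, and `-a` options quoted from the documentation, and the qualifier string is the documentation's own example.

```python
import shlex


def build_tbl2asn_command(template, path=".", source_qualifiers=None,
                          verify="vb", fasta_set="s", executable="tbl2asn"):
    """Assemble a tbl2asn argument list from the options discussed above."""
    cmd = [executable, "-t", template, "-p", path]
    if source_qualifiers:
        # Passed as a single argument, so no manual shell quoting is needed.
        cmd += ["-j", source_qualifiers]
    if verify:
        cmd += ["-V", verify]
    if fasta_set:
        cmd += ["-a", fasta_set]
    return cmd


cmd = build_tbl2asn_command(
    "template.sbt",
    source_qualifiers="[organism=Saccharomyces cerevisiae] [strain=S288C]",
)
# Print a copy-pasteable version of the command for the terminal.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; afterwards, listing the `-p` directory for `*.sqn` and `*.val` files shows what tbl2asn actually wrote.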

Why does the program run, not give any errors, create the errorsummary.val and not create the .sqn file?

How can I get this to work? I feel like I'm very close.

I've already established a working directory which is where all of those files are located.

I've tried putting the full directory path after -p, and I've tried removing the -j modifiers and the optional flags. I still can't get it to work.


What are the contents of the errorsummary.val file? Is it empty, or does it give any information about what's going on?


It's completely empty. Do you have any idea how to submit this to GenBank? What other reasons could there be for it not working?


Hello,

I am trying to prepare files for a TSA submission. I prepared the .fsa and .sbt files. I was wondering if you know how to prepare the .tbl file that contains the annotation?

Federico


Also, I too am getting an empty .val file.


Did you ever resolve this issue? I'm having the same problem with a whole genome dataset, and I can't figure out what's going wrong.

8.2 years ago

Hi, I am also having a problem creating a .sqn file for GenBank submission.

I am attempting to submit a full genome to GenBank. I am using tbl2asn to generate a .sqn file for submission from the Velvet contigs.fa, but I am running into two difficulties.

The command I am using looks like this:

./mac.tbl2asn -i 1383.fsa -t 1383.sbt -j "[organism=Cronobacter sakazakii 1] [strain=1] [host=unknown] [country=UK] [collection_date=1950] [isolation-source=milk powder] [note=multilocus sequence type 4] [gcode=11]" -M n -Z discrep -a r10k

1. Running tbl2asn on the raw Velvet output produces no output whatsoever.
2. Believing this might be related to the FASTA headers, I wrote a script to replace them with ">contig0001" etc. (and to filter out scaffolds <200 nt). The .sqn file is now generated but looks like this:

inst {
repr delta ,
mol dna ,
length 525416 ,
ext
delta {
literal {
length 355416,
seq-data
ncbi2na 'CA5E5949BD0DE2538D01DF43F02FE39D020F01F6C383C6E
F477E21F8C39FFDB7E61B808CD2E558B951123EDE303EF0224B986697925E7B662BA6CCF19FD77
F48B42773F89FF77D215867982E3DBC996ED8E8A64F32A25E2223A426B0CE0000E1859D97FE16F
197BFF566FF8E978A4BDE429CF49152D259FFD67DF7BC5AFC5AC64524666F5C5EA69A69A5E4CDF
79FCD1514CC2099D337338232D199E2349395A79AFC692D1277A6019771659F5A3AE68430C8C5B
6536301D01EB63F4E04C16E3613C32E5366B6200325A87E3D94522C75E230BCE5972CD93BF57F8


and so on...

Any ideas?
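
For anyone wanting to reproduce the header clean-up from step 2, a minimal sketch could look like the following. It is not the script used in this post: the function names, the 60-column line wrapping, and the example Velvet-style headers are assumptions made for illustration.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_contigs(text, min_length=200, width=60):
    """Drop sequences shorter than min_length and rename the rest contig0001, ..."""
    lines = []
    kept = 0
    for _, seq in read_fasta(text):
        if len(seq) < min_length:
            continue  # too short to keep
        kept += 1
        lines.append(f">contig{kept:04d}")
        # Wrap the sequence at a fixed width for readability.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Illustrative input: one 250 nt contig and one 50 nt contig.
raw = (
    ">NODE_1_length_250_cov_12.5\n" + "A" * 250 + "\n"
    ">NODE_2_length_50_cov_3.1\n" + "C" * 50 + "\n"
)
cleaned = clean_contigs(raw)
print(cleaned.splitlines()[0])  # >contig0001
```

Numbering is applied after filtering, so the surviving contigs are numbered consecutively with no gaps.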

5.6 years ago

This was probably down to the way the input files were named.

tbl2asn wants the file extension to be .fsa and won't find .fasta files.
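
If that is the cause, renaming the inputs is enough. Here is a small sketch that renames every `.fasta` file in a directory to `.fsa`; `rename_fasta_to_fsa` is a hypothetical helper, it defaults to a dry run, and the demo works in a temporary directory so it does not touch real data.

```python
import tempfile
from pathlib import Path


def rename_fasta_to_fsa(directory=".", old_suffix=".fasta", dry_run=True):
    """Rename *.fasta files to *.fsa; return the (old, new) name pairs."""
    renamed = []
    for src in sorted(Path(directory).glob(f"*{old_suffix}")):
        dst = src.with_suffix(".fsa")
        if dst.exists():
            continue  # never overwrite an existing .fsa file
        if not dry_run:
            src.rename(dst)
        renamed.append((src.name, dst.name))
    return renamed


# Demo in a throwaway directory with a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "contigs.fasta").write_text(">contig0001\nACGT\n")
    result = rename_fasta_to_fsa(tmp, dry_run=False)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(result)     # [('contigs.fasta', 'contigs.fsa')]
print(remaining)  # ['contigs.fsa']
```

Running it first with `dry_run=True` shows what would be renamed without changing anything on disk.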