How are transcriptome annotations published?
5 months ago
Dunois ★ 1.5k

I see plenty of papers describing annotations of de novo assembled transcriptomes, but I can't pin down the accepted method for publishing an annotated transcriptome.

For instance, this paper uploaded their data to figshare. But this paper only mentions a BioProject ID.

So my questions:

• How does one publish an annotation of a transcriptome?
• Are there any criteria for which files need to be uploaded? I presume a FASTA file with the sequences and some sort of table relating the headers to human-readable annotations would be the bare minimum. Or is a FASTA file with human-readable headers sufficient? Is some sort of "raw" assembly FASTA file also necessary?
• What else can one publish as a part of a transcriptome annotation?
• What platforms are considered suitable for publishing the annotation data? GitHub? figshare?
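One common minimal layout keeps the FASTA headers as bare IDs and puts the human-readable annotations in a separate tab-separated table keyed by those IDs. The sketch below assumes that layout (the `transcript_id` column name and the example IDs are hypothetical) and shows a quick consistency check that the table and the FASTA actually refer to the same transcripts before anything is uploaded.

```python
import csv
import io


def read_fasta_ids(handle):
    """Return sequence IDs (first whitespace-delimited token of each '>' header), in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def check_annotation_table(fasta_handle, tsv_handle, id_column="transcript_id"):
    """Compare FASTA IDs with the ID column of a tab-separated annotation table.

    Returns (annotated, unannotated, unknown):
      annotated   - FASTA IDs that have at least one table row
      unannotated - FASTA IDs with no table row
      unknown     - table IDs that do not occur in the FASTA (likely typos or stale IDs)
    """
    fasta_ids = read_fasta_ids(fasta_handle)
    table_ids = {row[id_column] for row in csv.DictReader(tsv_handle, delimiter="\t")}
    annotated = [i for i in fasta_ids if i in table_ids]
    unannotated = [i for i in fasta_ids if i not in table_ids]
    unknown = sorted(table_ids - set(fasta_ids))
    return annotated, unannotated, unknown


# Example with in-memory files; the IDs and description are made up.
fasta = io.StringIO(">tr1 len=4\nATGC\n>tr2 len=2\nGG\n")
tsv = io.StringIO("transcript_id\tdescription\ntr1\tputative heat shock protein\n")
print(check_annotation_table(fasta, tsv))  # (['tr1'], ['tr2'], [])
```

Keeping IDs short and stable in the FASTA, rather than packing descriptions into the headers, means the annotation table can be revised without touching the sequence file.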

Your inputs would be much appreciated.

RNAseq annotation publish transcriptome

Do you only have a transcriptome annotation, with no genome sequence? If you do, you may be able to use the Eukaryotic genome annotation submission method for GenBank. You can also email NCBI support and ask them about your options.


Thanks for the links to the papers/repos. Wouldn't it be enough to upload a .fasta, the ORFs, and the annotation in .gff3? Good luck!
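For the .gff3 part, a minimal sketch might look like the following. It assumes ORF coordinates have already been predicted on each transcript and are 1-based and inclusive (as GFF3 requires), and that the ORFs are complete so every CDS gets phase 0; the record fields, the `my_orf_caller` source label, and the example values are hypothetical. Free-text values in column 9 have their reserved characters percent-encoded, per the GFF3 specification.

```python
# Characters GFF3 reserves in column 9 values; they must be percent-encoded.
GFF3_RESERVED = set("%;=&,\t\n\r")


def escape_attr(value):
    """Percent-encode GFF3-reserved characters in an attribute value."""
    return "".join(f"%{ord(c):02X}" if c in GFF3_RESERVED else c for c in value)


def orfs_to_gff3(orfs, source="my_orf_caller"):
    """Turn ORF records into GFF3 lines.

    Each record is a dict with seqid, start, end (1-based, inclusive), strand, id and name.
    Assumes complete ORFs, so every CDS gets phase 0.
    """
    lines = ["##gff-version 3"]
    for orf in orfs:
        if not 1 <= orf["start"] <= orf["end"]:
            raise ValueError(f"bad coordinates for {orf['id']}")
        attributes = f"ID={escape_attr(orf['id'])};Name={escape_attr(orf['name'])}"
        columns = [orf["seqid"], source, "CDS", str(orf["start"]), str(orf["end"]),
                   ".", orf["strand"], "0", attributes]
        lines.append("\t".join(columns))
    return lines


orfs = [{"seqid": "tr1", "start": 10, "end": 309, "strand": "+",
         "id": "tr1.p1", "name": "putative kinase, partial"}]
print("\n".join(orfs_to_gff3(orfs)))
```

The comma in the example name is written as `%2C`, which is what keeps GFF3 parsers from splitting it into two values.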


I don't know, that's why I'm asking. For example, what do you do if you're transferring annotations from multiple resources?
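One option for annotations transferred from several resources, offered here as an assumption rather than any community standard, is a long-format table with one row per (transcript, source, hit). That way provenance such as the database and e-value stays explicit, and a single human-readable label can still be derived per transcript. The source names, accessions, and descriptions below are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    transcript_id: str
    source: str        # hypothetical resource label, e.g. the name of a protein database
    subject_id: str    # accession of the matched entry in that resource
    description: str
    evalue: float


def write_long_table(hits, handle):
    """Write every hit as its own row, so no resource's evidence is discarded."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript_id", "source", "subject_id", "description", "evalue"])
    for x in sorted(hits, key=lambda x: (x.transcript_id, x.evalue)):
        writer.writerow([x.transcript_id, x.source, x.subject_id, x.description, f"{x.evalue:.3g}"])


def best_label(hits):
    """Pick one human-readable label per transcript: the lowest e-value across all sources."""
    best = {}
    for x in hits:
        if x.transcript_id not in best or x.evalue < best[x.transcript_id].evalue:
            best[x.transcript_id] = x
    return {tid: f"{x.description} [{x.source}:{x.subject_id}]" for tid, x in best.items()}


hits = [
    Hit("t1", "db_A", "ACC_1", "heat shock protein 70", 1e-80),
    Hit("t1", "db_B", "ACC_2", "Hsp70 family protein", 1e-50),
    Hit("t2", "db_B", "ACC_3", "hypothetical protein", 1e-5),
]
buf = io.StringIO()
write_long_table(hits, buf)
print(buf.getvalue())
print(best_label(hits))
```

Choosing the lowest e-value is only one possible rule for the summary label; the long table keeps every hit, so readers can apply their own.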


Have you thought about releasing the transcriptome as a SQLite database? That way, you could bundle everything into one DB. Look at this publication. I'll try it myself for two brassicaceans.
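A minimal sketch of bundling sequences and annotations into one SQLite file with Python's standard-library `sqlite3` module might look like this. The schema (table and column names) is hypothetical and is not the one used in the linked publication; the foreign key makes the database reject annotations for transcripts that are not in it.

```python
import sqlite3

# Hypothetical schema: one table of sequences, one table of (possibly many) annotations each.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcript (
    transcript_id TEXT PRIMARY KEY,
    sequence      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS annotation (
    transcript_id TEXT NOT NULL REFERENCES transcript(transcript_id),
    source        TEXT NOT NULL,
    description   TEXT NOT NULL
);
"""


def build_db(path, sequences, annotations):
    """sequences: {transcript_id: sequence}; annotations: iterable of (transcript_id, source, description)."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled per connection
    con.executescript(SCHEMA)
    with con:  # commits on success, rolls back if any insert fails
        con.executemany("INSERT INTO transcript VALUES (?, ?)", sequences.items())
        con.executemany("INSERT INTO annotation VALUES (?, ?, ?)", annotations)
    return con


def annotations_for(con, transcript_id):
    """Return (source, description) pairs for one transcript, sorted by source."""
    cur = con.execute(
        "SELECT source, description FROM annotation WHERE transcript_id = ? ORDER BY source",
        (transcript_id,),
    )
    return cur.fetchall()


con = build_db(":memory:", {"t1": "ATGC"}, [("t1", "db_A", "heat shock protein 70")])
print(annotations_for(con, "t1"))  # [('db_A', 'heat shock protein 70')]
```

Passing a file path instead of `":memory:"` produces a single `.sqlite` file that could be deposited alongside (not instead of) the plain FASTA and tables.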


That's definitely one idea. I'm not convinced releasing the data as a database of some sort is any better than putting it out as a simple flatfile though.

One of my concerns is also about where to put the data. I see that this publication you shared just put the annotations file in the "Additional Files" section.


How about Zenodo? You can store up to 50 GB there. If (a gigantic if) we get around to publishing our work, it may be hosted there.


Zenodo seems like a good option; I've seen other papers use it. I'm just loath to create a new account only to upload some data. Likewise, I don't want to give Amazon/Google/Microsoft any data.

I wouldn't mind putting the data on GitHub if they'd let me get away with it. (This does exist.)

I'm also considering some open-source solutions (e.g., my own Nextcloud server).


Few people are likely to go the extra mile to get data/annotations from an external repository or site, especially if it requires them to create an account there. You should look at working with NCBI/ENA to see whether they can incorporate your annotation (or whether you can amend the annotations their pipelines produce). Otherwise, your work may not be appreciated or used at all.


I'd definitely upload the raw data and the assemblies to NCBI, but probably not the annotations. The NCBI submission process for transcriptome annotations seems pretty tedious, GenoMax. (I had reached out to them following your suggestion earlier.)

I do concur with you though. The annotations are useless if nobody can access them conveniently.


git-lfs is capped at 1GB though...


That's for the free version though; the paid version isn't too expensive. But your point stands.

Zenodo seems like the better option here.