How are transcriptome annotations published?
3.0 years ago
Dunois ★ 2.5k

I see plenty of papers describing annotations of de novo assembled transcriptomes, but I can't really seem to put a finger on the accepted method of publishing an annotated transcriptome.

For instance, this paper uploaded their data to figshare. But this paper only mentions a BioProject ID.

So my questions:

  • How does one publish an annotation of a transcriptome?
  • Are there any criteria for which files need to be uploaded? I presume a FASTA with the sequences and a table of some sort relating the headers to human-readable annotations would be the bare minimum. Or is a FASTA file with human-readable headers sufficient? Is some sort of "raw" assembly FASTA file also necessary?
  • What else can one publish as a part of a transcriptome annotation?
  • What platforms are considered suitable for publishing the annotation data? GitHub? figshare?

Your inputs would be much appreciated.
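
To make the "bare minimum" pairing from the second bullet concrete, here is a minimal Python sketch that checks a FASTA file against a tab-separated annotation table before release. The column name `transcript_id`, the function names, and the toy records are all assumptions for illustration, not any repository's required format.

```python
import csv
import io


def fasta_ids(handle):
    """Return the sequence IDs (first word of each '>' header) in a FASTA file."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def annotation_ids(handle, id_column="transcript_id"):
    """Return the IDs listed in a tab-separated annotation table (hypothetical layout)."""
    return [row[id_column] for row in csv.DictReader(handle, delimiter="\t")]


def check_consistency(fasta_handle, table_handle):
    """Report IDs that appear in only one of the two files."""
    seqs = set(fasta_ids(fasta_handle))
    annots = set(annotation_ids(table_handle))
    return {
        "unannotated": sorted(seqs - annots),  # in the FASTA, missing from the table
        "orphaned": sorted(annots - seqs),     # in the table, missing from the FASTA
    }


# Toy data standing in for the two files one would upload.
example_fasta = ">tx1 len=12\nATGGCCATTGTA\n>tx2 len=9\nATGAAATAG\n"
example_table = (
    "transcript_id\tdescription\tsource\n"
    "tx1\tputative kinase\tSwissProt\n"
)

report = check_consistency(io.StringIO(example_fasta), io.StringIO(example_table))
print(report)
```

With real data the two `io.StringIO` objects would be replaced by open file handles. Keeping the IDs in the FASTA headers short and putting the human-readable text in the table means the table can be corrected without touching the sequences.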

RNAseq annotation publish transcriptome • 1.8k views

Do you only have a transcriptome annotation with no genome sequence? If you do, you may be able to use the eukaryotic genome annotation submission method for GenBank. You can also email NCBI support and ask them about options.


Thanks for the links to the papers/repos. Wouldn't it be enough to upload a .fasta file, the ORFs, and the annotation in .gff3? Good luck!


I don't know, that's why I'm asking. For example, what do you do if you're transferring annotations from multiple resources?


Have you thought about releasing the transcriptome as a SQLite database? This way, you could bundle everything into one DB. Look at this publication. I'll try it myself for two brassicaceans.
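
As a rough illustration of the single-file idea, here is a minimal sketch using Python's standard `sqlite3` module. The two-table schema, the table and column names, and the toy records are assumptions made up for this example; they are not taken from the linked publication.

```python
import sqlite3


def build_db(path=":memory:"):
    """Create a hypothetical two-table layout: sequences plus per-source annotations."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE transcript (
            transcript_id TEXT PRIMARY KEY,
            sequence      TEXT NOT NULL
        );
        CREATE TABLE annotation (
            transcript_id TEXT NOT NULL REFERENCES transcript(transcript_id),
            source        TEXT NOT NULL,  -- resource the annotation was transferred from
            description   TEXT NOT NULL
        );
        """
    )
    return conn


conn = build_db()

# Toy records; a real release would load these from the assembly and annotation files.
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?)",
    [("tx1", "ATGGCCATTGTA"), ("tx2", "ATGAAATAG")],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [("tx1", "SwissProt", "putative kinase"), ("tx1", "Pfam", "PF00069")],
)
conn.commit()

# One row per (transcript, source); unannotated transcripts still show up with NULLs.
rows = conn.execute(
    "SELECT t.transcript_id, a.source, a.description "
    "FROM transcript AS t "
    "LEFT JOIN annotation AS a ON a.transcript_id = t.transcript_id "
    "ORDER BY t.transcript_id, a.source"
).fetchall()
for row in rows:
    print(row)
```

Passing a filename instead of `":memory:"` to `build_db` would write a single `.sqlite` file that can be uploaded like any other. Storing one row per source is one possible way to keep annotations from multiple resources side by side without merging them into a single header string.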


That's definitely one idea. I'm not convinced releasing the data as a database of some sort is any better than putting it out as a simple flatfile though.

One of my concerns is also about where to put the data. I see that this publication you shared just put the annotations file in the "Additional Files" section.

What are you planning on doing about this?


How about Zenodo? You can store up to 50GB there. If (gigantic if) we get around to publishing our thing, stuff may be hosted there.


Zenodo seems like a good option; I've seen other papers use it. I'm just also loath to create a new account just to upload some data. Likewise, I don't want to give Amazon/Google/Microsoft any data either.

I wouldn't mind putting the data on GitHub if they'd let me get away with it. (This does exist.)

I'm also considering some open source solutions (e.g., own Nextcloud server).


Few people are likely to go the extra step to get data/annotation from an external repository/site etc., especially if it requires them to create an account on the site. You should look at working with NCBI/ENA to see if they can incorporate your annotation (or you can amend the annotations their pipelines produce). Otherwise your work may not be appreciated/used at all.


I'd definitely upload the raw data and the assemblies to NCBI, but probably not the annotations. The NCBI submission process for transcriptome annotations seems pretty tedious, GenoMax. (I had reached out to them following your suggestion earlier.)

I do concur with you though. The annotations are useless if nobody can access them conveniently.


git-lfs is capped at 1GB though...


That's for the free version though; the paid version isn't too expensive. But your point stands.

Zenodo seems like the better option here.
