Question: Best Practices For Naming De Novo Transcriptome Sequences?
5.4 years ago by
Burlington, VT
johnstantongeddes410 wrote:

What are best practices for naming fasta sequences from a de novo transcriptome assembly?

Specifically, I'm thinking about

  • a naming scheme for sequences that is intuitive and logical
  • forward compatibility in case of future genome sequencing or more RNA-seq
  • names that are useful to other researchers doing database mining and similar work

I realize this may just end up being project-specific, but I'm hoping to avoid the problem of unstructured text in biological databases down the road.

fasta transcriptome • 1.6k views
5.4 years ago by
Damian Kao15k wrote:

As long as you delimit your headers consistently, any future manipulation to conform to another format should be easy. I would make sure to:

  • Choose a sensible delimiter, one that will never appear in your metadata. Characters like tabs or pipes are commonly used.
  • Have the same number of delimited metadata fields in every header.
  • If a metadata field is not applicable or not available, put in a placeholder such as "NA" rather than leaving the field out.
  • If you have incrementing numbers, pad them with leading zeroes so all numbers have the same string length. For example: 00001, 00002, 01234, 12345
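The four points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pipe delimiter, the "Vc" prefix, the five-digit width, the three metadata fields, and the function name make_header are all hypothetical choices, not part of any standard.

```python
DELIM = "|"  # assumed delimiter; it must never occur inside a field value


def make_header(prefix, number, fields, width=5, n_fields=3):
    """Build a FASTA header with a zero-padded ID and a fixed number of fields.

    All names and defaults here are illustrative assumptions.
    """
    # Replace missing or empty metadata with "NA" so no field is ever blank
    values = [str(f) if f not in (None, "") else "NA" for f in fields]
    if len(values) > n_fields:
        raise ValueError(f"expected at most {n_fields} fields, got {len(values)}")
    # Pad on the right so every header has exactly n_fields metadata values
    values += ["NA"] * (n_fields - len(values))
    for value in values:
        if DELIM in value:
            raise ValueError(f"field {value!r} contains the delimiter {DELIM!r}")
    # Zero-pad the running number so all IDs have the same string length
    seq_id = f"{prefix}{number:0{width}d}"
    return ">" + DELIM.join([seq_id] + values)


header = make_header("Vc", 12, ["leaf", None])
```

Because the IDs are padded to a fixed width, a plain text sort puts them in numeric order, which would not hold for unpadded names such as Vc2 and Vc10.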

Padding with zeroes is certainly important!

Reply written 5.4 years ago by johnstantongeddes
5.4 years ago by
Concord NC USA
Ann2.2k wrote:

My advice: Create names that can be easily parsed. For example, if your de novo assembly generates multiple transcript variants per locus, then use ".N" suffixes to indicate alternative transcripts coming from the same gene. And if you intend to make the sequences available as part of a searchable Web site, use names that are likely to be unique to your species. For example, for Vaccinium corymbosum (blueberry) you might do something like:

Vc1.1 for gene Vc1, transcript 1.

Do a quick Google search to find out what your proposed names will bring up.
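To show what "easily parsed" buys you, here is a minimal sketch that splits names of the form Vc1.1 into gene and transcript number and groups transcripts by gene. The regular expression and the function names parse_name and group_by_gene are assumptions made for this example; they would need adjusting for any other naming pattern.

```python
import re

# Assumed pattern: letters + gene number, a dot, then the transcript number
NAME_RE = re.compile(r"^(?P<gene>[A-Za-z]+\d+)\.(?P<transcript>\d+)$")


def parse_name(name):
    """Split a name like 'Vc1.1' into ('Vc1', 1)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized sequence name: {name!r}")
    return match.group("gene"), int(match.group("transcript"))


def group_by_gene(names):
    """Map each gene name to the list of its transcript numbers."""
    genes = {}
    for name in names:
        gene, transcript = parse_name(name)
        genes.setdefault(gene, []).append(transcript)
    return genes


by_gene = group_by_gene(["Vc1.1", "Vc1.2", "Vc2.1"])
```

With a consistent ".N" suffix, collecting all transcripts of one gene is a dictionary lookup rather than a free-text search.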


Sensible approach to dealing with transcripts, though in the absence of a genome this is one of the aspects of de novo transcriptome assembly I'm least comfortable with.

Reply written 5.4 years ago by johnstantongeddes