Explanation of definition lines for Trinity .fasta and .SuperTrans.fasta files
21 months ago
Melissa

Hi folks,

I assembled a transcriptome in Trinity v2.8.5 using the --include_supertranscripts parameter. These are the deflines for the .fasta file:

>TRINITY_DN8_c3_g1_i1 len=330 path=[0:0-329]
>TRINITY_DN8_c1_g1_i1 len=271 path=[0:0-270]
>TRINITY_DN8_c2_g1_i1 len=357 path=[0:0-356]
>TRINITY_DN8_c0_g1_i4 len=2132 path=[0:0-1596 2:1597-1673 3:1674-1734 4:1735-1789 8:1790-1797 9:1798-1927 11:1928-2025 12:2026-2066 13:2067-2096 15:2097-2131]

These are the deflines for the .SuperTrans.fasta file:

>TRINITY_DN8_c1_g1
>TRINITY_DN8_c0_g1
>TRINITY_DN8_c0_g2
>TRINITY_DN8_c0_g3
>TRINITY_DN10_c1_g1
>TRINITY_DN10_c2_g1

To my understanding, g1 indicates gene 1, i1 indicates isoform 1 of gene 1, len is the length of the transcript, and path indicates the nodes of the de Bruijn graph traversed by the transcript. What is indicated by DN8, DN10 and c0, c1, c2, etc.? I've been searching around but cannot find an explanation. Thank you!


Per ChatGPT

>TRINITY_DN1234_c0_g1_i1 len=548 path=[1234:0-547]

Let's break down the components of this header:

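  • Identifier (TRINITY_DN1234_c0_g1_i1):
     TRINITY: Indicates that the sequence is from a Trinity assembly.
     DN1234_c0: Together, these name the read cluster the transcript was assembled from. Trinity partitions the reads into many clusters and assembles each one separately, and the Trinity documentation treats DN1234_c0 as a single read-cluster identifier; c0 does not mean "contig 0". In the question, DN8_c0, DN8_c1, DN8_c2 and DN8_c3 are therefore four different read clusters, and DN10_c1 and DN10_c2 are two more.
     g1: Gene 1 within that read cluster. Gene numbers are only unique within a cluster, so the full gene identifier is TRINITY_DN1234_c0_g1.
     i1: Isoform 1 of that gene.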
    
  • Length Information (len=548):
     Indicates the length of the sequence (548 nucleotides in this example).
    
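  • Path Information (path=[1234:0-547]):
    Describes the path traversed in Trinity's compacted de Bruijn graph to build the transcript.
    Each entry has the form node:start-end: a graph node ID, followed by the 0-based, end-inclusive range of the transcript sequence contributed by that node. [1234:0-547] means node 1234 supplies positions 0-547 of this 548-nt transcript. In the question's TRINITY_DN8_c0_g1_i4 header, ten nodes (0, 2, 3, 4, 8, 9, 11, 12, 13, 15) together cover positions 0-2131, matching len=2132. This is useful for seeing how the transcript was reconstructed from the input RNA-Seq data.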
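A minimal Python sketch of the breakdown above, assuming the de novo header layout shown in this thread. The regex, `parse_header` and `path_is_contiguous` are hypothetical helpers written for illustration, not part of Trinity. It splits an ID into read cluster, gene and isoform, and checks whether the node spans tile the transcript from position 0 to len-1. That holds for the headers quoted in the question, but treat it as a sanity check rather than a guaranteed property.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout (hypothetical regex, not a Trinity API):
#   >TRINITY_DN<n>_c<n>_g<n>_i<n> len=<n> path=[node:start-end node:start-end ...]
HEADER_RE = re.compile(
    r"^>?(?P<tid>TRINITY_(?P<cluster>DN\d+_c\d+)_(?P<gene>g\d+)_(?P<iso>i\d+))"
    r"\s+len=(?P<len>\d+)"
    r"\s+path=\[(?P<path>[^\]]*)\]"
)


@dataclass
class TrinityHeader:
    transcript_id: str  # e.g. TRINITY_DN8_c0_g1_i4
    read_cluster: str   # e.g. TRINITY_DN8_c0
    gene_id: str        # e.g. TRINITY_DN8_c0_g1
    isoform: str        # e.g. i4
    length: int         # value of len=
    path: List[Tuple[int, int, int]]  # (node_id, start, end), 0-based, end inclusive


def parse_header(line: str) -> TrinityHeader:
    """Split a Trinity .fasta defline into its ID components, length and path."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised Trinity header: {line!r}")
    segments = []
    for seg in m.group("path").split():
        node, span = seg.split(":")
        start, end = span.split("-")
        segments.append((int(node), int(start), int(end)))
    cluster = "TRINITY_" + m.group("cluster")
    return TrinityHeader(
        transcript_id=m.group("tid"),
        read_cluster=cluster,
        gene_id=f"{cluster}_{m.group('gene')}",
        isoform=m.group("iso"),
        length=int(m.group("len")),
        path=segments,
    )


def path_is_contiguous(h: TrinityHeader) -> bool:
    """True if the node spans cover positions 0 .. len-1 with no gaps or overlaps."""
    expected_start = 0
    for _node, start, end in h.path:
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == h.length


if __name__ == "__main__":
    h = parse_header(
        ">TRINITY_DN8_c0_g1_i4 len=2132 path=[0:0-1596 2:1597-1673 3:1674-1734 "
        "4:1735-1789 8:1790-1797 9:1798-1927 11:1928-2025 12:2026-2066 "
        "13:2067-2096 15:2097-2131]"
    )
    print(h.read_cluster, h.gene_id, h.isoform)  # TRINITY_DN8_c0 TRINITY_DN8_c0_g1 i4
    print([node for node, _, _ in h.path])        # [0, 2, 3, 4, 8, 9, 11, 12, 13, 15]
    print(path_is_contiguous(h))                  # True
```

Parsing the path into (node, start, end) tuples keeps the coordinates as integers, so length checks like the one above are simple arithmetic.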
    

For supertranscript headers:

>TRINITY_DN1234_c0_g1_i1 len=1200 super=1 path=[1234:0-547 567:200-899]
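  • Identifier (TRINITY_DN1234_c0_g1_i1):
     Built from the same read-cluster and gene fields as in the regular assembly. Because a supertranscript collapses all isoforms of a gene into one sequence, the identifier is gene-level in practice: the .SuperTrans.fasta headers quoted in the question (e.g. TRINITY_DN8_c0_g1) carry no _i isoform suffix.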
    
  • Length Information (len=1200):

     Indicates the length of the supertranscript (1200 nucleotides in this example).
    
  • Supertranscript Indicator (super=1):

     Indicates that this sequence is a supertranscript (1 means true). A supertranscript merges the isoforms of a gene into a single sequence in which each shared segment appears only once.
    
  • Path Information (path=[1234:0-547 567:200-899]):

    Describes the segments that make up the supertranscript, using the same node:start-end notation as the regular headers.
    In this example, the supertranscript is composed of two segments, [1234:0-547] and [567:200-899].
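Judging from the .SuperTrans.fasta headers in the question, each supertranscript is named by its gene-level ID, so transcripts in the regular .fasta can be matched to supertranscripts by dropping the trailing _i<n> isoform suffix. A minimal sketch under that assumption (`to_gene_id` and `transcripts_by_gene` are hypothetical helpers). Only the few headers quoted in the question are used, so only two genes overlap in this snippet.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: a transcript ID is its gene ID plus a trailing "_i<n>" isoform suffix.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def to_gene_id(transcript_id: str) -> str:
    """TRINITY_DN8_c0_g1_i4 -> TRINITY_DN8_c0_g1 (IDs without the suffix are unchanged)."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def transcripts_by_gene(headers: Iterable[str]) -> Dict[str, List[str]]:
    """Group the first word of each '>' line under its gene-level ID; other lines are skipped."""
    genes: Dict[str, List[str]] = defaultdict(list)
    for line in headers:
        if line.startswith(">"):
            tid = line[1:].split()[0]
            genes[to_gene_id(tid)].append(tid)
    return dict(genes)


if __name__ == "__main__":
    # Transcript IDs from the question's .fasta deflines
    trinity_headers = [
        ">TRINITY_DN8_c3_g1_i1",
        ">TRINITY_DN8_c1_g1_i1",
        ">TRINITY_DN8_c2_g1_i1",
        ">TRINITY_DN8_c0_g1_i4",
    ]
    # IDs from the question's .SuperTrans.fasta deflines
    supertrans_ids = {
        "TRINITY_DN8_c1_g1", "TRINITY_DN8_c0_g1", "TRINITY_DN8_c0_g2",
        "TRINITY_DN8_c0_g3", "TRINITY_DN10_c1_g1", "TRINITY_DN10_c2_g1",
    }
    by_gene = transcripts_by_gene(trinity_headers)
    print(sorted(set(by_gene) & supertrans_ids))  # ['TRINITY_DN8_c0_g1', 'TRINITY_DN8_c1_g1']
```

Run over the full files, every gene key should in principle have one supertranscript record; any mismatch would be worth investigating.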
    