Interpreting Trinity components
7.4 years ago

I am trying to interpret the component graphs from my Trinity run. I rendered graphs for a few components of a Trinity assembly (using the c*.graph.out files) and noticed that in some components the root node (the -1 node) sits in the middle of a "linear" sequence of nodes.

I uploaded my ipython notebook here with 3 of the graphs I rendered:

The first component (c445) looks normal to me: the root node (in red) connects to one linear sequence that eventually splits into two branches and then merges again, which could indicate isoforms.

But the second and third component graphs showed the root node in the middle of a "linear" region. Furthermore, for the second component graph, the probable paths were the two "arms" of the root node.

There are no shared (k-1)-mers between the nodes on either side of the root node in the second and third components. How does the bundling of contigs work in this case? Is it putting these contigs together based on paired-end reads? And what exactly is the -1 root node?
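For anyone who wants to repeat the check, this is roughly how the shared (k-1)-mer test can be done. It is only a sketch: the sequences and the value of k-1 below are made up to keep the example small, the function names are my own, and it compares the forward strand only (no reverse complements).

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seq_a, seq_b, k):
    """Return the k-mers that occur in both sequences (forward strand only)."""
    return kmers(seq_a, k) & kmers(seq_b, k)


# Toy sequences, NOT taken from the real component graphs.
left_arm = "ACGTACGTGA"
right_arm = "TTGACCATGC"
overlapping = "CGTGATTGAC"

# Toy value; for a real run use (k - 1) for the k the assembly was built with.
K_MINUS_1 = 4

print(shared_kmers(left_arm, right_arm, K_MINUS_1))    # empty set: no overlap
print(shared_kmers(left_arm, overlapping, K_MINUS_1))  # the two shared 4-mers
```

An empty result for the two arms is what I see for the second and third components, which is why I suspect something other than sequence overlap is holding them together.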

Transcriptome Trinity • 2.2k views
7.4 years ago

I just got a response from Brian Haas via the Trinity mailing list:

The -1 is the root node for the de Bruijn graph. A way you can end up with multiple 'arms' in the graph like that is if there are multiple Inchworm contigs that are clustered together based on paired-read links (from the bowtie alignment step). This way, they end up having the same 'component' number in the accession string (i.e., the c\d+ part of the c\d+_g\d+_i\d+ accession naming format). This often happens when transcripts for a given gene are fragmented.
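To see which contigs ended up in the same component, the accession format quoted above can be parsed directly. This is a minimal sketch that assumes the accessions contain the plain c\d+_g\d+_i\d+ pattern somewhere in the string; the function name and the example accessions are hypothetical, and real FASTA headers may carry extra prefixes or fields depending on the Trinity version.

```python
import re
from collections import defaultdict

# Pattern taken from the naming format described above: component, gene, isoform.
ACCESSION_RE = re.compile(r"(c\d+)_(g\d+)_(i\d+)")


def group_by_component(accessions):
    """Group accession strings by their component part (the c\\d+ prefix)."""
    groups = defaultdict(list)
    for acc in accessions:
        match = ACCESSION_RE.search(acc)
        if match is None:
            continue  # skip anything that does not follow the assumed format
        groups[match.group(1)].append(acc)
    return dict(groups)


# Hypothetical accessions, for illustration only.
example = ["c445_g1_i1", "c445_g1_i2", "c446_g1_i1", "c446_g2_i1"]
grouped = group_by_component(example)
for component, members in grouped.items():
    print(component, members)
```

Contigs that share a component but have no sequence overlap would be candidates for the paired-read clustering described in the answer.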

