Question

How to group transcripts by "gene" from different transcript assemblies?

2.7 years ago

O.rka ▴ 710

What is the recommended way to group transcripts by "gene"? For example, rnaSPAdes embeds a predicted gene identifier in each transcript's identifier.
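To illustrate what grouping on the embedded identifier looks like, here is a minimal sketch. It assumes rnaSPAdes IDs of the form `NODE_1_length_11132_cov_46.4_g0_i0` and Trinity IDs of the form `TRINITY_DN1000_c115_g5_i1`; check these against your own FASTA headers. The function names and the `assembly` label are made up for illustration. The gene labels are only meaningful within a single assembly, which is why each one is prefixed with the assembly name.

```python
import re
from collections import defaultdict

# Assumed header layouts (verify against your own FASTA files):
#   rnaSPAdes: NODE_1_length_11132_cov_46.4_g0_i0  -> gene "g0"
#   Trinity:   TRINITY_DN1000_c115_g5_i1           -> gene "TRINITY_DN1000_c115_g5"
RNASPADES_RE = re.compile(r"^NODE_\d+_length_\d+_cov_[\d.]+_(g\d+)_i\d+$")
TRINITY_RE = re.compile(r"^(TRINITY_DN\d+_c\d+_g\d+)_i\d+$")


def gene_id(transcript_id, assembly):
    """Return an assembly-qualified gene label, or None if the ID is not recognized."""
    for pattern in (RNASPADES_RE, TRINITY_RE):
        match = pattern.match(transcript_id)
        if match:
            return f"{assembly}|{match.group(1)}"
    return None


def group_by_gene(transcript_ids, assembly):
    """Map each assembly-qualified gene label to its transcript IDs."""
    groups = defaultdict(list)
    for tid in transcript_ids:
        key = gene_id(tid, assembly)
        if key is not None:
            groups[key].append(tid)
    return dict(groups)


ids = [
    "NODE_1_length_11132_cov_46.4_g0_i0",
    "NODE_2_length_9000_cov_12.1_g0_i1",
    "NODE_3_length_500_cov_3.0_g1_i0",
]
for gene, members in group_by_gene(ids, "spades").items():
    print(gene, members)
```

This only recovers the per-assembly grouping; it does not tell you which genes correspond to each other across assemblies.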

Is there a way to merge similar transcripts from different assemblies?

I'm sure the assemblers use their graph files in the backend for this, so what I'm asking may be really out of scope. I could use something like CD-HIT, but I'm wondering if there is a better way. Maybe a way to use the de Bruijn graphs together?

assembly trinity rnaspades transcript • 1.2k views

ADD COMMENT • link updated 2.7 years ago by ponganta ▴ 590 • written 2.7 years ago by O.rka ▴ 710

score 0 · Answer 1 · 2021-07-28

2.7 years ago

sahilbioinf0 • 0

Transcripts were clustered using the CD-HIT (Cluster Database at High Identity with Tolerance) package, which was used to remove shorter redundant transcripts when they were 100% covered by other transcripts with more than 90% identity. The non-redundant clustered transcripts were then designated as unigenes.
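As a rough sketch of how that description might map onto a `cd-hit-est` call: `-c` is the sequence identity threshold and `-aS` is the alignment coverage required for the shorter sequence. The mapping of "90% identity" to `-c 0.9` and "100% covered" to `-aS 1.0` is my reading of the text above, not something stated in it, and the file names and the helper function are placeholders. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def build_cd_hit_est_cmd(fasta_in, fasta_out, identity=0.9, short_coverage=1.0):
    """Build (but do not run) a cd-hit-est command line as an argument list."""
    return [
        "cd-hit-est",
        "-i", fasta_in,               # input transcripts (placeholder name)
        "-o", fasta_out,              # representative sequences ("unigenes")
        "-c", str(identity),          # sequence identity threshold
        "-aS", str(short_coverage),   # alignment coverage of the shorter sequence
    ]


cmd = build_cd_hit_est_cmd("transcripts.fasta", "unigenes.fasta")
print(shlex.join(cmd))
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)`, assuming `cd-hit-est` is on your `PATH`. Thread and memory options are left at their defaults here.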

ADD COMMENT • link 2.7 years ago by sahilbioinf0 • 0

Is that a quote?

ADD REPLY • link 2.7 years ago by ponganta ▴ 590

score 0 · Answer 2 · 2021-07-28

2.7 years ago

ponganta ▴ 590

If you want a common "baseline" for several assemblies, the only way I could think of would be annotation with a common database. For instance, if your assemblies come from closely related species, you could annotate CDSs with a common reference (e.g. a closely related model species).

For individual assemblies, if you would like to go from transcript-level to gene-level (which has advantages), you could also cluster transcripts based on shared read support using Corset or Grouper.
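Once you have a transcript-to-cluster table, collecting transcripts per "gene" is straightforward. The sketch below assumes a two-column, tab-separated file (transcript ID, then cluster ID), which is how I understand Corset's `clusters.txt` to be laid out; verify this against your own output. The function name and the example rows are made up.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Read 'transcript<TAB>cluster' rows and return {cluster: [transcripts]}."""
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript, cluster = row[0], row[1]
        clusters[cluster].append(transcript)
    return dict(clusters)


# Made-up example rows standing in for a real clusters file.
example = io.StringIO("tx1\tCluster-0.0\ntx2\tCluster-0.0\ntx3\tCluster-1.0\n")
genes = read_clusters(example)
for cluster, members in genes.items():
    print(cluster, len(members), ",".join(members))
```

For a real file, replace the `io.StringIO` object with `open("clusters.txt", newline="")`.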

Another way forward would be to combine both techniques. For each assembly, you could cluster the assembled transcripts using one of the two previously mentioned programs and then construct SuperTranscripts using Lace. You could then try to annotate the SuperTranscripts and compare (likely) homologous genes with one another. Hope that helps.

ADD COMMENT • link 2.7 years ago by ponganta ▴ 590

A suggestion. If you could provide links for the programs mentioned your answer would become more complete. Programs can have similar names and searching with the names above is likely to lead to not-useful-for-science results.

ADD REPLY • link 2.7 years ago by GenoMax 141k

Thanks for the suggestion, but I included links in my answer. They are but a click on the name away :) This is sadly hard to see on biostars when you highlight the names of programs, while also linking to a repository...

ADD REPLY • link 2.7 years ago by ponganta ▴ 590

Agreed. Hard to see unless you hover over the name. Another suggestion: you can either add a separate (LINK) after the name to make the link clear, or not use code tags for program names and simply include links.

ADD REPLY • link 2.7 years ago by GenoMax 141k