Question: comparing an assembly against close relatives
Assa Yeroslaviz (1.2k) wrote 20 days ago:

I would like to ask for recommendations/workflows on how to assess the results of our de novo assembly from a Trinity run.

We have done a Trinity run and got a transcripts.fasta file with the assembled transcripts.

Now we would like to compare these transcripts to an annotated genome of a close relative (which has a gtf file) to try and answer two questions.

  1. To assess the quality of our Trinity output for completeness and correctness.
  2. To try to identify the function of our newly assembled transcripts via sequence homology.

I would appreciate some suggestions on how to approach both of these tasks.



Tags: trinity, blast, assembly, de-novo
modified 10 days ago by colindaven (1.2k) • written 20 days ago by Assa Yeroslaviz (1.2k)

If you have a reference genome from a close relative, why do you consider it necessary to do a de novo assembly? What is the purpose: to obtain the transcriptome, or to evaluate the assembler's performance?

written 20 days ago by Buffo (1.5k)

The first one. I would like to obtain the annotated transcriptome of said organism.

Would BLASTing my transcripts from Trinity (Trinity.fasta) against the genome (genome.fa file) of the close relative give me good enough results?

modified 20 days ago • written 20 days ago by Assa Yeroslaviz (1.2k)

If you need the transcriptome and the reference genome is closely related and well annotated, you can just quantify transcript expression using StringTie and look for new isoforms, rather than comparing assembled transcripts.

written 20 days ago by Buffo (1.5k)

I have already done two things.

  1. Mapping the fastq files against the genome of the close relative (this gave me good results).
  2. Running a de novo assembly using Trinity to create a transcript.fasta file with the assembled transcripts. This was followed by running bowtie with the fastq files against the indexed transcript.fasta file.

From both runs I got a BAM file, which is needed by StringTie. If I run StringTie on the first BAM file, I won't have the annotations I need. For the second BAM file, I don't have an annotation file (GTF), so I am not sure whether running StringTie on it makes sense. Can I use the BAM file from the de novo assembly as input, but take the annotation file from the close relative?

modified 20 days ago • written 20 days ago by Assa Yeroslaviz (1.2k)

Run StringTie with the BAM from step 1 and the genome annotation (GTF, GFF or GFF3); you do not need an extra annotation. The BAM file contains the coordinates where each read aligns to the genome. StringTie just counts how many reads align to each annotated gene and calculates relative expression (TPM and FPKM). You can also count the reads mapped to each annotated gene in the first BAM with htseq-count and then calculate relative expression.
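
A minimal sketch of that StringTie call, assuming the BAM from step 1 is coordinate-sorted. The file names and the `quantify_with_stringtie` helper are hypothetical. `-G` supplies the close relative's annotation, `-e` restricts estimation to the annotated transcripts, and `-A` writes a per-gene abundance table.

```shell
# Hypothetical helper: quantify annotated genes with StringTie.
# $1 = coordinate-sorted BAM (reads mapped to the close relative's genome)
# $2 = annotation of the close relative (GTF/GFF3)
# $3 = output prefix
quantify_with_stringtie() {
  stringtie -G "$2" -e -A "$3.gene_abund.tab" -o "$3.gtf" "$1"
}

# Example call (assumed file names):
# quantify_with_stringtie reads_vs_relative.sorted.bam closeRelative.gtf relative_quant
```

Dropping `-e` should let StringTie also assemble transcripts that are not in the annotation, which is closer to looking for new isoforms.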

written 20 days ago by Buffo (1.5k)

But I'm not primarily interested in quantifying expression. That is only a secondary result of the analysis.

I am more interested in creating an annotation file for my genome, which has no annotation. I would like to try comparative genomics to assign/predict function for my transcripts (I was thinking of something like BLAST or other ORF-based comparison tools).

written 20 days ago by Assa Yeroslaviz (1.2k)

You can also use StringTie or Cufflinks to annotate your transcripts on the genome. To assign potential function to your transcripts you can use BLAST, but I recommend Blast2GO; it is easier to handle and can do exactly what you are looking for.
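
As a rough sketch of the plain BLAST route: assuming a protein FASTA for the close relative is available (`relative_proteins.fa` is a made-up name), the transcripts could be searched with blastx and reduced to one best hit each. `best_hits` is a hypothetical helper that keeps the highest-bitscore row per query from tabular (`-outfmt 6`) output.

```shell
# Keep the single best hit (highest bitscore, column 12) per query
# from BLAST tabular output (-outfmt 6) read on stdin.
best_hits() {
  sort -t "$(printf '\t')" -k1,1 -k12,12nr | awk -F'\t' '!seen[$1]++'
}

# Assumed file names; run these where BLAST+ is installed:
# makeblastdb -in relative_proteins.fa -dbtype prot -out relative_prot
# blastx -query Trinity.fasta -db relative_prot -evalue 1e-5 \
#        -outfmt 6 -num_threads 4 -out trinity_vs_relative.tsv
# best_hits < trinity_vs_relative.tsv > trinity_best_hits.tsv
```

The second column of the result is the best-matching protein ID, which can then be looked up in the relative's annotation.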

written 19 days ago by Buffo (1.5k)
colindaven (1.2k), Hannover Medical School, wrote 10 days ago:

What I used to do when I did a lot of this type of transcriptome assembly was the following:

  • Map the Trinity results (many different parameters, or clustered, or various organs) against the genome with gmap, using the very nice GFF3 output option.
  • Manually compare the different assemblies (or, even better, get biologists to compare transcripts and regions of interest). You can very quickly get an impression of which sets of results look best.
  • Use TransDecoder to get sets of CDS, amino acid sequences etc. from the Trinity assembly.

Worked pretty well. Functionally annotating the FASTA outputs of TransDecoder was always highly compute-intensive ...
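
The TransDecoder step can be sketched roughly as below. The `run` and `predict_orfs` helpers are hypothetical; setting `DRY_RUN` only prints the two commands, so the sketch can be checked without TransDecoder installed.

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: extract candidate coding regions from a Trinity assembly.
# $1 = Trinity FASTA (e.g. Trinity.fasta)
predict_orfs() {
  # Step 1: find long open reading frames.
  run TransDecoder.LongOrfs -t "$1" || return 1
  # Step 2: predict the likely coding regions.
  run TransDecoder.Predict -t "$1"
}

# Example call (assumed path):
# predict_orfs ../trinity/trinity_output/Trinity.fasta
```

The predicted peptide and CDS FASTA files should end up in the working directory, named after the input with a `.transdecoder` suffix.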

You might also (re)annotate the genome using MAKER with the evidence from the Trinity assemblies and TransDecoder steps.

Also, providing results iteratively to your collaborators via e.g. a local JBrowse will allow you to improve the transcripts and provide versioning.
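
Genome browsers such as JBrowse generally want a coordinate-sorted, indexed GFF3 track. A minimal sketch of that preparation, assuming bgzip and tabix (htslib) are available; `sort_gff3` and the file names are hypothetical:

```shell
# Sort GFF3 features by sequence name and start position.
# "##" directive lines are kept at the top; "###" separators are dropped.
# $1 = input GFF3; sorted GFF3 is written to stdout.
sort_gff3() {
  grep '^##[^#]' "$1"
  grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Assumed file names:
# sort_gff3 trinity_gmap.gff | bgzip > trinity_gmap.sorted.gff.gz
# tabix -p gff trinity_gmap.sorted.gff.gz
```

Keeping one sorted, indexed file per assembly version makes it easy to load successive versions as separate tracks.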

modified 10 days ago • written 10 days ago by colindaven (1.2k)

Thanks for the suggestion. I was already planning on running either StringTie or gmap. But just for clarification: do you mean using the results of the Trinity run (e.g. Trinity.fasta) to map against the indexed genome of the close relative?

Something like this:

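# Build a GMAP index of the close relative's genome (k-mer size 13)
gmap_build -d Genome -D ./indexedFolder -k 13 closeRelative.fa
# Map the Trinity transcripts to it and write gene-style GFF3
gmap -n 0 -D ./indexedFolder -d Genome -f gff3_gene ../trinity/trinity_output/Trinity.fasta > trinity_gmap.gff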

How can I then make sense of the GFF file? Did you use TransDecoder on the GFF file as well?

modified 9 days ago • written 9 days ago by Assa Yeroslaviz (1.2k)

From memory that looks reasonable. You might play around with the -n parameter to exclude junk too.

Making sense of it requires eyeballing it after import into a genome browser. That's why I mentioned JBrowse, which is excellent for comparing multiple tracks. You can import the GFF3 and use the server or standalone version.

Of course, you'll also need to import the GTF of your close relative for comparison.

Hopefully that will allow you to see if your assembly is overly fragmented or reasonable.
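
Before opening the browser, a quick numeric summary of the GFF3 can give a first impression. This is only a sketch: it assumes the mRNA lines written by gmap carry a `coverage=` attribute in column 9 (check your own output first), and `summarise_gmap_gff3` is a hypothetical helper.

```shell
# Count mRNA alignments in a GFF3 file and how many reach a coverage cutoff.
# $1 = GFF3 file, $2 = minimum coverage in percent (default 90)
# Assumption: mRNA lines have a "coverage=<percent>" attribute.
summarise_gmap_gff3() {
  awk -F'\t' -v min_cov="${2:-90}" '
    $3 == "mRNA" {
      total++
      cov = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, "=")
        if (kv[1] == "coverage") cov = kv[2]
      }
      if (cov != "" && cov + 0 >= min_cov) good++
    }
    END {
      printf "mRNA alignments: %d\n", total
      printf "with coverage >= %s%%: %d\n", min_cov, good
    }' "$1"
}

# Example call (assumed file name):
# summarise_gmap_gff3 trinity_gmap.gff 90
```

A low share of high-coverage alignments would be one hint that many transcripts are fragmented or chimeric, though the visual check is still needed to confirm it.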

written 9 days ago by colindaven (1.2k)

Thanks, I'll try it and see where it goes.

written 7 days ago by Assa Yeroslaviz (1.2k)