Question about Trinity assembly QC?
17 months ago
pearl2070 ▴ 10

I have a question about some of the Trinity QC output. I'll use a tutorial dataset (found here: https://github.com/trinityrnaseq/KrumlovTrinityWorkshopJan2018/wiki/Home/1a23eb56a8857c3ed9595f9224367e25129f8f4b) as an example to keep the question straightforward.

When TrinityStats.pl is run on the tutorial dataset, it reports 683 'genes' and 687 transcripts. Later in the tutorial, under "Assess number of full-length coding transcripts," the transcripts are BLASTed and analyze_blastPlus_topHit_coverage.pl is run on the results. This generates a chart with bins of percent length coverage of the best-matching protein sequence, the count of proteins found in each bin, and a running total of proteins across all bins. There appear to be only 324 proteins in total. What happened to the rest? Why is there a discrepancy between the number of proteins that have BLAST hits and the number of genes in the assembly?

QC Trinity RNA-seq transcriptomics metatranscriptomics • 587 views
17 months ago
h.mon 35k

Not all genes / transcripts will have BLAST hits, and BLAST's tabular output (-outfmt 6) omits query sequences without hits. Those sequences would belong in a 0% coverage line, which is not shown because BLAST doesn't output them.
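If you want to see how many transcripts fall into that missing "no hit" group, one option is to compare the IDs in the assembly FASTA against the query IDs in the BLAST table. Below is a minimal Python sketch of that idea, not part of Trinity; the function names and the toy records are made up for illustration, and it assumes the query ID is the first tab-separated column of the -outfmt 6 file and the first word of each FASTA header.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>' on header lines)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def blast_query_ids(lines):
    """Return the set of query IDs (column 1) from BLAST -outfmt 6 lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (comments appear with -outfmt 7).
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def transcripts_without_hits(fasta_lines, blast_lines):
    """IDs present in the FASTA but absent from the BLAST table."""
    return fasta_ids(fasta_lines) - blast_query_ids(blast_lines)


# Toy data (hypothetical IDs and hits, for illustration only).
example_fasta = [
    ">TRINITY_DN0_c0_g1_i1 len=300",
    "ATGGCC",
    ">TRINITY_DN1_c0_g1_i1 len=250",
    "ATGTTT",
    ">TRINITY_DN2_c0_g1_i1 len=210",
    "ATGAAA",
]
example_blast = [
    "TRINITY_DN0_c0_g1_i1\tprotA\t98.0\t100\t2\t0\t1\t300\t1\t100\t1e-50\t200",
    "TRINITY_DN0_c0_g1_i1\tprotA\t90.0\t40\t4\t0\t10\t130\t5\t45\t1e-10\t60",
    "TRINITY_DN1_c0_g1_i1\tprotB\t75.0\t80\t20\t0\t1\t240\t1\t80\t1e-20\t90",
]

missing = transcripts_without_hits(example_fasta, example_blast)
print(len(missing), sorted(missing))
```

To run this on real files, pass open file handles instead of the lists (for example the assembly FASTA and the BLAST output from the tutorial; use whatever file names you gave them). Note that the coverage chart counts matched proteins rather than transcripts, so the two totals are not expected to add up exactly to the transcript count.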
