I have a de novo plant transcriptome assembly from a company, with the stats below. Can it be used as the reference for evaluating RNA-seq differential expression data? I have CSV files with expression data, and I am wondering whether I should start looking at that data or whether I need to improve the transcriptome assembly myself. There is no genome.
contigs: 1499698
smallest contig: 201
largest contig: 13777
n_bases: 787989217
mean_len: 525.43193
n_under_200: 0
n_over_1K: 177786
n_over_10k: 25
n_with_orf: 246891
mean_orf_percent: 65.70842
n90: 236
n70: 354
n50: 738
n30: 1502
n10: 2862
gc: 0.44474
bases_n: 0
proportion_n: 0.0
score: NA
optimal_score: NA
cut_off: NA
weighted: NA
In addition to what @ponganta has said:
Did you run any quality control, FastQC on your raw data, for example?
Do you have the full TransRate report?
Did TransRate throw you an error when you ran it?

At a cursory glance, it seems like a slightly underwhelming assembly. That's a lot more assembled contigs than I've encountered in most cases, and the same goes for the alleged protein-coding sequences. I have the impression that what's been assembled is quite fragmented.
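To put rough numbers on that impression, here is a minimal Python sketch that derives a few ratios from the stats you posted. The dictionary values are copied from your report; the function name `derived_ratios` and the choice of ratios are just mine for illustration, not something TransRate produces.

```python
# Summary statistics copied from the report in the question.
stats = {
    "contigs": 1_499_698,
    "n_bases": 787_989_217,
    "n_over_1k": 177_786,
    "n_with_orf": 246_891,
}


def derived_ratios(s):
    """Return a few ratios that help judge how fragmented an assembly is.

    Hypothetical helper for illustration only; it just divides the
    reported counts by the total number of contigs.
    """
    return {
        # Average contig length in bases.
        "mean_len": s["n_bases"] / s["contigs"],
        # Share of contigs longer than 1 kb.
        "frac_over_1k": s["n_over_1k"] / s["contigs"],
        # Share of contigs with a predicted ORF.
        "frac_with_orf": s["n_with_orf"] / s["contigs"],
    }


ratios = derived_ratios(stats)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

The mean length reproduces the reported 525.43, which is a quick check that the numbers were copied correctly. Only about 12% of the contigs are longer than 1 kb and about 16% carry a predicted ORF, so most of the 1.5 million contigs are short pieces, which is what I mean by fragmented.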