I have some confusing results regarding the GC content of a transcriptome. I built a de novo transcriptome assembly with Trinity and used one of the prebuilt scripts in the Trinity package (i.e. TrinityStats) to calculate basic statistics for the assembly; it returned GC% = 46.64 %.
Meanwhile, I ran Prinseq on the same assembly to get summary statistics (although it is designed for use with reads) and got the following result:
GC Content Distribution
Mean GC content: 46.04 ± 6.75 %
Minimum GC content: 19 %
Maximum GC content: 74 %
GC content range: 56 %
Mode GC content: 45 % with 4,275 sequences
On the other hand, when I count the G and C bases in the multi-FASTA transcriptome (using a simple 'grep' and omitting every FASTA header) and then calculate GC content with the formula (G+C)/(G+C+A+T), I get 49.96%.
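To make the comparison concrete, here is a minimal Python sketch of the formula above, computed two ways: pooled over all bases in the file, and as the mean of the per-sequence GC percentages. The function names (`read_fasta`, `gc_summary`) and the toy sequences are hypothetical and only for illustration. I am assuming a standard multi-FASTA file in which only A, C, G and T count toward the denominator. This is not the code that TrinityStats or Prinseq actually run, and which of the two definitions each tool reports is something I have not verified.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Upper-case so soft-masked (lower-case) bases are counted too.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_summary(lines):
    """Return (pooled GC%, mean per-sequence GC%) for FASTA lines.

    Assumption: only A, C, G and T are counted; N and other
    ambiguity codes are ignored in both numerator and denominator.
    """
    total_gc = total_acgt = 0
    per_seq = []
    for _, seq in read_fasta(lines):
        gc = seq.count("G") + seq.count("C")
        acgt = gc + seq.count("A") + seq.count("T")
        total_gc += gc
        total_acgt += acgt
        if acgt:
            per_seq.append(100.0 * gc / acgt)
    pooled = 100.0 * total_gc / total_acgt if total_acgt else 0.0
    return pooled, (mean(per_seq) if per_seq else 0.0)


# Toy input: one short GC-rich sequence and one longer AT-rich sequence
# that is wrapped over two lines.
example = [">t1", "GGCC", ">t2", "ATATATATATAT", "ATGC"]
pooled, per_seq_mean = gc_summary(example)
print(f"pooled GC%: {pooled:.2f}")
print(f"mean per-sequence GC%: {per_seq_mean:.2f}")
```

On a real file this could be called as `gc_summary(open("Trinity.fasta"))`, where the file name is a placeholder. On the toy input the two numbers differ a lot, because the pooled value weights each sequence by its length while the mean treats every sequence equally, so some gap between a pooled figure and a per-sequence mean would not be surprising on its own.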
Could anyone help me clarify these differences and suggest which GC content result is accurate? Thank you very much.