I am mapping an RNAseq dataset (illumina 100bp PE) to a robust genome, but have only ~10 million per sample to cover 217,000 exons. This was by design as I'm primarily interested in differential expression, but my default mapping stage is with Tophat and a full gtf annotation file, and this salami-slices the abundance counts down quite significantly (a large proportion of exons finish with 0 reads).
My first thought was to investigate the genes as single units, ignoring splice variation. I thought to grep out 'gene' units from a gff file, convert to a gtf and provide to tophat, but it appears that gffread from cufflinks will only convert exons and CDS (unless I'm missing a flag). Also, I'm aware that Tophat is specifically based around splice variants and whether the fudge would break it. Alternatively, I imagine using the exon-mapped reads and combining the read counts per-gene would be more appropriate. Is there a common tool for the process that anyone can point at?
Otherwise, would a different mapper than Tophat be better (one not focussed on splice variants).