Question: STAR-HT-Seq/featureCount got much more gene expression count than RSEM did
0
gravatar for CY
4 weeks ago by
CY130
United States
CY130 wrote:

I noticed that the gene expression count from HT-Seq / featureCount followed by STAR is much more higher than what RSEM got. It seems RSEM assign more much alignments as non-primary.

I did some digging and got vague impression that this is because RSEM consider more than just counting the reads aligned to gene region like HT-Seq / featureCount does.

I never got a clear explanation what causes this difference. Can anyone share some comments. Really appreciate.

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by CY130
1

You should be observing that RSEM get somewhat higher counts than the other methods, since it doesn't ignore multimappers, whereas the other methods do. If you're seeing anything other than that then you did something wrong.

ADD REPLYlink written 4 weeks ago by Devon Ryan81k

I went back and checked the aligners' output. For some reasons, RSEM generates huge amount of non-primary alignment (almost 50%) while STAR only got a little non-primary alignment (<10%). This is the reason why ht-seq followed by RSEM got really low expression count.

I know that RSEM uses STAR internally as well. I can only imagine that the difference is because RSEM used different parameters during STAR alignment. Although I could not figure out which parameters contributed to the difference......

I used basically default parameter:

STAR --limitSjdbInsertNsj 1800000 --genomeDir ${STARGenomeFolder} --readFilesCommand zcat --readFilesIn ${File1} ${File2} --sjdbGTFfile ${GTF} --outFileNamePrefix ${name}_ --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotifi --outReadsUnmapped Fastx

While RSEM uses:

STAR --genomeDir $STARgenomeDir --readFilesIn $read1 $read2 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --readFilesCommand zcat

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by CY130
1

I have no idea what you mean by "ht-seq followed by RSEM", RSEM should follow STAR and never be used in conjunction with htseq-count. RSEM should produce a lot of multimappers, it's using STAR (or another aligner) to align to the transcriptome.

ADD REPLYlink written 4 weeks ago by Devon Ryan81k
0
gravatar for CY
4 weeks ago by
CY130
United States
CY130 wrote:

I used htseq-count after RSEM just to compare STAR-htseq vs RSEM-htseq. Now I see non-trivial difference in htseq results. I think it is partially because RSEM generates large amount of multimappers which will not be counted by htseq.

Also, I think you are right about never using RSEM-htseq together.

ADD COMMENTlink written 4 weeks ago by CY130
1

htseq-count should never be used after RSEM, you should use the output of RSEM directly.

ADD REPLYlink written 4 weeks ago by Devon Ryan81k

I'm agreeing with Devon Ryan here but he could be using the bam output from RSEM and (re-)use that as a bam input file for htseq-count, no?

ADD REPLYlink written 4 weeks ago by lieven.sterck1.7k

That was exactly what I did. I know RSEM + HTSeq are not used together usually. I just did this once trying to make a comparison of the expression count (RSEM vs STAR)

Then I found out STAR + HTSeq got much high expression count than RSEM + HTSeq. Digging deeper I found RSEM generate much more non-primary alignments (are they multimappers mentioned by @Devon Ryan?) which are not counted by HTSeq.

Besides, I guess I should not expect so many non-primary alignments because I was looking at gene.bam, not isoform.bam

ADD REPLYlink written 4 weeks ago by CY130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1565 users visited in the last hour