What Situations Fit The Different Overlap Resolution Modes Of Htseq-Count/Summarizeoverlaps?
12.0 years ago

Assigning reads to features is an important part of RNA-Seq analysis. Because Simon Anders's work in this area has now been implemented in both Python (htseq-count) and R (summarizeOverlaps), it might be time to get a better understanding of this chart.

Can someone describe situations in which someone might choose a certain overlap resolution mode (or even roll their own) and why?

[Image: chart of the htseq-count overlap resolution modes]

rna-seq • 3.1k views
12.0 years ago
Ryan Dale 5.0k

I suppose it depends on how correct you assume your gene models to be. I tend to assume the gene models I use are not completely correct, so I use union mode for these reasons:

  • Accepting some "slop" around the gene (row 2 in the table you posted) allows for things like mis-annotated TSSs
  • Accepting cases like row 3 allows detection of unannotated isoforms or unspliced transcripts
  • The same assumption that the gene models are not totally accurate means that in the second-to-last row, it's possible that gene_B extends further into the read, which would make it a truly ambiguous read

That said, if you are interested in detecting isoform-specific expression of annotated isoforms, then intersection_strict would probably be needed over union, and maybe intersection_nonempty if you don't care about the last point in the list above.
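To make the differences concrete, here is a minimal sketch of the three modes in plain Python. It does not use HTSeq itself: the function name `resolve` and the toy reads are made up for illustration, and each read is represented as a list of sets, one set per aligned base, holding the features that overlap that base. The set logic follows how the modes are usually described, so treat it as an approximation rather than the tool's actual code.

```python
def resolve(position_sets, mode="union"):
    """Assign a read to a feature (hypothetical helper, not part of HTSeq).

    position_sets: list of sets, one per aligned base of the read,
    each holding the names of the features overlapping that base.
    """
    if mode == "union":
        # Every feature touched by any base of the read.
        features = set().union(*position_sets)
    elif mode == "intersection_strict":
        # Only features covering every base of the read.
        features = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection_nonempty":
        # Like strict, but bases that hit no feature are ignored.
        nonempty = [s for s in position_sets if s]
        features = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not features:
        return "no_feature"
    if len(features) > 1:
        return "ambiguous"
    return next(iter(features))


# A read hanging off the edge of gene_A ("slop" around the gene).
edge_read = [{"gene_A"}] * 3 + [set()] * 2
# A read inside gene_A whose last bases also overlap gene_B.
shared_read = [{"gene_A"}] * 3 + [{"gene_A", "gene_B"}] * 2

for mode in ("union", "intersection_strict", "intersection_nonempty"):
    print(mode, resolve(edge_read, mode), resolve(shared_read, mode))
```

In this toy setup, union keeps the edge read but calls the shared read ambiguous, intersection_strict does the opposite, and intersection_nonempty assigns both to gene_A.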

If you suspect there may be substantial DNA contamination in your RNA-seq data, it's possible that cases like row 3 will erroneously assign DNA reads to the gene. The easy fix would be to switch to intersection_strict mode. If you wanted to keep the rest of the union mode logic though, I think you'd have to roll your own mode that keeps track of which bases in the read overlap a gene.
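One way such a home-made mode could look is sketched below, again in plain Python rather than against the HTSeq API. It keeps the union logic but counts how many bases of the read overlap the gene and rejects the read if too few do. The function name `resolve_union_min_overlap` and the `min_fraction` threshold are assumptions for illustration; a sensible cutoff would have to be chosen for your own data.

```python
from collections import Counter


def resolve_union_min_overlap(position_sets, min_fraction=0.5):
    """Union-style assignment with a per-base overlap check (hypothetical).

    position_sets: list of sets, one per aligned base of the read,
    each holding the names of the features overlapping that base.
    min_fraction: assumed cutoff for the share of read bases that
    must overlap the gene before the read is counted.
    """
    if not position_sets:
        return "no_feature"

    # Count how many bases of the read overlap each feature.
    bases_per_feature = Counter(f for s in position_sets for f in s)

    if not bases_per_feature:
        return "no_feature"
    if len(bases_per_feature) > 1:
        return "ambiguous"  # same outcome as plain union mode

    (gene, n_bases), = bases_per_feature.items()
    if n_bases / len(position_sets) < min_fraction:
        # Mostly outside the gene model: treat as likely background.
        return "no_feature"
    return gene


# Read with only 2 of 10 bases inside gene_A: rejected at the default cutoff.
mostly_outside = [{"gene_A"}] * 2 + [set()] * 8
# Read with 8 of 10 bases inside gene_A: still counted, as in union mode.
mostly_inside = [{"gene_A"}] * 8 + [set()] * 2

print(resolve_union_min_overlap(mostly_outside))
print(resolve_union_min_overlap(mostly_inside))
```

Setting `min_fraction` to 0 gives back plain union behaviour, and setting it to 1 behaves like intersection_strict for reads that touch a single gene.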
