Should genes with missing values in one condition but highly significant values in the other stay or be removed? [NGS]

8.4 years ago

Sukhi Singh 11k

I am sure this has been discussed here and there in some posts and comments, but I am writing it up as a new question.

What is the consensus when it comes to data points that are missing (absent gene values) in one condition but highly significant in the other?

Consider the following MA plot. I have set up several thresholds and, in accordance with those, I label the points with different colors. Now, if you look at the plot, the two diagonal lines (black-green and black-orange) which protrude in opposite directions (going up and down at 45 degrees) are the points which are quite significant in one condition but are missing values in the other.
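
To illustrate why such points line up on diagonals, here is a minimal sketch. It assumes the MA values are computed from counts with a pseudocount added before taking log2; that is an assumption for illustration, not necessarily how the plot above was produced. The function `ma_point`, the constant `PSEUDOCOUNT`, and the example counts are all hypothetical.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: added to every count before taking logs


def ma_point(count_a, count_b, pseudocount=PSEUDOCOUNT):
    """Return (A, M) for one gene from its counts in two conditions."""
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    m = log_a - log_b          # M: log2 fold change
    a = 0.5 * (log_a + log_b)  # A: mean log2 signal
    return a, m


# Hypothetical genes with signal in only one of the two conditions
only_in_a = [ma_point(c, 0) for c in (10, 50, 200, 1000)]
only_in_b = [ma_point(0, c) for c in (10, 50, 200, 1000)]

# When one count is zero, M is a linear function of A, so the points
# fall on two straight lines: M = 2A - offset and M = -2A + offset.
offset = 2 * math.log2(PSEUDOCOUNT)
for a, m in only_in_a:
    assert math.isclose(m, 2 * a - offset)
for a, m in only_in_b:
    assert math.isclose(m, -2 * a + offset)
```

Under this assumption the two lines are a direct consequence of one value being fixed at the pseudocount, so their position depends on the pseudocount chosen rather than on the biology.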

These values come from ChIP-Seq data, and we also know that missing data does not necessarily mean that the information from that gene is completely absent biologically; it could also arise from experimental or computational issues.

So, simply put: should these points (genes) stay, with an explanation, or must they go?

[Image: MA plot]

ChIP-Seq RNA-Seq R next-gen statistics • 1.4k views

Until proven otherwise (independently and experimentally), they should stay, since this is what you have in the data.

ADD REPLY • link 8.4 years ago by GenoMax 144k

Right, but I have still seen some people remove these and mention it in the text.

ADD REPLY • link 8.4 years ago by Sukhi Singh 11k

As long as you document what was taken out (not sure there can be a valid "why"), at least you will make the analysis reproducible for others.
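
One way such documentation could look, as a rough sketch only: flag the one-sided genes instead of silently dropping them, and write the full flagged table out alongside the results. The field names (`gene`, `signal_a`, `signal_b`, `one_sided`), the helper functions, and the example values are all hypothetical, and treating both `None` and `0` as "missing" is an assumption.

```python
import csv
import io


def flag_one_sided(genes):
    """Add a 'one_sided' field: True when exactly one condition has no signal."""
    flagged = []
    for gene in genes:
        # Assumption: None or 0 both count as "missing"
        missing_a = gene["signal_a"] is None or gene["signal_a"] == 0
        missing_b = gene["signal_b"] is None or gene["signal_b"] == 0
        flagged.append({**gene, "one_sided": missing_a != missing_b})
    return flagged


def write_report(flagged, handle):
    """Write every gene with its flag, so any exclusion can be reproduced."""
    fields = ["gene", "signal_a", "signal_b", "one_sided"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(flagged)


# Hypothetical example values
genes = [
    {"gene": "geneA", "signal_a": 120.0, "signal_b": 95.0},
    {"gene": "geneB", "signal_a": 300.0, "signal_b": None},
    {"gene": "geneC", "signal_a": 0, "signal_b": 410.0},
]

flagged = flag_one_sided(genes)
kept = [g for g in flagged if not g["one_sided"]]
excluded = [g for g in flagged if g["one_sided"]]

buffer = io.StringIO()  # stands in for a real output file
write_report(flagged, buffer)
print(buffer.getvalue())
```

Keeping the flag as a column, rather than deleting rows, leaves the choice between "stay" and "go" visible to anyone who reruns the analysis.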

ADD REPLY • link 8.4 years ago by GenoMax 144k
