24 months ago by
Republic of Ireland
For each variable, it is important to understand what NA actually means. Was the value below the detection limit? Did the patient never show up for the test? Did the test fail QC?
Some strategies that have been used for different types of NAs in continuous data:
- impute them as 0
- replace them with half the lowest value
- replace them with the median (if doing univariate testing)
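As a rough illustration of the three strategies above, here is a minimal base R sketch. The vector `x` and its values are made up for the example; which strategy is appropriate still depends on what NA means for that variable.

```r
# Toy continuous variable with missing values (made-up numbers)
x <- c(2.4, NA, 0.8, 5.1, NA, 1.6)

# Strategy 1: impute NA as 0
x_zero <- replace(x, is.na(x), 0)

# Strategy 2: replace NA with half the lowest observed value
x_halfmin <- replace(x, is.na(x), min(x, na.rm = TRUE) / 2)

# Strategy 3: replace NA with the median of the observed values
x_median <- replace(x, is.na(x), median(x, na.rm = TRUE))
```

Half the minimum is often used when NA plausibly means "below the detection limit", whereas the median makes more sense when the value is missing for reasons unrelated to its size.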
You can also model the data and impute the values as model predictions.
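A minimal sketch of model-based imputation, assuming a hypothetical data frame `df` in which a continuous `marker` is predicted from a fully observed `age`. The column names, the simulated values, and the choice of a simple linear model are all assumptions for illustration, not a recommendation for your data.

```r
set.seed(1)

# Hypothetical data: 'age' is complete, 'marker' has two missing values
df <- data.frame(age = c(34, 51, 46, 29, 62, 58, 40, 37))
df$marker <- 0.5 + 0.1 * df$age + rnorm(nrow(df), sd = 0.3)
df$marker[c(3, 6)] <- NA

# Fit on the observed rows; lm() drops incomplete rows under the
# default na.action (na.omit)
fit <- lm(marker ~ age, data = df)

# Fill in only the missing entries with the model predictions
missing <- is.na(df$marker)
df$marker_imputed <- df$marker
df$marker_imputed[missing] <- predict(fit, newdata = df[missing, , drop = FALSE])
```

Keep in mind that filling in a single predicted value makes the imputed samples look more certain than they are; multiple-imputation packages such as mice exist for that reason.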
In reality, you may not have much choice but to eliminate the samples with NAs. Looking at your data, for example, what can you realistically do with those samples that are all NA across your variables? That is a situation where, perhaps, you should bite the bullet and accept that the data is too poor to use.
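To see how many samples are affected before deciding, something along these lines may help. The `traits` data frame and its sample names are hypothetical.

```r
# Hypothetical trait table: rows are samples, columns are variables
traits <- data.frame(
  weight  = c(70.2, NA, 65.0, NA),
  glucose = c(5.1, NA, NA, 4.8),
  row.names = c("S1", "S2", "S3", "S4")
)

# Number of missing values per sample
n_missing <- rowSums(is.na(traits))

# Drop only the samples that are NA for every variable
all_na <- n_missing == ncol(traits)
traits_kept <- traits[!all_na, , drop = FALSE]

# Stricter alternative: keep only samples with no NA at all
traits_complete <- traits[complete.cases(traits), , drop = FALSE]
```

Here `traits_kept` removes only S2, while `traits_complete` keeps only S1, so the two choices can differ a lot in how many samples survive.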
If you are using these in regression against your WGCNA modules, for example, then an error may be thrown. If you just correlate them, then the correlation test will usually delete the samples with NA automatically. This is controlled via the use argument passed to cor(), which the documentation describes as:
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
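A small sketch of how the use argument changes the result of base R's cor(); the vectors are made up for the example.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(2, NA, 6, 8, 10)

# Default is use = "everything": any NA propagates to the result
r_default <- cor(a, b)

# "complete.obs" drops the pairs containing an NA before computing
r_complete <- cor(a, b, use = "complete.obs")

# With a matrix, "pairwise.complete.obs" uses, for each pair of columns,
# all samples that are complete for that particular pair
m <- cbind(a, b, c = c(5, 3, NA, 2, 1))
r_pairwise <- cor(m, use = "pairwise.complete.obs")
```

With "pairwise.complete.obs", different entries of the correlation matrix can be based on different subsets of samples, which is worth remembering when comparing them.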