Hello all, I am looking for an excel wizard to help me with a messy task: I would like to take the average of values in one column for all those cells sharing an ID in another column BUT with the added stipulation that outliers are excluded based on the use of quartiles.
Here is an example of the spreadsheet:
A B C D ABCA5 213353_at 5 5 ABCB1 209993_at 4 4 ABHD5 213805_at 5 38.33333333 ABHD5 213935_at 10 38.33333333 ABHD5 218739_at 100 38.33333333 ACSM3 205942_s_at 1 1000.5 ACSM3 210377_at 2000 1000.5
As you can see Column A has a geneID, column B a probe name (which is irrelevant to this task) and C has the 'expression value, which I have given false values here to make the outliers problem a bit more visually extreme. In column D I have filled in the average values using the following excel code:
I placed that in the first cell of column D then dragged it down to fill in all the appropriate values. What I would like to do now is amend this excel function so that it performs this task but takes advantage of the QUARTILE function to only use values in C to calculate an average if:
- The ID in column A for that value matches
- It falls within an upper and lower bound defined by 1.5 times the interquartile range.
I'm just having trouble managing all the nested if() functions and conditionals to do this and thought maybe someone with greater excel experience could offer a solution. Many thanks for any help you can provide!