I have a file consisting of three columns (patient ID, subtype and rpkm, all for one gene), sorted by the second column:
sample subtype rpkm
patient1 LumA 0.1253201
patient2 LumB 3.00531
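A natural first step is to read the file into one list of rpkm values per subtype. The following is a minimal Python sketch. It assumes whitespace- or tab-separated columns and a single header line, laid out as in the sample above. The function names `group_rpkm_by_subtype` and `read_rpkm_file` are hypothetical.

```python
from collections import defaultdict


def group_rpkm_by_subtype(lines):
    """Return {subtype: [rpkm, ...]} from lines of 'sample subtype rpkm'.

    Assumes (hypothetically) whitespace-separated columns and exactly one
    header line at the top.
    """
    groups = defaultdict(list)
    rows = iter(lines)
    next(rows, None)  # skip the header line "sample subtype rpkm"
    for line in rows:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # tolerate blank lines
        _sample, subtype, rpkm = fields
        groups[subtype].append(float(rpkm))
    return dict(groups)


def read_rpkm_file(path):
    """Open the three-column file and group its rpkm values by subtype."""
    with open(path) as fh:
        return group_rpkm_by_subtype(fh)
```

Because the file is already sorted by subtype, you could also stream it with `itertools.groupby`. The dictionary approach, however, does not depend on the sort order.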
I want to draw a box plot for this gene to check how its expression varies between subtypes. Could someone please guide me through it from the beginning, i.e. calculating the average and standard deviation of rpkm across the samples within each subtype?
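A standard box plot does not actually draw the mean or standard deviation. Instead, it shows the median and the interquartile range (Q1 to Q3). By default in matplotlib, the whiskers reach the most extreme points within 1.5 × IQR of the box, and anything further out is drawn as an outlier.

The sketch below computes both kinds of summary per subtype and draws the plot. It assumes the data are already grouped into a dict that maps each subtype to its list of rpkm values. The names `summarize` and `plot_rpkm_boxplot` are hypothetical, as is the output file name. matplotlib is imported inside the plotting function, so the summary part runs with only the standard library.

```python
import statistics


def summarize(groups):
    """Per subtype: n, mean, sample SD, and the quartiles a box plot draws."""
    summary = {}
    for subtype, values in groups.items():
        n = len(values)
        if n >= 2:
            sd = statistics.stdev(values)  # sample SD (divides by n - 1)
            # "inclusive" interpolation matches numpy's default percentiles,
            # which matplotlib's boxplot uses for the box edges and median.
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            sd = None  # SD is undefined for a single sample
            q1 = median = q3 = values[0]
        summary[subtype] = {
            "n": n,
            "mean": statistics.fmean(values),
            "sd": sd,
            "q1": q1,
            "median": median,
            "q3": q3,
        }
    return summary


def plot_rpkm_boxplot(groups, out_png="rpkm_boxplot.png"):
    """Draw one box per subtype and save the figure to out_png."""
    import matplotlib.pyplot as plt  # imported here so summarize() needs no deps

    names = sorted(groups)
    fig, ax = plt.subplots()
    ax.boxplot([groups[name] for name in names])  # boxes at x = 1..len(names)
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_xlabel("subtype")
    ax.set_ylabel("rpkm")
    fig.savefig(out_png)
    plt.close(fig)
```

Typical use is `summarize(groups)` to print a per-subtype table, then `plot_rpkm_boxplot(groups)` to write the figure. Here `groups` is the subtype-to-values mapping built from your file. Subtypes with very few patients will give unstable SDs and quartiles, so it is worth checking `n` in the summary before comparing boxes.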