I am currently doing my master's project on cardiac diseases.
We took 14 patients, did RNA-seq, and got our results as FPKM. I will be using the IPA software to create networks, pathways, etc.
My questions are:
What FPKM values are considered significant?
We have no control, so we will take patient 13 as the control because he has the least severe phenotype. How can I compare every patient's FPKM to patient 13's (to find out which genes are up/down regulated)? I took the ratio between the FPKM values for every gene; is that correct?
What cut-off should I consider?
The results are in Excel form as either expression profile g or expression profile G. What is the difference, and which one should I use?
FPKM stands for fragments per kilobase of exon per million fragments mapped, i.e., FPKM is an expression unit that reports a probabilistic estimate of isoform abundance in RNA-Seq data. To measure "significance", I'd look at the FDR or p-value rather than at the FPKM itself.
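To make the unit concrete, here is a minimal sketch of the arithmetic behind the name. The function name and the numbers are made up for illustration; tools that report FPKM estimate the fragment assignment probabilistically, so this only shows the normalization itself.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    length_kb = gene_length_bp / 1_000                     # exon length in kilobases
    millions_mapped = total_mapped_fragments / 1_000_000   # library size in millions
    return fragments / length_kb / millions_mapped


# Hypothetical gene: 500 fragments, 2,000 bp of exon, 20 million mapped fragments.
example = fpkm(500, 2_000, 20_000_000)
print(example)
```

Because the value is already scaled by gene length and library size, it says how abundant a transcript is, not whether a difference between samples is statistically significant.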
On the other hand, to classify genes as UP/DOWN regulated you need to perform a comparison. In your case, you should compare your "cases" against the control, patient 13. To group the genes as UP/DOWN, you need to calculate the fold change (FC), which is the ratio of the FPKM value of the case to that of the control. The raw ratio is always positive, so it is usually reported as a log2 value: if the log2 FC is negative, the gene is down regulated in the case compared with the control, and if it is positive the gene is up regulated. I usually pay special attention to genes with FC <= -2 or FC >= 2, that is, at least a two-fold change in either direction (log2 FC <= -1 or log2 FC >= 1), but it depends on you, your data and your goals.
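A minimal sketch of that comparison, assuming the FPKM values have been exported from Excel into a per-gene table. The gene names, patient labels, values and the pseudocount of 1 are assumptions for illustration; the pseudocount is only there so that genes with an FPKM of 0 do not cause a division by zero.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: added to both values to avoid dividing by zero


def log2_fold_change(case_fpkm: float, control_fpkm: float,
                     pseudocount: float = PSEUDOCOUNT) -> float:
    """log2 of the case/control FPKM ratio; negative means lower in the case."""
    return math.log2((case_fpkm + pseudocount) / (control_fpkm + pseudocount))


def classify(log2fc: float, threshold: float = 1.0) -> str:
    """Label a gene using a two-fold cut-off (|log2 FC| >= 1) by default."""
    if log2fc >= threshold:
        return "UP"
    if log2fc <= -threshold:
        return "DOWN"
    return "unchanged"


# Made-up FPKM values for one case (patient_01) and the control (patient_13).
fpkm_table = {
    "GENE_A": {"patient_01": 31.0, "patient_13": 7.0},
    "GENE_B": {"patient_01": 1.0, "patient_13": 7.0},
    "GENE_C": {"patient_01": 9.0, "patient_13": 7.0},
}

calls = {}
for gene, values in fpkm_table.items():
    lfc = log2_fold_change(values["patient_01"], values["patient_13"])
    calls[gene] = classify(lfc)
    print(gene, round(lfc, 2), calls[gene])
```

Note that this only ranks genes by effect size. With a single control sample there is no estimate of variability, so the labels carry no statistical significance.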
First of all, I am not sure whether this experimental approach is workable. It seems a bit flawed to me, since you will have one control against all the rest. If it were a time-course RNA-Seq experiment, it would have made more sense. There are caveats in the approach. You can compare patient 13 with the rest, but the statistical power will not be enough to give you any significantly up/down regulated genes. Tools such as limma, edgeR and DESeq2 will not yield useful results when you compare one sample against the others. Most tools that work on raw count data rely on comparing one group against another, and even if the design is not paired there should be a minimum of 2-3 replicates per group to give statistical significance.
You might average the other patients' FPKM per gene across all samples to get a single FPKM value, and then compare that with patient 13, but that introduces biases and the result will still not be very significant. In any case, the fold change here would be calculated as patient 13 vs. the rest (average FPKM), with a p-value attached to test the significance. But this entire analysis will be flawed.
FPKM is a normalized expression value that is usually used for visualization and for comparing gene expression downstream, but for differential expression you must have raw counts. As a crude approach you can convert the FPKM values back to raw read counts, round them to the nearest integer, and then try any of the DE tools mentioned above, but I am sure the result will not be that significant. So it is better to get more details about the entire experimental approach. The approach for FC and FDR mentioned by @iarun is fine, but it is really not a proper way to do the analysis. In fact, @b.nota pointed this out correctly.
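For what it is worth, the two crude workarounds described above can be sketched as follows. This assumes you can obtain each gene's exon length and each sample's total number of mapped fragments, neither of which is in the FPKM table itself; all names and numbers below are hypothetical.

```python
from statistics import mean


def mean_of_rest(fpkm_by_patient: dict, control: str = "patient_13") -> float:
    """Average FPKM of one gene over every patient except the chosen control."""
    return mean(v for patient, v in fpkm_by_patient.items() if patient != control)


def approx_count(fpkm_value: float, gene_length_bp: int,
                 total_mapped_fragments: int) -> int:
    """Crude inverse of the FPKM normalization, rounded to the nearest integer."""
    length_kb = gene_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return round(fpkm_value * length_kb * millions_mapped)


# Made-up FPKM values for a single gene across three patients.
gene_fpkm = {"patient_01": 10.0, "patient_02": 14.0, "patient_13": 6.0}
rest_average = mean_of_rest(gene_fpkm)

# Made-up gene length (2,000 bp) and library size (20 million fragments).
count = approx_count(12.5, 2_000, 20_000_000)
print(rest_average, count)
```

The back-converted counts are only approximations of what the aligner originally reported, and averaging the other patients hides the patient-to-patient variability, so neither step removes the design problem of having a single control.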
So here is what you should do:
Read RNA-Seq papers (methods, tools)
Read published RNA-Seq analyses (patient-specific, experimental condition)
Read tool comparison papers
Read about the normalization approaches used in RNA-Seq, then sit down with your supervisor and explain the caveats in the approach; after that you might be able to design an analysis protocol.