I have a large tab-separated matrix of FPKM (expression) values for known and novel genes and their transcripts. The code first needs to calculate the overall FPKM of each gene (the sum of its transcripts' FPKM values in each sample) and then divide each isoform's FPKM by that gene-level FPKM. In the example below, gene MSTRG.1 has three transcripts (AT1G01010.1, MSTRG.1.2, MSTRG.1.3), with the transcript FPKM values in the sample columns:
gene_id     trans           Sample1    Sample2
MSTRG.1     AT1G01010.1     3.217145   5.362317
MSTRG.1     MSTRG.1.2       0          0
MSTRG.1     MSTRG.1.3       0          1.265547
AT3G04280   AT3G06460.1     0          4.852563
AT3G04280   MSTRG.12548.1   0.099178   0.480905
AT3G04280   AT3G06470.1     4.548129   6.963614
So the overall gene expression in Sample1 is 3.217145 for MSTRG.1 and 4.647307 for AT3G04280. Similarly, in Sample2 it is 6.627864 for MSTRG.1 and 12.297082 for AT3G04280. Dividing each transcript's expression by the expression of its gene should give an output matrix something like this:
gene_id     trans           Sample1   Sample2
MSTRG.1     AT1G01010.1     1         0.809056582935317
MSTRG.1     MSTRG.1.2       0         0
MSTRG.1     MSTRG.1.3       0         0.190943417064683
AT3G04280   AT3G06460.1     0         0.3946
AT3G04280   MSTRG.12548.1   0.02134   0.039
AT3G04280   AT3G06470.1     0.9786    0.566
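To make the calculation concrete, here is a minimal sketch of one way it could be done, using only the Python standard library. It assumes the first two columns are `gene_id` and `trans` and that every remaining column is a sample. The function name `isoform_fractions` and its parameters are hypothetical, and returning 0 when a gene has no expression in a sample is my own assumption, since the example above does not cover that case.

```python
import csv
from collections import defaultdict


def isoform_fractions(lines, id_cols=2, delimiter="\t"):
    """Divide each transcript FPKM by the summed FPKM of its gene, per sample.

    `lines` is any iterable of text lines (an open file or a list of strings).
    Returns a list of rows, with the header as the first row.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    n_samples = len(header) - id_cols

    # Pass 1: sum the transcript FPKM values per gene for every sample column.
    gene_totals = defaultdict(lambda: [0.0] * n_samples)
    for row in rows:
        totals = gene_totals[row[0]]
        for i, value in enumerate(row[id_cols:]):
            totals[i] += float(value)

    # Pass 2: divide each transcript value by the total of its gene.
    result = [header]
    for row in rows:
        totals = gene_totals[row[0]]
        fractions = [
            # Assumption: report 0.0 when the gene total is 0 in a sample.
            float(value) / total if total > 0 else 0.0
            for value, total in zip(row[id_cols:], totals)
        ]
        result.append(row[:id_cols] + fractions)
    return result


# The example matrix from above, tab-separated.
example = (
    "gene_id\ttrans\tSample1\tSample2\n"
    "MSTRG.1\tAT1G01010.1\t3.217145\t5.362317\n"
    "MSTRG.1\tMSTRG.1.2\t0\t0\n"
    "MSTRG.1\tMSTRG.1.3\t0\t1.265547\n"
    "AT3G04280\tAT3G06460.1\t0\t4.852563\n"
    "AT3G04280\tMSTRG.12548.1\t0.099178\t0.480905\n"
    "AT3G04280\tAT3G06470.1\t4.548129\t6.963614\n"
)

table = isoform_fractions(example.splitlines())
for row in table:
    print("\t".join(str(cell) for cell in row))
```

For a real file, the same function could be called as `isoform_fractions(open("matrix.tsv", newline=""))`, where `matrix.tsv` is a placeholder name. The sketch keeps all rows in memory, which may or may not be acceptable depending on how large the matrix is.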
Any help will be highly appreciated.