Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lncRNAs remains a challenge. Recent advances in RNA sequencing (RNA-Seq) and computational methods allow for an unprecedented analysis of such transcripts. Our catalogue unifies previously existing annotation sources with transcripts we assembled from RNA-Seq data across 24 human tissues and cell types.
I want to check whether lncRNA expression is strikingly tissue-specific compared to coding genes, and I'm using Jensen-Shannon (JS) divergence to evaluate tissue specificity. I recently read the paper "Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses". However, I don't know how to calculate the tissue specificity in Python. The code I have so far is as follows:
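import numpy as np
from numpy.linalg import norm
from scipy.stats import entropy


def JSD(P, Q):
    # Normalize both inputs so that each sums to 1 (L1 norm).
    _P = P / norm(P, ord=1)
    _Q = Q / norm(Q, ord=1)
    # Mixture distribution M = (P + Q) / 2.
    _M = 0.5 * (_P + _Q)
    # JSD is the mean of the two KL divergences to M (natural log by default).
    return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))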
Please reformat the question to make it readable. It looks like the non-code is formatted as code and vice versa.
How to best go about it will depend a bit on the exact input you have. Here's what we're doing for JSD calculation in deepTools. Our requirements are a bit odd, since we have a very spiky distribution, which is why all of the interpolation stuff is done.
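For the tissue-specificity case in the question, here is a minimal sketch (this is not the deepTools code). As I read the paper cited in the question, the score for a gene is 1 minus the JS distance (the square root of the JS divergence) between its normalized expression vector and a "perfectly specific" profile that is 1 in one tissue and 0 elsewhere, maximized over tissues. The function name `tissue_specificity`, the use of log base 2 (which keeps the score in [0, 1]), and returning `(0.0, -1)` for unexpressed genes are my own assumptions, so check them against the paper's methods before relying on the numbers.

```python
import numpy as np
from scipy.stats import entropy


def jsd(p, q):
    """Jensen-Shannon divergence in bits (log base 2 is an assumption)."""
    m = 0.5 * (p + q)
    return 0.5 * (entropy(p, m, base=2) + entropy(q, m, base=2))


def tissue_specificity(expr):
    """Return (score, tissue_index) for one gene's expression vector.

    expr holds non-negative expression values (e.g. FPKM), one per tissue.
    """
    expr = np.asarray(expr, dtype=float)
    total = expr.sum()
    if total <= 0:
        # Hypothetical convention: unexpressed genes get score 0, no tissue.
        return 0.0, -1
    p = expr / total  # expression as a probability distribution over tissues
    n = p.size
    scores = np.empty(n)
    for t in range(n):
        unit = np.zeros(n)
        unit[t] = 1.0  # idealized profile: expressed only in tissue t
        # Clip at 0 to guard against tiny negative rounding errors.
        scores[t] = 1.0 - np.sqrt(max(jsd(p, unit), 0.0))
    best = int(np.argmax(scores))
    return float(scores[best]), best


# Example: one gene expressed in a single tissue, one expressed evenly.
specific_score, specific_tissue = tissue_specificity([0, 0, 10, 0])
uniform_score, _ = tissue_specificity([5, 5, 5, 5])
```

You would run this once per gene (for example, over the rows of a genes x tissues matrix) and then compare the score distributions of lncRNAs and coding genes.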