How can I find the most frequent isoform in each cell line? For example, I have RNA-seq data from HeLa cells and want to keep only one isoform (transcript) per gene, namely the one that is specific to HeLa cells.
I don't know whether there is a database for that, but this is what I would do:
Use publicly available data sets:
Take HeLa cell RNA-seq data and quantify the transcripts. A simple library size normalisation would be enough.
Take RNA-seq data from a few other tissues or cell types and do the same. (There are many data sets available.)
Calculate the fold changes for the transcripts (HeLa cells vs. the other cell types) and plot their distribution.
Choose a cutoff based on that distribution. Let's say a transcript is expressed at least 3 times higher in HeLa cells than in the other tissues: these are the HeLa-specific transcripts. Then pick the most abundant of them for each gene.
You will end up with the most abundant tissue-specific transcript for each gene.
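The steps above could be sketched roughly like this in Python with pandas. This is only a minimal illustration, not a tested pipeline. It assumes you already have a transcript-by-sample count table from whatever quantifier you use. The function name, the argument names, the pseudocount and the use of the mean across samples are my own choices for the example, and the 3x cutoff is just the example value from above.

```python
import pandas as pd


def hela_specific_top_isoforms(counts, tx2gene, hela_cols, other_cols,
                               min_fold=3.0, pseudocount=1.0):
    """Pick the most abundant HeLa-enriched transcript per gene.

    counts     : DataFrame, rows = transcript IDs, columns = samples (raw counts)
    tx2gene    : Series mapping transcript ID -> gene ID
    hela_cols  : column names of the HeLa samples
    other_cols : column names of the comparison samples
    All of these names are hypothetical; adapt them to your own tables.
    """
    # Simple library size normalisation: counts per million for each sample.
    cpm = counts / counts.sum(axis=0) * 1e6

    # Average expression within each group of samples.
    hela = cpm[hela_cols].mean(axis=1)
    other = cpm[other_cols].mean(axis=1)

    # Fold change HeLa vs. others; the pseudocount avoids division by zero.
    fold = (hela + pseudocount) / (other + pseudocount)

    table = pd.DataFrame({
        "gene_id": tx2gene.reindex(cpm.index),
        "hela_cpm": hela,
        "fold_change": fold,
    })

    # Keep transcripts that pass the cutoff (choose it from the distribution).
    specific = table[table["fold_change"] >= min_fold]

    # For each gene, keep the passing transcript with the highest HeLa expression.
    best = specific.groupby("gene_id")["hela_cpm"].idxmax()
    return specific.loc[best]
```

To look at the distribution before fixing the cutoff, you could plot a histogram of the log2 fold changes first. Note that a gene where no transcript passes the cutoff simply gets no entry in the result.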
P.S. This seems like a lot of work, but it's fun to do.