I am very new to RNA-seq data analysis, so any help with the task below would be highly appreciated! I will explain my problem as simply as I can.
1) Based on featureCounts output, I have computed the TPM score for my set of genes of interest. Here, I defined the feature type as "exon".
2) Next, based on the TPM scores, I classified my genes into TPM bins (0, 0.1, 1, 10, 100, 1000, >1000). This gave me the empirical number of genes in each bin.
3) Independently of this, I have Bowtie output in BAM format for a ChIP-seq dataset. I used featureCounts once again to assign the reads in this BAM file to the exons and introns of each gene (I ran featureCounts twice, with the feature type defined as "exon" in one run and "intron" in the other).
4) Now, I want to normalize these raw per-gene exon and intron counts from the ChIP-seq dataset in a way that lets me see which gene bins (from step 2) the ChIP signal falls into. The idea is to see how my ChIP enrichment regions overlap with genes of various TPM scores.
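To make steps 1 and 2 concrete, here is a minimal Python sketch of a TPM calculation followed by binning. It is only an illustration under stated assumptions, not necessarily the exact procedure used: the gene names, counts, and lengths are hypothetical, the helper names `tpm` and `tpm_bin` are made up for this example, and the bin values are assumed to be inclusive upper edges.

```python
from bisect import bisect_left
from collections import Counter

# Assumption: the bin values from step 2 are treated as inclusive upper edges;
# anything above 1000 goes into the ">1000" bin.
BIN_EDGES = [0, 0.1, 1, 10, 100, 1000]
BIN_LABELS = ["0", "<=0.1", "<=1", "<=10", "<=100", "<=1000", ">1000"]


def tpm(counts, lengths_bp):
    """Length-normalize counts (reads per kb), then scale to sum to one million."""
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: r / total * 1e6 for g, r in rates.items()}


def tpm_bin(value):
    """Return the label of the first bin whose upper edge is >= value."""
    return BIN_LABELS[bisect_left(BIN_EDGES, value)]


# Hypothetical per-gene exon counts and total exon lengths (bp).
example_counts = {"geneA": 10, "geneB": 20, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}

example_tpm = tpm(example_counts, example_lengths)
genes_per_bin = Counter(tpm_bin(v) for v in example_tpm.values())
```

Here `genes_per_bin` holds the number of genes in each TPM bin, which is the quantity described in step 2.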
Can I calculate a "TPM score" for the ChIP-seq dataset and use it as a proxy for comparison with the actual TPM scores of the genes? This is probably not the right way.
Thanks a lot in advance for your advice!