Does anyone know where I can find a list of Ensembl transcript IDs (e.g., ENST0000011111) associated with protein-coding genes?
I've run my mRNA-Seq data through kallisto and am plotting PCA on the pseudocounts, which are keyed by Ensembl transcript ID. A labmate wonders whether my PCA would cluster better if I kept only the rows of counts whose transcript IDs correspond to mRNA. The reasoning is that the sequencing was only done on mRNA, not ncRNA, so presumably all of the other rows would be 0.
I'm not sure to what extent removing rows of 0s would change my PCA, but I thought I'd give it a try.
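To make the question concrete, here is a rough sketch of the filtering I have in mind. It assumes the IDs would come from an Ensembl GTF annotation, where transcript lines carry a `transcript_biotype` attribute (GENCODE files call it `transcript_type` instead), and that the kallisto target IDs may carry a version suffix such as `.2`. The function names and the (target_id, est_counts) row layout are placeholders I made up, not anything from kallisto itself.

```python
import re

# Matches key "value" pairs in the attribute column (column 9) of a GTF line.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_transcripts(gtf_lines):
    """Collect unversioned transcript IDs whose biotype is protein_coding.

    Assumes Ensembl-style GTF lines; for GENCODE, the attribute key
    would be transcript_type rather than transcript_biotype.
    """
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records are needed
        attrs = dict(_ATTR_RE.findall(fields[8]))
        transcript_id = attrs.get("transcript_id")
        if transcript_id and attrs.get("transcript_biotype") == "protein_coding":
            ids.add(transcript_id)
    return ids


def filter_counts(rows, keep_ids):
    """Keep (target_id, est_counts) rows whose ID, minus any version, is in keep_ids."""
    return [(tid, count) for tid, count in rows if tid.split(".")[0] in keep_ids]
```

The idea would be to call `protein_coding_transcripts` on the open GTF file, then pass the rows read from `abundance.tsv` through `filter_counts` before running PCA.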
Thanks for your help!