I am currently analysing RNA-seq data, and I noticed that the GTF file (Ensembl v.96) I use for mapping contains ~19000 clone-based (Ensembl) genes.
Some of them share exons, in terms of genomic location, with their 'parent' protein-coding genes.
I am considering removing these clone-based genes, as they will affect the genome alignment statistics (especially the multi-mappers) and the transcript quantification I plan to do later on.
The total number of genes in the Ensembl hg38.p12 GTF file is ~58000. So if I remove these clone based ones, I am left with ~37000 genes.
Would it be a good call to remove these clone-based genes from the GTF file? Or would it lower the power of the analyses?
Two examples of such overlaps:
- the exons of non-coding Z83844.1 (a clone-based (Ensembl) gene) overlap with NOL12
- the exons of coding AC008403.1 overlap with the CYTH2 gene
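
In case it helps, here is a minimal sketch of how these genes could be flagged and counted in the GTF. It rests on an assumption of mine: that clone-based genes can be recognised by an accession-style `gene_name` (such as Z83844.1 or AC008403.1). This is a naming heuristic, not an official Ensembl flag, so it may miss or mislabel some genes. The function names are hypothetical.

```python
import re

# ASSUMPTION: clone-based gene names look like a clone accession plus a
# version suffix (e.g. "AC008403.1", "Z83844.1"). This is a heuristic,
# not an attribute that Ensembl writes into the GTF.
CLONE_NAME = re.compile(r"^[A-Z]{1,2}\d{5,6}\.\d+$")
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def is_clone_based(gene_name):
    """Return True if the gene name matches the accession-style pattern."""
    return bool(CLONE_NAME.match(gene_name))


def count_genes(gtf_lines):
    """Return (clone_based, other) counts over the 'gene' feature lines."""
    clone = other = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count one line per gene
        match = GENE_NAME.search(fields[8])
        if match and is_clone_based(match.group(1)):
            clone += 1
        else:
            other += 1
    return clone, other
```

Passing an open file handle of the GTF to `count_genes` should give counts comparable to the ones I quoted above, provided the naming assumption holds.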
I would appreciate your suggestions. Thank you in advance.