Is there a way to choose only exon annotations from an annotated peak .txt file to generate a metagene analysis?
A lab member recently ran ChIP-Seq on Pol-II and gave me the sequencing files to annotate. I generated an annotated peak file in the following format:

PeakID                        Chr   Start     End       Strand  Peak Score  Focus Ratio/Region Size  Annotation             Detailed Annotation       Distance to TSS  Nearest PromoterID  Entrez ID  Nearest Unigene  Nearest Refseq  Nearest Ensembl  Gene Name  Gene Alias                                                                      Gene Description                                    Gene Type
Pol-II-Chip.MACS2_peak_77166  chr6  74229770  74231637  +       7081        NA                       promoter-TSS (EEF1A1)  promoter-TSS (NM_001402)  52               EEF1A1              1915       Hs.745122        XM_005248666    ENSG00000156508  EEF1A1     CCS-3|CCS3|EE1A1|EEF-1|EEF1A|EF-Tu|EF1A|GRAF-1EF|HNGC:16303|LENG7|PTI1|eEF1A-1  eukaryotic translation elongation factor 1 alpha 1  protein-coding

However, he wants to generate a metagene graph of only the exon regions. I tried sorting the file and removing every row that was not an exon, but afterwards my Python script could no longer detect peak-file-formatted lines. Is there a way to select only the exon annotations from a .txt file like this? I've realized that any sorting done in Excel throws my script off, so that it can no longer read the peak-file-formatted lines.
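One way to avoid the Excel round trip is to filter the file in Python and copy the matching lines through unchanged. The following is only a minimal sketch. It assumes the file is tab-delimited, that its header row contains a column named exactly `Annotation` (as in the example above), and that exon peaks are labelled with a value beginning with `exon`. The function name `filter_exon_peaks` and its parameters are made up for illustration.

```python
def filter_exon_peaks(in_path, out_path, column="Annotation", prefix="exon"):
    """Copy the header plus every row whose `column` value starts with `prefix`.

    Lines are written back exactly as they were read, so the tab-delimited
    layout is not altered. Returns the number of rows kept.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline()
        # Locate the annotation column by name rather than by position.
        col = header.rstrip("\n").split("\t").index(column)
        dst.write(header)
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Skip short or blank lines instead of failing on them.
            if len(fields) > col and fields[col].lower().startswith(prefix):
                dst.write(line)
                kept += 1
    return kept
```

Calling `filter_exon_peaks("annotated_peaks.txt", "exon_peaks.txt")` (both file names are placeholders) would write a file with the same columns but only the exon rows. Matching on the start of the value keeps rows like `exon (...)` while excluding `intron (...)` and `promoter-TSS (...)`.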

If this isn't possible, how could I get only exon annotations from a .narrowPeak file?
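For the .narrowPeak route, the file itself holds only coordinates and scores, so exon information has to come from a separate annotation. The sketch below assumes a hypothetical tab-delimited BED file of exon coordinates (chrom, start, end in the first three columns) on the same genome build as the peaks. It keeps every peak that overlaps at least one exon. The function names are illustrative, and the linear scan is meant to show the logic rather than to be fast; a dedicated tool such as `bedtools intersect` is likely a better fit for genome-scale data.

```python
from collections import defaultdict


def load_exons(exon_bed_path):
    """Read exon intervals from a BED file into {chrom: [(start, end), ...]}."""
    exons = defaultdict(list)
    with open(exon_bed_path) as fh:
        for line in fh:
            # Skip blank lines and common BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            exons[chrom].append((int(start), int(end)))
    return exons


def peaks_overlapping_exons(narrowpeak_path, exon_bed_path, out_path):
    """Write the narrowPeak lines that overlap any exon; return how many."""
    exons = load_exons(exon_bed_path)
    kept = 0
    with open(narrowpeak_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            # Half-open intervals overlap when each starts before the other ends.
            if any(start < e_end and e_start < end
                   for e_start, e_end in exons.get(chrom, ())):
                dst.write(line)  # copy the original line unchanged
                kept += 1
    return kept
```

The output keeps the narrowPeak layout, since matching lines are copied verbatim.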

Could you clarify a little, for example by stating the hypothesis you want to test, such as "Pol-II will be enriched at the start of each exon" vs "Pol-II will be enriched in exons vs introns"? I think I could help a bit if you made your goals clearer.


