Hi, I am analyzing RNA-seq data for alternative splicing and, for this reason, used the RefSeq human genome (hg38.p14) as reference. Using DEXSeq, I obtained a list of alternatively spliced exons that are significant for my conditions.
I want to select only the exons that are located in the CDS and not in the UTR regions, to obtain a list of events that could be related to a change in protein conformation. To do so, I tried methods that are typically used for ChIP-seq analysis.
I used annotatePeaks.pl from HOMER; however, both with the default annotation and with a custom GTF file, the resulting table contains only NAs in the columns where the annotations should be.
I ran annotatePeaks.pl as:
annotatePeaks.pl exon_intervals.txt hg38 -gtf hg38.ncbiRefSeq.gtf > annotated_exon_intervals.txt
And this is the head of the resulting table:
PeakID (cmd=annotatePeaks.pl ./data/caco2_rnaseq/DEXSeq_refseq/exon_intervals.bed hg38 -gtf ./rawdata/genomes/human/refseq/hg38.ncbiRefSeq.gtf) Chr Start End Strand Peak Score Focus Ratio/Region Size Annotation Detailed Annotation Distance to TSS Nearest PromoterID Entrez ID Nearest Unigene Nearest Refseq Nearest Ensembl Gene Name Gene Alias Gene Description Gene Type
F10:004 13 113139357 113139470 + + NA NA NA NA NA
LOC105373422:003 02 9950705 9950853 + - NA NA NA NA NA
CEP72:021 05 656928 657088 + + NA NA NA NA NA
HTATSF1:009 23 136509091 136509180 + + NA NA NA NA NA
ZNF106:018 15 42446589 42446658 + - NA NA NA NA NA
...
I tried both the BED format and the HOMER peak format as input, but the result does not change. I know I am not using the type of data this tool is designed for, which could affect the analysis. I am open to solutions, but also to other ways of annotating exons as CDS/UTR.
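To make the goal concrete, here is a minimal Python sketch of the kind of filtering I am after: keep an exon if it overlaps at least one CDS record in the GTF. This is only an illustration under some assumptions, not a tested pipeline. The function names (`load_cds_intervals`, `overlaps_cds`) and the toy coordinates are made up, the GTF is assumed to label coding features as `CDS` in column 3, exon coordinates are assumed to be 1-based and inclusive like the GTF, and both inputs are assumed to use identical chromosome names.

```python
import io
from collections import defaultdict


def load_cds_intervals(gtf_handle):
    """Collect CDS intervals per chromosome from GTF lines (1-based, inclusive)."""
    cds = defaultdict(list)
    for line in gtf_handle:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        chrom, feature, start, end = fields[0], fields[2], fields[3], fields[4]
        if feature == "CDS":
            cds[chrom].append((int(start), int(end)))
    return cds


def overlaps_cds(chrom, start, end, cds):
    """Return True if [start, end] shares at least one base with a CDS interval."""
    return any(s <= end and start <= e for s, e in cds.get(chrom, []))


# Toy data, made up for illustration only (not real coordinates).
TOY_GTF = (
    'chr13\ttoy\tCDS\t100\t200\t.\t+\t0\tgene_id "GENE_A";\n'
    'chr13\ttoy\t3UTR\t201\t400\t.\t+\t.\tgene_id "GENE_A";\n'
)
TOY_EXONS = [
    ("GENE_A:001", "chr13", 150, 180),  # inside the CDS
    ("GENE_A:002", "chr13", 300, 350),  # UTR only
    ("GENE_B:001", "chr2", 150, 180),   # chromosome with no CDS record
]

cds = load_cds_intervals(io.StringIO(TOY_GTF))
coding_exons = [
    exon_id
    for exon_id, chrom, start, end in TOY_EXONS
    if overlaps_cds(chrom, start, end, cds)
]
print(coding_exons)  # ['GENE_A:001']
```

With a real annotation, the `io.StringIO` handle would be replaced by an open GTF file. The linear scan over all CDS intervals is slow for a full genome, so a sorted or indexed lookup would probably be needed there.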
Thank you,
Alessia