Hello everyone,
I am trying to use RNAseq data to analyze differential exon usage between two distantly related species (mouse vs chick). The problem I have encountered has to do with extracting orthologous exons between mouse and chick. When comparing the splicing profiles between these two species, I want to be sure that I am only comparing orthologous exons since the exon composition of, even, orthologous genes can differ.
I tried using the liftOver
tool, in R, through the package rtracklayer
to achieve this. I essentially used the galGal6 (chicken genome) BED and galGal6ToMm10.over.chain files, from UCSC, as the input for the liftOver function. The code I used is written below.
library(rtracklayer)
library(GenomicRanges)
library(readr)
gg6 <- read_delim("gg6.bed", "\t", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
colnames(gg6) [1:12] <-c("chrom","start","end","name","score","strand","thickStart","thickEnd","itemRgb","blockCount","blockSizes","blockStarts")
grObject<-GRanges(as.data.frame(gg6))
galGal6ToMm10_over <- import.chain("galGal6ToMm10.over.chain")
results <- as.data.frame(liftOver(grObject, galGal6ToMm10_over))
I am having trouble interpreting the results
output. I can see that the gene ids are repeated many times. At first I thought that they correspond to the exons of that gene but after looking closer it seems like they are repeated hundreds of times. I understand that for every input element there might be multiple ranges mapped but I am not sure how to know exactly which exons are orthologous.
The results
output looks like
group group_name seqnames start end width strand name score thickStart thickEnd itemRgb blockCount blockSizes blockStarts
1 NA chr1 38170035 38170046 12 - XM_015277791.2 0 133973566 134293920 0 21 1.018222e+55 0,11476,11830,14835,17031,19160,21996,33945,36262,39713,42259,50915,80028,95797,126086,131334,137738,180905,277249,282605,330361,
Have you guys ever dealt with liftOver files? Any help would be greatly appreciated :)