Question

Interpreting liftOver output to extract orthologous exons between distantly related species.

0

Entering edit mode

5.1 years ago

fmerkal ▴ 60

Hello everyone,

I am trying to use RNAseq data to analyze differential exon usage between two distantly related species (mouse vs chick). The problem I have encountered has to do with extracting orthologous exons between mouse and chick. When comparing the splicing profiles between these two species, I want to be sure that I am only comparing orthologous exons since the exon composition of, even, orthologous genes can differ.

I tried using the liftOver tool, in R, through the package rtracklayer to achieve this. I essentially used the galGal6 (chicken genome) BED and galGal6ToMm10.over.chain files, from UCSC, as the input for the liftOver function. The code I used is written below.

library(rtracklayer)
library(GenomicRanges)
library(readr)
gg6 <- read_delim("gg6.bed", "\t", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
colnames(gg6) [1:12] <-c("chrom","start","end","name","score","strand","thickStart","thickEnd","itemRgb","blockCount","blockSizes","blockStarts")
grObject<-GRanges(as.data.frame(gg6))
galGal6ToMm10_over <- import.chain("galGal6ToMm10.over.chain")
results <- as.data.frame(liftOver(grObject, galGal6ToMm10_over))

I am having trouble interpreting the results output. I can see that the gene ids are repeated many times. At first I thought that they correspond to the exons of that gene but after looking closer it seems like they are repeated hundreds of times. I understand that for every input element there might be multiple ranges mapped but I am not sure how to know exactly which exons are orthologous.

The results output looks like

    group   group_name   seqnames   start   end   width   strand   name   score   thickStart   thickEnd   itemRgb   blockCount   blockSizes   blockStarts
1   NA   chr1   38170035   38170046   12   -   XM_015277791.2   0   133973566   134293920   0   21   1.018222e+55 0,11476,11830,14835,17031,19160,21996,33945,36262,39713,42259,50915,80028,95797,126086,131334,137738,180905,277249,282605,330361,

Have you guys ever dealt with liftOver files? Any help would be greatly appreciated :)

rna-seq R genome • 964 views

ADD COMMENT • link 4.8 years ago by fmerkal ▴ 60