Homology of 3' UTRs
I am looking to identify the mouse homologs of human UTRs and measure the sequence conservation between the two.
Ensembl lists the homologous protein relationships, but not the transcripts.
The best I can think of at the moment is:
1. Identify protein homology pairs.
2. Identify the mouse and human transcripts encoding each of the pairs.
3. Extract the UTR sequences.
4. Do pairwise alignment to assess the match.
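For the last step, a minimal sketch of what "assess the match" could mean: a global (Needleman-Wunsch) alignment in base R that reports percent identity over the alignment columns. The function name utr_identity and the scores (match 1, mismatch -1, gap -1) are illustrative assumptions, not a recommendation. Pure R is far too slow for kilobase-long UTRs genome-wide, so a compiled aligner (for example Bioconductor's pairwiseAlignment) would be the realistic choice there.

```r
# Global (Needleman-Wunsch) alignment of two nucleotide strings, base R only.
# Scores (match = 1, mismatch = -1, gap = -1) are illustrative assumptions.
utr_identity <- function(a, b, match = 1, mismatch = -1, gap = -1) {
  x <- strsplit(toupper(a), "")[[1]]
  y <- strsplit(toupper(b), "")[[1]]
  n <- length(x)
  m <- length(y)

  # score[i + 1, j + 1] = best score for aligning x[1..i] with y[1..j]
  score <- matrix(0, n + 1, m + 1)
  score[, 1] <- (0:n) * gap
  score[1, ] <- (0:m) * gap
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      diag <- score[i, j] + (if (x[i] == y[j]) match else mismatch)
      score[i + 1, j + 1] <- max(diag,
                                 score[i, j + 1] + gap,
                                 score[i + 1, j] + gap)
    }
  }

  # Trace back from the corner, counting identical and total alignment columns
  i <- n
  j <- m
  same <- 0
  cols <- 0
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 &&
        score[i + 1, j + 1] ==
          score[i, j] + (if (x[i] == y[j]) match else mismatch)) {
      if (x[i] == y[j]) same <- same + 1
      i <- i - 1
      j <- j - 1
    } else if (i > 0 && score[i + 1, j + 1] == score[i, j + 1] + gap) {
      i <- i - 1   # gap in the second sequence
    } else {
      j <- j - 1   # gap in the first sequence
    }
    cols <- cols + 1
  }
  100 * same / cols   # percent identity over alignment columns
}
```

Calling utr_identity(mouse_utr, human_utr) on one pair of UTR strings gives a single percent-identity value; which denominator to use (alignment length, shorter sequence, and so on) is a choice you would have to make for your own analysis.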
Can anyone think of a better idea? This sounds like a lot of work to do genome-wide.
Wouldn't it be easiest to simply pull a list of homolog transcripts between mouse and human from Ensembl?
    library(biomaRt)

    # Map mouse transcript IDs to linked human transcript IDs via Ensembl BioMart
    Mouse2Human <- function(MouseTx){
      human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
      mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

      # getLDS links the two datasets and returns one row per mouse/human pair
      txMouse2Human <- getLDS(attributes  = c("ensembl_transcript_id"),
                              filters     = "ensembl_transcript_id",
                              values      = MouseTx,
                              mart        = mouse,
                              attributesL = c("ensembl_transcript_id"),
                              martL       = human,
                              uniqueRows  = TRUE)

      colnames(txMouse2Human) <- c("Mouse_Tx", "Human_Tx")
      return(txMouse2Human)
    }

    ## Collect all mouse transcript IDs from Ensembl
    musmusculus_tx <- getBM(attributes = c("ensembl_transcript_id"),
                            mart = useMart("ensembl", dataset = "mmusculus_gene_ensembl"))

    Mouse2HumanTable <- Mouse2Human(MouseTx = musmusculus_tx$ensembl_transcript_id)
This should get you:
    > head(Mouse2HumanTable)
                Mouse_Tx        Human_Tx
    1 ENSMUST00000082405 ENST00000361739
    2 ENSMUST00000110020 ENST00000555699
    3 ENSMUST00000110020 ENST00000334869
    4 ENSMUST00000110020 ENST00000555169
    5 ENSMUST00000110020 ENST00000557434
    6 ENSMUST00000110020 ENST00000393218
From this you could then pull the UTRs of the respective transcripts out of the GTF files. I hope I understood your question correctly.
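As an alternative to parsing GTF files, here is a rough sketch that pulls the 3' UTR sequences through biomaRt's getSequence() and attaches them to the pair table. The helper names (fetch_utr3, drop_missing_utr, pair_utrs) are made up for illustration, and I am assuming that transcripts without an annotated UTR come back with the literal string "Sequence unavailable", so check that against your own results.

```r
# Fetch 3' UTR sequences for a vector of Ensembl transcript IDs.
# Needs the biomaRt package and network access; only defined here, not called.
fetch_utr3 <- function(tx_ids, mart) {
  biomaRt::getSequence(id = tx_ids, type = "ensembl_transcript_id",
                       seqType = "3utr", mart = mart)
}

# Drop transcripts for which no UTR sequence came back.
# Assumption: missing UTRs are reported as "Sequence unavailable".
drop_missing_utr <- function(seqs, seq_col = "3utr") {
  ok <- !is.na(seqs[[seq_col]]) & seqs[[seq_col]] != "Sequence unavailable"
  seqs[ok, , drop = FALSE]
}

# Attach mouse and human UTR sequences to the Mouse_Tx / Human_Tx pair table.
# Pairs where either transcript has no UTR row are dropped by the inner joins.
pair_utrs <- function(pairs, mouse_seqs, human_seqs) {
  m <- data.frame(Mouse_Tx  = mouse_seqs$ensembl_transcript_id,
                  Mouse_UTR = mouse_seqs[["3utr"]],
                  stringsAsFactors = FALSE)
  h <- data.frame(Human_Tx  = human_seqs$ensembl_transcript_id,
                  Human_UTR = human_seqs[["3utr"]],
                  stringsAsFactors = FALSE)
  merge(merge(pairs, m, by = "Mouse_Tx"), h, by = "Human_Tx")
}

# Intended use (not run here, requires an Ensembl connection):
# mouse <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# m_utr <- drop_missing_utr(fetch_utr3(Mouse2HumanTable$Mouse_Tx, mouse))
# h_utr <- drop_missing_utr(fetch_utr3(Mouse2HumanTable$Human_Tx, human))
# utr_pairs <- pair_utrs(Mouse2HumanTable, m_utr, h_utr)
```

For a genome-wide run it is probably wise to query the transcript IDs in chunks, since very large BioMart requests tend to time out.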
A complete list of human/mouse homologs is available from MGI.
These are the protein homologs, not the transcript homologs? So to get the homology of the UTRs, I'd need to do what I outlined above.
There are nucleotide and protein RefSeq IDs for mouse/human for each gene (they likely do not cover every transcript isoform out there, but they are a good start).
Stupid me! I was only looking at one line at a time, and only seeing RefSeq nucleotide IDs for one species!
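For anyone else tripped up by the one-line-per-species layout, a small sketch of how the rows could be paired up in R. The data frame below is a toy stand-in: the column names (class_key, organism, refseq_nuc) and the IDs are hypothetical placeholders, not the real MGI header or real accessions, so adapt them to whatever the downloaded report actually contains.

```r
# Toy stand-in for a homology report with one row per species per gene.
# Column names and IDs are hypothetical placeholders, not the real MGI header.
report <- data.frame(
  class_key  = c(1, 1, 2, 2),
  organism   = c("mouse", "human", "mouse", "human"),
  refseq_nuc = c("NM_mouse_A", "NM_human_A", "NM_mouse_B", "NM_human_B"),
  stringsAsFactors = FALSE
)

# Put the mouse and human rows that share a class key side by side.
pair_by_class <- function(report) {
  mouse <- report[report$organism == "mouse", c("class_key", "refseq_nuc")]
  human <- report[report$organism == "human", c("class_key", "refseq_nuc")]
  names(mouse)[2] <- "mouse_refseq"
  names(human)[2] <- "human_refseq"
  merge(mouse, human, by = "class_key")
}

paired <- pair_by_class(report)
```

The result has one row per homology class with a mouse and a human nucleotide ID next to each other, which is the table needed before fetching and aligning the UTRs.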