If you look up a protein, say IL8, on UniProt, you can download a GFF file for that protein.
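A UniProt GFF file is plain GFF3: nine tab-separated columns, with the accession in the first column and each feature's start and end given as residue (amino-acid) positions. A minimal base-R reader might look like the sketch below. `read_uniprot_gff()` is a hypothetical helper, and the example lines are made up to show the layout; they are not real IL8 annotation. `rtracklayer::import()` should also be able to read GFF3 into a GRanges if you prefer that route.

```r
# Minimal base-R reader for a UniProt GFF3 file (sketch).
# Assumes the standard 9 tab-separated GFF3 columns, with residue
# (amino-acid) positions in the start/end columns.
read_uniprot_gff <- function(lines) {
  # Drop blank lines and "##" header lines such as ##gff-version
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  gff <- read.delim(text = lines, header = FALSE, sep = "\t", quote = "",
                    stringsAsFactors = FALSE,
                    col.names = c("seqid", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  gff[, c("seqid", "type", "start", "end", "attributes")]
}

# Made-up lines in the UniProt GFF layout (hypothetical accession and features)
example_lines <- c(
  "##gff-version 3",
  "##sequence-region EXAMPLE 1 99",
  "EXAMPLE\tUniProtKB\tSignal peptide\t1\t20\t.\t.\t.\tNote=hypothetical",
  "EXAMPLE\tUniProtKB\tChain\t21\t99\t.\t.\t.\tNote=hypothetical",
  "EXAMPLE\tUniProtKB\tDisulfide bond\t34\t61\t.\t.\t.\tNote=hypothetical"
)
gff <- read_uniprot_gff(example_lines)
print(gff)

# With a downloaded file (hypothetical file name):
# gff <- read_uniprot_gff(readLines("IL8.gff"))
```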
I would like to display the protein features in that file as a track, so that the canonical sequence and its protein annotation can be compared with the isoforms seen in a specific sample:
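```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ggbio)
library(Rsamtools)          # ScanBamParam()
library(GenomicAlignments)  # readGAlignments(), grglist()

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
data(genesymbol, package = "biovizBase")  # GRanges of hg19 gene symbols

# Gene model track for IL8: UTRs, CDS and chevron-style introns
p1.IL8 <- ggplot(txdb) +
  geom_alignment(which = genesymbol["IL8"], gap.arrow = FALSE, gap.geom = "chevron")

# Read the sample's alignments in the IL8 region
# (bamfile: path to the sample's BAM file, defined elsewhere)
param <- ScanBamParam(tag = "XS", which = genesymbol["IL8"], what = c("seq"))
ga <- readGAlignments(bamfile, use.names = TRUE, param = param)
reads <- grglist(ga)
metadata(reads)$param <- param

# Use the first read as isoform 1
IL8.isoform.1 <- reads[c(1)]
p1 <- autoplot(IL8.isoform.1, fill = "brown", color = "brown")

tracks(IL8 = p1.IL8, "Iso 1" = p1)
```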
The code above draws narrow rectangles for the UTRs, wider rectangles for the CDS and chevrons for the introns. The UniProt GFF file gives the start and end of each protein feature as positions along the protein sequence (amino-acid coordinates, not genomic coordinates). In principle, one should be able to read this file, convert those positions to genomic coordinates through the CDS exons, and display the features as a track.
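The coordinate conversion can be sketched in base R, assuming you already have the CDS exons (start/end vectors plus strand) of the transcript that encodes the UniProt canonical sequence, and that residue 1 is encoded by the first codon of that CDS. Residue k then occupies coding nucleotides 3k-2 to 3k, and those are walked through the exons in transcript order. `protein_to_genome()` and its arguments are hypothetical, not part of any package.

```r
# Map a range of amino-acid positions onto genomic coordinates through the
# CDS exons of one transcript (sketch). Assumes residue 1 is encoded by the
# first codon of the CDS and that there are no frame-shifting edits.
protein_to_genome <- function(aa_start, aa_end, cds_start, cds_end, strand) {
  # Put CDS exons in 5'->3' transcript order
  o <- if (strand == "-") order(cds_start, decreasing = TRUE) else order(cds_start)
  cds_start <- cds_start[o]
  cds_end <- cds_end[o]
  widths <- cds_end - cds_start + 1
  offset <- c(0, cumsum(widths))[seq_along(widths)]  # coding nt before each exon
  # Residue k occupies coding nucleotides 3k-2 .. 3k
  nt_from <- 3 * aa_start - 2
  nt_to <- 3 * aa_end
  out <- data.frame(start = numeric(0), end = numeric(0))
  for (i in seq_along(widths)) {
    # Part of [nt_from, nt_to] that falls in exon i (coding coordinates)
    seg_from <- max(nt_from, offset[i] + 1)
    seg_to <- min(nt_to, offset[i] + widths[i])
    if (seg_from > seg_to) next
    rel_from <- seg_from - offset[i]  # 1-based position within exon i
    rel_to <- seg_to - offset[i]
    if (strand == "-") {
      # On the minus strand, transcript order starts at the exon's end
      g <- c(cds_end[i] - rel_to + 1, cds_end[i] - rel_from + 1)
    } else {
      g <- c(cds_start[i] + rel_from - 1, cds_start[i] + rel_to - 1)
    }
    out <- rbind(out, data.frame(start = g[1], end = g[2]))
  }
  out <- out[order(out$start), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy transcript: two CDS exons (30 nt + 60 nt) on the plus strand
protein_to_genome(5, 15, cds_start = c(101, 201), cds_end = c(130, 260), strand = "+")
# -> segments 113..130 and 201..215
```

In a real session the exon vectors could come from `cdsBy(txdb, by = "tx", use.names = TRUE)` for the matching transcript, and the returned segments could be wrapped in a GRanges and added as another track with `autoplot()`. It is worth checking that the transcript's translation really matches the UniProt canonical sequence first. For Ensembl-based annotation, the ensembldb package offers `proteinToGenome()`, which may be preferable to a hand-rolled mapping.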
I want to do this at least for the canonical protein, but ideally I would also like to show how sample-specific isoforms differ from it. Does anyone know the standard way to do this?
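For the sample-specific isoforms, one crude first check (a simple heuristic, not an established method) is whether every genomic segment of a feature, once mapped to the genome, still lies inside an exon of the observed isoform. `feature_retained()` and the example coordinates are hypothetical. The check ignores reading frame, so a feature whose bases are all present could still be altered by an upstream frameshift.

```r
# Crude check (heuristic sketch): is every genomic segment of a mapped
# protein feature fully contained in some exon of an observed isoform?
# Ignores reading frame, so it can miss effects of upstream frameshifts.
feature_retained <- function(feat_start, feat_end, exon_start, exon_end) {
  inside <- vapply(seq_along(feat_start), function(i) {
    any(exon_start <= feat_start[i] & exon_end >= feat_end[i])
  }, logical(1))
  all(inside)
}

# Hypothetical genomic segments of one protein feature
feat <- data.frame(start = c(113, 201), end = c(130, 215))

# Hypothetical isoforms reconstructed from the sample's reads
iso_full    <- data.frame(start = c(90, 195), end = c(140, 300))
iso_skipped <- data.frame(start = 90, end = 140)  # second exon skipped

feature_retained(feat$start, feat$end, iso_full$start, iso_full$end)        # TRUE
feature_retained(feat$start, feat$end, iso_skipped$start, iso_skipped$end)  # FALSE
```

With the objects from the question, the exon coordinates of one isoform could be taken from `start(reads[[1]])` and `end(reads[[1]])`, and features that fail the check could be drawn in a different colour on the isoform's track.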