RepeatMasker error when trying to generate repeat sequence distribution pie chart (all code and errors provided)
8 months ago

Hi,

Any answers would be highly appreciated, as I've been stuck on this for a while. I analyze epigenetic data and process DMR lists, and I'm currently having trouble producing a pie chart that shows the distribution of repeat sequences in my samples. I've already done this for the genomic distribution (TSS, 5' UTR, etc.), but I can't figure out how to use RepeatMasker to do the same thing.

Here's exactly what I did for the genomic distribution pie chart:

library(ChIPseeker)
txdb <- TxDb.Mmusculus.UCSC.mm39.knownGene::TxDb.Mmusculus.UCSC.mm39.knownGene
peakAnno <- annotatePeak("~/Desktop/1-m1Vsm2.bed.gz", tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Mm.eg.db")
plotAnnoPie(peakAnno)


The results looked fine, but then I tried the following using the UCSC RepeatMasker package in R (mm39 mouse sample). I've read the documentation and still haven't figured it out. I got the following warnings:

#Annotate peaks (with repeat sequences) & plot
library(AnnotationHub)
ah <- AnnotationHub()
query(ah, c("RepeatMasker", "Mus musculus"))
rmskmm39 <- ah[["AH99013"]]
peakAnnoRepeat <- annotatePeak("~/Desktop/1-m1Vsm2.bed.gz", tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="rmskmm39")
plotAnnoPie(peakAnnoRepeat)
Warning messages:
1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :  
there is no package called ‘rmskmm39’
2: In getGeneAnno(annoDb, peak.gr$geneId, type, columns) :  
ID type not matched, gene annotation will not be added..

I thought extracting the mm39 masked repeat track would act as the annotation track instead of "org.Mm.eg.db" as was used in the original working example. If anyone has any pointers or solutions, that'd be great.

Thank you

repeatmasker • 2.1k views

Wouldn't you need to make a custom TxDb object? I haven't done it myself, but I understood the annoDb to basically map gene annotations to genes given by txdb. Meaning the "peaks" are annotated using txdb info, then the additional gene info (SYMBOL, description etc) comes from annoDb.


I thought that's what I was doing by:

#Annotate peaks (with repeat sequences) & plot
ah <- AnnotationHub()
query(ah, c("RepeatMasker", "Mus musculus"))
rmskmm39 <- ah[["AH99013"]]

It was my understanding that the rmskmm39 variable was now the repeat annotation track. I can't find a package that provides the repeat sequences the way the non-repeat annotation track is provided.


What is in your txdb object?


Initially, when it worked:

txdb <- TxDb.Mmusculus.UCSC.mm39.knownGene

When trying to get the repeated sequences distribution (rmskmm39 variable can be found in original post):

txdb <- rmskmm39

When I tried this, ChIPseeker ignored the annoDb since it was a "custom" txdb. However, plotAnnoPie also wouldn't run on the resulting ChIPseeker object. The regions were still annotated with the info from the rmsk table, so it would be straightforward to make a pie chart yourself from the columns.


Do you mind telling me how you were able to create a custom txdb object that sufficiently annotates repeats?


I believe I essentially followed your workflow with rmskmm39 as the txdb object.

But it doesn't work with plotAnnoPie, so you can convert the ChIPseeker object to a data frame:

peakAnnoRepeat.gr <- as.GRanges(peakAnnoRepeat)
peakAnnoRepeat.df <- as.data.frame(peakAnnoRepeat.gr)

Also, you may need to drop the annoDb argument, since it will be ignored and "rmskmm39" isn't a valid package name, which may cause the function to fail.

You should then be able to explore the table and manually count up the categories you are interested in.
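As a rough sketch of that counting step, using base R only: the toy data frame below is a made-up stand-in for peakAnnoRepeat.df, and the repClass column name is an assumption on my part, so check colnames(peakAnnoRepeat.df) to see which columns your table actually contains.

```r
# Hypothetical stand-in for peakAnnoRepeat.df; values are illustrative only.
# The repClass column name is an assumption, not confirmed output of ChIPseeker.
peakAnnoRepeat.df <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr3", "chr3"),
  start    = c(100, 500, 900, 1500, 2100),
  end      = c(200, 600, 1000, 1600, 2200),
  repClass = c("LINE", "SINE", "LINE", "LTR", "LINE")
)

# Count annotated regions per repeat class and convert to percentages.
class_counts <- table(peakAnnoRepeat.df$repClass)
class_pct <- round(100 * prop.table(class_counts), 1)

# Label each slice with its class and percentage, then draw the pie chart.
slice_labels <- paste0(names(class_counts), " (", class_pct, "%)")
pie(class_counts, labels = slice_labels, main = "Repeat class distribution")
```

With the real table you would skip the data.frame() call and run the remaining lines on your own peakAnnoRepeat.df, swapping repClass for whichever column holds the category you care about.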

P.S. I realize now I used the mm10 version of rmsk since that's the build I used for my peaks. This shouldn't matter as long as the rmsk track and your peaks are on the same genome build.


Please do not paste screenshots of plain text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option), or use a GitHub Gist if the content exceeds the length allowed here.


Fixed it. Do you happen to have any idea how to resolve my issue?


Sorry, I'm not equipped to help you with that, as I'm not familiar with RepeatMasker. I'll clean up these comments and bump the post so that others will be able to help you.
