Hello,
I asked this question on the Bioconductor support site but didn't get any response there, so I hope I can get some help here.
I am trying to annotate peaks called with MACS2 and am running into some errors. I hope someone can help me get rid of them. I use ChIPpeakAnno with Ensembl annotation data, but I get errors about the seqlevels style. Here is my R code:
> library(ChIPpeakAnno)
> library(GenomicFeatures)
>
> bed <- read.table("new.txt",sep = "\t" )
> colnames(bed)<-c("chr", "start", "end", "x")
> gr1 <- toGRanges(bed, format="BED", header=T)
duplicated or NA names found. Rename all the names by numbers.
> gr1
GRanges object with 69388 ranges and 1 metadata column:
seqnames ranges strand | x
<Rle> <IRanges> <Rle> | <factor>
X00001 1 [ 39845630, 39845896] * | All_peak_1450
X00002 1 [191889009, 191889559] * | All_peak_5137
X00003 1 [212860802, 212861026] * | All_peak_5820
X00004 1 [ 36306817, 36307312] * | All_peak_1337
X00005 1 [ 44433306, 44433488] * | All_peak_1630
... ... ... ... . ...
X69384 Y [56847145, 56847269] * | All_peak_69384
X69385 Y [56850752, 56850897] * | All_peak_69385
X69386 Y [56858855, 56858947] * | All_peak_69386
X69387 Y [56859031, 56859116] * | All_peak_69387
X69388 Y [56861359, 56861446] * | All_peak_69388
-------
seqinfo: 127 sequences from an unspecified genome; no seqlengths
> library(EnsDb.Hsapiens.v79)
> annoData <- toGRanges(EnsDb.Hsapiens.v79)
> seqlevelsStyle(annoData) <- "Ensembl"
> seqlevelsStyle(gr1) <- seqlevelsStyle(annoData)
Error in .replace_seqlevels_style(x_seqlevels, value) :
found no sequence renaming map compatible with seqname style "NCBI" for this object
In addition: Warning message:
In `seqlevelsStyle<-`(`*tmp*`, value = c("NCBI", "Ensembl")) :
more than one seqlevels style supplied, using the 1st one only
> annoPeaks(gr1, annoData)
Error in seqlevelsStyle(seqlevels(x)) :
The style does not have a compatible entry for the species supported by
Seqname. Please see genomeStyles() for supported species/style
I hope that someone knows what's going on. I think the style is not defined correctly, but I don't know how to fix this.
Thanks in advance.
Best, Ben
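The warning in the session above ("more than one seqlevels style supplied, using the 1st one only") appears because seqlevelsStyle() can return several style names for one object. A minimal sketch of passing on exactly one style, using made-up toy ranges rather than the data from this thread (the object names anno, peaks and target_style are hypothetical):

```r
library(GenomicRanges)   # GRanges(); also attaches IRanges and GenomeInfoDb
library(GenomeInfoDb)    # seqlevelsStyle(), seqlevels()

# Hypothetical annotation with Ensembl/NCBI-style names ("1", "2", "X")
anno <- GRanges(c("1", "2", "X"),
                IRanges(start = c(100, 200, 300), width = 50))

# Hypothetical peaks with UCSC-style names ("chr1", "chr2", "chrX")
peaks <- GRanges(c("chr1", "chr2", "chrX"),
                 IRanges(start = c(110, 210, 310), width = 20))

# seqlevelsStyle() may return more than one style; keep a single value
target_style <- seqlevelsStyle(anno)[1]
seqlevelsStyle(peaks) <- target_style

# The peak seqlevels should now use the same naming as the annotation
seqlevels(peaks)
```

This only addresses the warning. Whether it also removes the error in the question depends on the actual seqlevels in the peak file, which cannot be verified from the thread.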
Thanks for your comment.
Yes, the chromosome names in the MACS2 output file are in Ensembl format (which is the same as NCBI), as you can see when I type gr1. For the second point, I understand that when I say the style is "Ensembl" it also matches "NCBI". But how do I do this correctly?
Please check all of them. You have 127 sequences in MACS output
seqinfo: 127 sequences from an unspecified genome; no seqlengths
For the second point, just use any one of them: instead of forcing seqlevelsStyle(gr1) <- seqlevelsStyle(annoData), you can try seqlevelsStyle(gr1) = "Ensembl" explicitly.
Thanks again.
The 127 sequences are the extra chromosomes in the genome (hg38). In the annoData object I created there are 319 seqnames. I don't think this is the problem.
You can see that annoData contains these other chromosomes as well (such as KI270752.1, etc.).
Besides, when I only focus on the normal chromosomes, I still get the same error:
I think something goes wrong when loading the data? But what, and how do I do this correctly?
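For reference, restricting a GRanges object to the primary chromosomes can be sketched as below. This is an illustration on toy ranges, not the code that was run in this thread. The list of primary chromosome names is an assumption for human data in Ensembl/NCBI naming, and gr, primary, keep and gr_primary are hypothetical names.

```r
library(GenomicRanges)   # GRanges(); also attaches IRanges and GenomeInfoDb
library(GenomeInfoDb)    # seqlevels(), keepSeqlevels()

# Toy peaks: two on primary chromosomes, one on an unplaced scaffold
gr <- GRanges(c("1", "KI270752.1", "X"),
              IRanges(start = c(100, 200, 300), width = 50))

# Assumed list of primary human chromosomes in Ensembl/NCBI naming
primary <- c(as.character(1:22), "X", "Y", "MT")

# keepSeqlevels() only accepts levels present in the object, hence intersect()
keep <- intersect(primary, seqlevels(gr))

# pruning.mode = "coarse" drops the ranges that sit on removed seqlevels
gr_primary <- keepSeqlevels(gr, keep, pruning.mode = "coarse")

seqlevels(gr_primary)
```

Applying the same filter to both the peaks and the annotation leaves the two objects with a comparable set of seqlevels before any style conversion is attempted.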
Try the argument stringsAsFactors = F in your read.table call:
bed <- read.table("new.txt",sep = "\t" )
That is the only thing that comes to mind without seeing your data. Can you upload the MACS output?
Doesn't work. But thanks anyway, I appreciate your help!
If ChIPpeakAnno is not supported on Bioconductor or Biostars, I might be better off trying another package. Any ideas? Maybe ChIPseeker?
I don't think this is a problem in the ChIPpeakAnno package. The offending function, seqlevelsStyle(), comes from the GenomeInfoDb package. Again, if you can post even a part of the data (the part that triggers the error), I can have a better look.
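When the full file cannot be shared, a quick look at how read.table parsed the chromosome column can help narrow things down. The sketch below uses base R only and a two-row stand-in file copied from the output shown in the question; the temporary file and the checks are illustrative assumptions, not a confirmed diagnosis of the original problem.

```r
# Two-row stand-in for the peak file (rows copied from the question's output)
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\t39845630\t39845896\tAll_peak_1450",
             "Y\t56847145\t56847269\tAll_peak_69384"), tmp)

# Read the file with character columns kept as character, not factor
bed <- read.table(tmp, sep = "\t", stringsAsFactors = FALSE,
                  col.names = c("chr", "start", "end", "x"))

# Column types as parsed
str(bed)

# Every chromosome name that occurs, including NA if any
chr_counts <- table(bed$chr, useNA = "ifany")
print(chr_counts)

# Rows where start is not smaller than end would be suspicious
bad_rows <- bed[bed$start >= bed$end, ]
print(bad_rows)
```

Unexpected entries in the chromosome table (header text, empty strings, NA) would be worth checking before the data frame is converted to a GRanges object.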
Yeah, I think my BED file is not good, but I don't know why. I have solved the problem by using the summit BED file from MACS2.
Thanks for your input!
With the summits it works (with warnings but no errors).
Happy that it worked finally :)