Error with ChIPpeakAnno Ensembl annotation
1
0
Entering edit mode
5.9 years ago
Benn 8.3k

Hello,

I have asked this question on bioconductor, but I didn't get any response there. I hope I can get some help here.

I am trying to annotate peaks derived with MACS2 and run into some errors. I hope someone can help me to get rid of these errors. I use ChIPpeakAnno and have ensembl data, but I get errors about the style. Here my R code:

> library(ChIPpeakAnno)
> library(GenomicFeatures)
>
> bed <- read.table("new.txt",sep = "\t" )
> colnames(bed)<-c("chr", "start", "end", "x")
> gr1 <- toGRanges(bed, format="BED", header=T)
duplicated or NA names found. Rename all the names by numbers.
> gr1
GRanges object with 69388 ranges and 1 metadata column:
seqnames                 ranges strand |               x
<Rle>              <IRanges>  <Rle> |        <factor>
X00001       1  [ 39845630,  39845896]      * |   All_peak_1450
X00002       1  [191889009, 191889559]      * |   All_peak_5137
X00003       1  [212860802, 212861026]      * |   All_peak_5820
X00004       1  [ 36306817,  36307312]      * |   All_peak_1337
X00005       1  [ 44433306,  44433488]      * |   All_peak_1630
...      ...                    ...    ... .             ...
X69384       Y    [56847145, 56847269]      * |  All_peak_69384
X69385       Y    [56850752, 56850897]      * |  All_peak_69385
X69386       Y    [56858855, 56858947]      * |  All_peak_69386
X69387       Y    [56859031, 56859116]      * |  All_peak_69387
X69388       Y    [56861359, 56861446]      * |  All_peak_69388
-------
seqinfo: 127 sequences from an unspecified genome; no seqlengths

> library(EnsDb.Hsapiens.v79)
> annoData <- toGRanges(EnsDb.Hsapiens.v79)
> seqlevelsStyle(annoData) <- "Ensembl"
> seqlevelsStyle(gr1) <- seqlevelsStyle(annoData)
Error in .replace_seqlevels_style(x_seqlevels, value) :
found no sequence renaming map compatible with seqname style "NCBI" for this object
In seqlevelsStyle<-(*tmp*, value = c("NCBI", "Ensembl")) :
more than one seqlevels style supplied, using the 1st one only
> annoPeaks(gr1, annoData)
Error in seqlevelsStyle(seqlevels(x)) :
The style does not have a compatible entry for the species supported by
Seqname. Please see genomeStyles() for supported species/style


I hope that someone knows what's going on, I think that the style is not correct defined, but don't know how to correct this.

Best, Ben

ChipPeakAnno genomicfeatures • 3.2k views
0
Entering edit mode
5.9 years ago

There are two problems I am seeing here:

• "found no sequence renaming map compatible with seqname style "NCBI" for this object In addition:" -> check if the seqnames (ie the chromosome names in the original MACS2 file) are properly formatted. What was the original assembly against which you ran MACS?. You can find all the mappings seqlevelsStyle is converting using the command genomeStyles()\$Homo_sapiens in R.

• When you set seqlevelsStyle(annoData) <- "Ensembl", it will set seqlevelsStyle = c("NCBI", "Ensembl"), as both of them have same style. That's why the warning "more than one seqlevels style supplied, using the 1st one only"

0
Entering edit mode

Yes the MACS2 file output for chromosome is in Ensembl format (which is the same as NCBI also), you can see that when I types in gr1.

For the second point, I did understand that when I say that the style is "Ensembl" it also matches "NCBI". But how do I do this correctly?

0
Entering edit mode

Please check all of them. You have 127 sequences in MACS output seqinfo: 127 sequences from an unspecified genome; no seqlengths

For the second point, just use any one of them, instead of forcing seqlevelsStyle(gr1) <- seqlevelsStyle(annoData), you can try seqlevelsStyle(gr1) = "Ensembl" explicitly.

0
Entering edit mode

Thanks again.

The 127 sequences are the extra chromosomes in the genome (hg38). In the annoData object I created there are 319 seqnames. I don't think is this the problem.

> gr1
GRanges object with 69388 ranges and 1 metadata column:
seqnames                 ranges strand |               x
<Rle>              <IRanges>  <Rle> |        <factor>
X00001       1  [ 39845630,  39845896]      * |   All_peak_1450
X00002       1  [191889009, 191889559]      * |   All_peak_5137
X00003       1  [212860802, 212861026]      * |   All_peak_5820
X00004       1  [ 36306817,  36307312]      * |   All_peak_1337
X00005       1  [ 44433306,  44433488]      * |   All_peak_1630
...      ...                    ...    ... .             ...
X69384       Y    [56847145, 56847269]      * |  All_peak_69384
X69385       Y    [56850752, 56850897]      * |  All_peak_69385
X69386       Y    [56858855, 56858947]      * |  All_peak_69386
X69387       Y    [56859031, 56859116]      * |  All_peak_69387
X69388       Y    [56861359, 56861446]      * |  All_peak_69388
-------
seqinfo: 127 sequences from an unspecified genome; no seqlengths
> annoData
GRanges object with 65774 ranges and 1 metadata column:
seqnames                 ranges strand |   gene_name
<Rle>              <IRanges>  <Rle> | <character>
ENSG00000278806 KI270752.1           [ 144,  268]      + |  AF065393.4
ENSG00000210049         MT           [ 577,  647]      + |       MT-TF
ENSG00000211459         MT           [ 648, 1601]      + |     MT-RNR1
ENSG00000277107 KI270724.1           [1529, 1607]      - |  AC242209.2
ENSG00000210077         MT           [1602, 1670]      + |       MT-TV
...        ...                    ...    ... .         ...
ENSG00000171163          1 [248850006, 248859144]      - |      ZNF692
ENSG00000227237          1 [248859164, 248864796]      + |  AL672294.1
ENSG00000185220          1 [248906196, 248919946]      + |       PGBD2
ENSG00000200495          1 [248912690, 248912795]      - |  RNU6-1205P
ENSG00000233084          1 [248936581, 248937043]      + |  AL672183.2
-------
seqinfo: 319 sequences from GRCh38 genome


You can see that annoData contains these other chromosomes as well (such as KI270752.1, etc.).

Besides, when I only focus on the normal chromosomes, I still get the same error:

gr2 <- toGRanges(bed[c(1:66047,66903:69388),])
> table(gr2@seqnames)

1   10   11   12   13   14   15   16   17   18   19    2   20   21   22    3
6781 3107 3457 3177 1678 2019 2108 2405 2725 1308 2542 5393 1580  888 1322 4419
4    5    6    7    8    9    X    Y
3394 3847 4214 3679 2964 3040 2363  123

> seqlevelsStyle(gr2) = "Ensembl"
Error in .replace_seqlevels_style(x_seqlevels, value) :
found no sequence renaming map compatible with seqname style "Ensembl" for this object


I think something goes wrong with loading the data? But what and how do I do this correct?

0
Entering edit mode

Try argument "stringsAsFactors = F" in your read.table bed <- read.table("new.txt",sep = "\t" ) That it the only thing coming in my mind without seeing your data. Can you upload the MACS output?

0
Entering edit mode

Doesn't work. But thanks anyways, I appreciate your help!

If chipAnnoPeak is not supported on bioconductor or biostars, I can better try another package. Any ideas? Maybe ChipSeeker?

0
Entering edit mode

I don't think this is a problem in the chipAnnoPeak package. The offending function seqlevelsStyle() comes from GenomeInfoDb pkg. Again, if you can post even a part of the data (which shows error), I can have a better look

0
Entering edit mode

Yeah, I think my bed file is not good. But don't know why though. I have solved the problem by using the summit bed file from MACS2.

> bed <- read.table("new.txt",sep = "\t")
> colnames(bed)<-c("chr", "start", "end", "x")
chr     start       end              x
1  1   39845629  39845896  All_peak_1450
2  1  191889008 191889559  All_peak_5137
3  1  212860801 212861026  All_peak_5820
4  1   36306816  36307312  All_peak_1337
5  1   44433305  44433488  All_peak_1630
6  1   54132564  54132932  All_peak_1940


With the summits it works (with warnings but no errors).

1
Entering edit mode

Happy that it worked finally :)