Question: Error with ChIPpeakAnno Ensembl annotation
0
gravatar for Benn
2.5 years ago by
Benn6.8k
Netherlands
Benn6.8k wrote:

Hello,

I have asked this question on bioconductor, but I didn't get any response there. I hope I can get some help here.

I am trying to annotate peaks derived with MACS2 and run into some errors. I hope someone can help me to get rid of these errors. I use ChIPpeakAnno and have ensembl data, but I get errors about the style. Here my R code:

> library(ChIPpeakAnno)
> library(GenomicFeatures)
>
> bed <- read.table("new.txt",sep = "\t" )
> colnames(bed)<-c("chr", "start", "end", "x")
> gr1 <- toGRanges(bed, format="BED", header=T)
duplicated or NA names found. Rename all the names by numbers.
> gr1
GRanges object with 69388 ranges and 1 metadata column:
         seqnames                 ranges strand |               x
            <Rle>              <IRanges>  <Rle> |        <factor>
  X00001       1  [ 39845630,  39845896]      * |   All_peak_1450
  X00002       1  [191889009, 191889559]      * |   All_peak_5137
  X00003       1  [212860802, 212861026]      * |   All_peak_5820
  X00004       1  [ 36306817,  36307312]      * |   All_peak_1337
  X00005       1  [ 44433306,  44433488]      * |   All_peak_1630
     ...      ...                    ...    ... .             ...
  X69384       Y    [56847145, 56847269]      * |  All_peak_69384
  X69385       Y    [56850752, 56850897]      * |  All_peak_69385
  X69386       Y    [56858855, 56858947]      * |  All_peak_69386
  X69387       Y    [56859031, 56859116]      * |  All_peak_69387
  X69388       Y    [56861359, 56861446]      * |  All_peak_69388
  -------
  seqinfo: 127 sequences from an unspecified genome; no seqlengths

> library(EnsDb.Hsapiens.v79)
> annoData <- toGRanges(EnsDb.Hsapiens.v79)
> seqlevelsStyle(annoData) <- "Ensembl"
> seqlevelsStyle(gr1) <- seqlevelsStyle(annoData)
Error in .replace_seqlevels_style(x_seqlevels, value) :
  found no sequence renaming map compatible with seqname style "NCBI" for this object
In addition: Warning message:
In `seqlevelsStyle<-`(`*tmp*`, value = c("NCBI", "Ensembl")) :
  more than one seqlevels style supplied, using the 1st one only
> annoPeaks(gr1, annoData)
Error in seqlevelsStyle(seqlevels(x)) :
  The style does not have a compatible entry for the species supported by
  Seqname. Please see genomeStyles() for supported species/style

I hope that someone knows what's going on, I think that the style is not correct defined, but don't know how to correct this.

Thanks in advance.

Best, Ben

genomicfeatures chippeakanno • 1.4k views
ADD COMMENTlink modified 2.5 years ago by Santosh Anand4.7k • written 2.5 years ago by Benn6.8k
0
gravatar for Santosh Anand
2.5 years ago by
Santosh Anand4.7k
Santosh Anand4.7k wrote:

There are two problems I am seeing here:

  • "found no sequence renaming map compatible with seqname style "NCBI" for this object In addition:" -> check if the seqnames (ie the chromosome names in the original MACS2 file) are properly formatted. What was the original assembly against which you ran MACS?. You can find all the mappings seqlevelsStyle is converting using the command genomeStyles()$Homo_sapiens in R.

  • When you set seqlevelsStyle(annoData) <- "Ensembl", it will set seqlevelsStyle = c("NCBI", "Ensembl"), as both of them have same style. That's why the warning "more than one seqlevels style supplied, using the 1st one only"

ADD COMMENTlink written 2.5 years ago by Santosh Anand4.7k

Thanks for your comment.

Yes the MACS2 file output for chromosome is in Ensembl format (which is the same as NCBI also), you can see that when I types in gr1.

For the second point, I did understand that when I say that the style is "Ensembl" it also matches "NCBI". But how do I do this correctly?

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Benn6.8k

Please check all of them. You have 127 sequences in MACS output seqinfo: 127 sequences from an unspecified genome; no seqlengths

For the second point, just use any one of them, instead of forcing seqlevelsStyle(gr1) <- seqlevelsStyle(annoData), you can try seqlevelsStyle(gr1) = "Ensembl" explicitly.

ADD REPLYlink written 2.5 years ago by Santosh Anand4.7k

Thanks again.

The 127 sequences are the extra chromosomes in the genome (hg38). In the annoData object I created there are 319 seqnames. I don't think is this the problem.

> gr1
GRanges object with 69388 ranges and 1 metadata column:
         seqnames                 ranges strand |               x
            <Rle>              <IRanges>  <Rle> |        <factor>
  X00001       1  [ 39845630,  39845896]      * |   All_peak_1450
  X00002       1  [191889009, 191889559]      * |   All_peak_5137
  X00003       1  [212860802, 212861026]      * |   All_peak_5820
  X00004       1  [ 36306817,  36307312]      * |   All_peak_1337
  X00005       1  [ 44433306,  44433488]      * |   All_peak_1630
     ...      ...                    ...    ... .             ...
  X69384       Y    [56847145, 56847269]      * |  All_peak_69384
  X69385       Y    [56850752, 56850897]      * |  All_peak_69385
  X69386       Y    [56858855, 56858947]      * |  All_peak_69386
  X69387       Y    [56859031, 56859116]      * |  All_peak_69387
  X69388       Y    [56861359, 56861446]      * |  All_peak_69388
  -------
  seqinfo: 127 sequences from an unspecified genome; no seqlengths
> annoData
GRanges object with 65774 ranges and 1 metadata column:
                    seqnames                 ranges strand |   gene_name
                       <Rle>              <IRanges>  <Rle> | <character>
  ENSG00000278806 KI270752.1           [ 144,  268]      + |  AF065393.4
  ENSG00000210049         MT           [ 577,  647]      + |       MT-TF
  ENSG00000211459         MT           [ 648, 1601]      + |     MT-RNR1
  ENSG00000277107 KI270724.1           [1529, 1607]      - |  AC242209.2
  ENSG00000210077         MT           [1602, 1670]      + |       MT-TV
              ...        ...                    ...    ... .         ...
  ENSG00000171163          1 [248850006, 248859144]      - |      ZNF692
  ENSG00000227237          1 [248859164, 248864796]      + |  AL672294.1
  ENSG00000185220          1 [248906196, 248919946]      + |       PGBD2
  ENSG00000200495          1 [248912690, 248912795]      - |  RNU6-1205P
  ENSG00000233084          1 [248936581, 248937043]      + |  AL672183.2
  -------
  seqinfo: 319 sequences from GRCh38 genome

You can see that annoData contains these other chromosomes as well (such as KI270752.1, etc.).

Besides, when I only focus on the normal chromosomes, I still get the same error:

gr2 <- toGRanges(bed[c(1:66047,66903:69388),])
> table(gr2@seqnames)

  1   10   11   12   13   14   15   16   17   18   19    2   20   21   22    3  
6781 3107 3457 3177 1678 2019 2108 2405 2725 1308 2542 5393 1580  888 1322 4419 
  4    5    6    7    8    9    X    Y  
3394 3847 4214 3679 2964 3040 2363  123 

> seqlevelsStyle(gr2) = "Ensembl"
Error in .replace_seqlevels_style(x_seqlevels, value) : 
  found no sequence renaming map compatible with seqname style "Ensembl" for this object

I think something goes wrong with loading the data? But what and how do I do this correct?

ADD REPLYlink written 2.5 years ago by Benn6.8k

Try argument "stringsAsFactors = F" in your read.table bed <- read.table("new.txt",sep = "\t" ) That it the only thing coming in my mind without seeing your data. Can you upload the MACS output?

ADD REPLYlink written 2.5 years ago by Santosh Anand4.7k

Doesn't work. But thanks anyways, I appreciate your help!

If chipAnnoPeak is not supported on bioconductor or biostars, I can better try another package. Any ideas? Maybe ChipSeeker?

ADD REPLYlink written 2.5 years ago by Benn6.8k

I don't think this is a problem in the chipAnnoPeak package. The offending function seqlevelsStyle() comes from GenomeInfoDb pkg. Again, if you can post even a part of the data (which shows error), I can have a better look

ADD REPLYlink written 2.5 years ago by Santosh Anand4.7k

Yeah, I think my bed file is not good. But don't know why though. I have solved the problem by using the summit bed file from MACS2.

Thanks for your input!

> bed <- read.table("new.txt",sep = "\t")
> colnames(bed)<-c("chr", "start", "end", "x")
> head(bed)
  chr     start       end              x
1  1   39845629  39845896  All_peak_1450
2  1  191889008 191889559  All_peak_5137
3  1  212860801 212861026  All_peak_5820
4  1   36306816  36307312  All_peak_1337
5  1   44433305  44433488  All_peak_1630
6  1   54132564  54132932  All_peak_1940

With the summits it works (with warnings but no errors).

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Benn6.8k
1

Happy that it worked finally :)

ADD REPLYlink written 2.5 years ago by Santosh Anand4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 794 users visited in the last hour