Question

How to annotate ChIP peaks based on NCBI sequence names?

3.4 years ago

iridha • 0

I have ChIP-seq peaks based on the NCBI genome, which look like the following:

                  seqnames              ranges strand |      Conc
                     <Rle>           <IRanges>  <Rle> | <numeric>
  X0001.0806584 NC_000003.12   75668920-75669635      * |  11.10671
  X0002.1092998 NC_000005.10   34190588-34192289      * |   8.45169
  X0002.1092999 NC_000005.10   34190588-34192289      * |   8.45169
  X0003.1726283 NC_000009.12 137101991-137103797      * |   8.30861

Although I used human_NCBI_GRCh38p12 for the alignment, the annotation I get from Bioconductor uses chromosome-style sequence names, like the following:

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
ucsc.hg38.knownGene <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

seqnames              ranges strand |     gene_id
               <Rle>           <IRanges>  <Rle> | <character>
          1    chr19   58345178-58362751      - |           1
         10     chr8   18391282-18401218      + |          10
        100    chr20   44619522-44652233      - |         100
       1000    chr18   27950966-28177130      - |        1000
  100009613    chr11   70072434-70075433      - |   100009613

The annotation does not work because the seqnames differ between the peaks and the annotation.

peaks_annotated <- annotatePeakInBatch(Peaks, AnnotationData = ucsc.hg38.knownGene)
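To make the mismatch concrete, here is a minimal base-R sketch of how RefSeq chromosome accessions such as NC_000003.12 could be translated into UCSC-style names such as chr3. The helper name `refseq_to_ucsc` is hypothetical and not part of any package. It assumes the usual human RefSeq numbering (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrial genome) and returns NA for anything else, such as scaffolds or patches.

```r
# Hypothetical helper (not from any package): translate human RefSeq
# chromosome accessions into UCSC-style chromosome names.
refseq_to_ucsc <- function(acc) {
  out <- rep(NA_character_, length(acc))

  # Nuclear chromosomes: NC_000001 ... NC_000024, with any version suffix.
  is_chr <- grepl("^NC_0000(0[1-9]|1[0-9]|2[0-4])\\.[0-9]+$", acc)
  num <- as.integer(sub("^NC_0000([0-9]{2})\\..*$", "\\1", acc[is_chr]))
  label <- as.character(num)
  label[num == 23L] <- "X"
  label[num == 24L] <- "Y"
  out[is_chr] <- paste0("chr", label)

  # Mitochondrial genome.
  out[grepl("^NC_012920\\.", acc)] <- "chrM"

  out
}

# Accessions from the peaks above, plus a made-up placeholder accession
# ("NT_999999.1") standing in for a non-chromosome sequence.
acc <- c("NC_000003.12", "NC_000005.10", "NC_000009.12", "NT_999999.1")
mapping <- setNames(refseq_to_ucsc(acc), acc)
print(mapping)
```

A named vector of this shape (old names as names, new names as values) is what `renameSeqlevels()` from GenomeInfoDb accepts, so it might be usable to rename the seqlevels of the peaks before annotation. Sequences that map to NA would presumably have to be dropped first, and this has not been checked against the data above.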

Using GCF_000001405.39_GRCh38.p13_genomic.gtf directly as the annotation results in about 1 million genes for only 2,000 peaks.

Any help with solving this problem would be greatly appreciated.

ChIP-Seq alignment annotation sequence • 425 views