Question: out of memory in R after run ChIPSeeker
0
gravatar for Lila M
3.5 years ago by
Lila M 820
UK
Lila M 820 wrote:

Hi everybody, I've tried to make some TSS profiles as follow:

Normalized Bam files using bamCoverage (output NGS974Norm bedgraph)

running ChiPseeker in bioconductor:

reads <- import.bed(con="NGS974Norm") print(reads) cov <- coverage(reads) peakAnno <- annotatePeak(reads, tssRegion=c(-500, 2000), TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene, annoDb="org.Hs.eg.db")

But when it finish reports the following error: Error in rsqlite_send_query(conn@ptr, statement) : out of memory

any ideas?

chipseeker chip-seq R • 1.7k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by Lila M 820
1

How big is your dataset ? How much RAM do you have on your computer ? Have you tried running annotatePeak on only a few genes and not the full annotation (TxDb.Hsapiens.UCSC.hg19.knownGene) ?

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Carlo Yague5.0k

I think this is not the problem because I have Memory 31.1GiB Processor Cor i7-6700 CPU @ 3.40GHz × 8

I need to run it woth all the TSS of all ensembl protein coding transcripts. any Idea?

ADD REPLYlink written 3.5 years ago by Lila M 820

When exactly do you get the error ?

After importing the bed file or when running annotatePeak ? If it is the later, at which step does it crash ? Loading peak file ? calculating distances ?

ADD REPLYlink written 3.5 years ago by Carlo Yague5.0k

Is just after "adding gene annotation... " could it be the file?

ADD REPLYlink written 3.5 years ago by Lila M 820

That's weird...You can always try to test the code on smaller datasets (only 1 chrom of the bedgraph file or only 1 gene of the annotation for instance), but honestly, I don't know what is going on.

ADD REPLYlink written 3.5 years ago by Carlo Yague5.0k
1

I've solved the problem, ChIPseeker need a file with 6 columns, not with 4 as bedgraph files report.

ADD REPLYlink written 3.5 years ago by Lila M 820

can you send me a sample file (e.g. first 20 rows of your bedgraph file) for testing?

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Guangchuang Yu2.3k

Hi, yes it this:

"chr1" 0 10000 0 1 "." "chr1" 10000 10050 55 2 "." "chr1" 10050 10100 91 3 "." "chr1" 10100 10150 70 4 "." "chr1" 10150 10200 71 5 "." "chr1" 10200 10250 68 6 "." "chr1" 10250 10300 51 7 "."" "chr1" 10450 10500 14 11 "."

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Lila M 820

It would be better to send an attached file instead of pasting file content.

I don't think it's the number of column issue. I tested and it actually works fine with only 4 columns.

> f = getSampleFiles()[[4]]
> p = readPeakFile(f)
> p$V4 = NULL
> p$V5 = NULL
> p
GRanges object with 1331 ranges and 0 metadata columns:
         seqnames                 ranges strand
            <Rle>              <IRanges>  <Rle>
     [1]     chr1     [ 815093,  817883]      *
     [2]     chr1     [1243288, 1244338]      *
     [3]     chr1     [2979977, 2981228]      *
     [4]     chr1     [3566182, 3567876]      *
     [5]     chr1     [3816546, 3818111]      *
     ...      ...                    ...    ...
  [1327]     chrX [135244783, 135245821]      *
  [1328]     chrX [139171964, 139173506]      *
  [1329]     chrX [139583954, 139586126]      *
  [1330]     chrX [139592002, 139593238]      *
  [1331]     chrY [ 13845134,  13845777]      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

> a = annotatePeak(p, annoDb='org.Hs.eg.db')
>> preparing features information...         2016-12-21 19:12:27
>> identifying nearest features...       2016-12-21 19:12:27
>> calculating distance from peak to TSS...  2016-12-21 19:12:27
>> assigning genomic annotation...       2016-12-21 19:12:27
>> adding gene annotation...             2016-12-21 19:12:28
'select()' returned many:many mapping between keys and columns
>> assigning chromosome lengths          2016-12-21 19:12:28
>> done...                   2016-12-21 19:12:28
Warning message:
In loadTxDb(TxDb) :
  >> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...
> a
Annotated peaks generated by ChIPseeker
1331/1331  peaks were annotated
Genomic Annotation Summary:
              Feature  Frequency
9    Promoter (<=1kb) 48.1592787
10   Promoter (1-2kb)  6.0856499
11   Promoter (2-3kb)  4.5078888
4              5' UTR  0.3005259
3              3' UTR  2.1036814
1            1st Exon  0.6010518
7          Other Exon  2.4042074
2          1st Intron  2.1036814
8        Other Intron  5.8602554
6  Downstream (<=3kb)  0.9767092
5   Distal Intergenic 26.8970699

ADD REPLYlink written 3.5 years ago by Guangchuang Yu2.3k

A thought that comes to my mind: Just because your computer has 32 GB RAM does not mean that 32GB Ram are also available to R. So it might be useful to just go back to the original error msg and try to investigate. Also the error msg talk about rsqllite, so maybe this module ran out of memory? As mentioned before, I'd try it with less data see if that helps, also monitor the memory usage when you start the process. Sorry can not be more specific than that. .... but 'all protein coding transcripts in ensembl' ... that sounds like something that could push any machine to its limits.

ADD REPLYlink written 3.5 years ago by LLTommy1.2k

(posted twice sorry)

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by LLTommy1.2k
0
gravatar for Guangchuang Yu
3.5 years ago by
Guangchuang Yu2.3k
China/Guangzhou/Southern Medical University
Guangchuang Yu2.3k wrote:

@Lila M had sent me the original file which is about 1.2Gb. It's indeed memory intensive in ID converting step. I had optimized this step and it runs more faster and required less memory.

The optimized version is available on github: https://github.com/GuangchuangYu/ChIPseeker.

It only took 19 second to annotate the peaks (31027455 records, 1.2Gb):

> p = readPeakFile("~/Downloads/peak_NGS974")
> a=annotatePeak(p, annoDb='org.Hs.eg.db')
>> preparing features information...         2016-12-21 20:53:31
>> identifying nearest features...         2016-12-21 20:53:31
>> calculating distance from peak to TSS...     2016-12-21 21:08:42
>> assigning genomic annotation...         2016-12-21 21:08:42
>> adding gene annotation...             2016-12-21 21:11:52
>> assigning chromosome lengths             2016-12-21 21:12:35
>> done...                     2016-12-21 21:12:35

Warning message:
In loadTxDb(TxDb) :
  >> TxDb is not specified, use 'TxDb.Hsapiens.UCSC.hg19.knownGene' by default...
>
> a
Annotated peaks generated by ChIPseeker
31027137/31027455  peaks were annotated
Genomic Annotation Summary:
              Feature  Frequency
9    Promoter (<=1kb)  2.7942862
10   Promoter (1-2kb)  2.3390009
11   Promoter (2-3kb)  2.1797564
4              5' UTR  0.1244330
3              3' UTR  1.2549530
1            1st Exon  0.1156955
7          Other Exon  1.4102784
2          1st Intron  8.1670636
8        Other Intron 33.7840291
6  Downstream (<=3kb)  0.8216872
5   Distal Intergenic 47.0088168
ADD COMMENTlink written 3.5 years ago by Guangchuang Yu2.3k

I had optimized this step and it runs more faster and required less memory.

Nice, that was what I thought. If we talk about something like 'all protein coding transcripts in ensembl' it is important that your code is clean. If not, I am not surprised that you run into problems.

ADD REPLYlink written 3.5 years ago by LLTommy1.2k

Thank you very much! Now the program run super faster!

ADD REPLYlink written 3.5 years ago by Lila M 820
0
gravatar for Lila M
3.5 years ago by
Lila M 820
UK
Lila M 820 wrote:

wrong post !!!

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by Lila M 820
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1580 users visited in the last hour