Question: Whole exomes: Rsamtools, RAM issue
bioinfo8 wrote:

Hi,

I have many whole-exome BAM files (aligned to a reference). As an R admirer, I used RStudio to do some initial analysis on one of them (~2.7 GB) using Rsamtools.

1) Check whether the file is single- or paired-end:

> testPairedEndBam("1.bam")   
[1] TRUE 
> quickBamFlagSummary("1.bam")           # Got detailed information

2) Open the BAM file:

> bam <- BamFile("1.bam", asMates = TRUE)
> bam
    class: BamFile 
    path: 1.bam
    index: 1.bam.bai
    isOpen: FALSE 
    yieldSize: NA 
    obeyQname: FALSE 
    asMates: TRUE                                       
    qnamePrefixEnd: NA 
    qnameSuffixStart: NA

3) Some high-level information:

> seqinfo(bam)

Seqinfo object with 1133 sequences from an unspecified genome:               # and other information

4) Read all the reads in the file using scanBam()

> details <- scanBam(bam)

But at this step it just keeps running and I am stuck. Any thoughts, please? How much RAM is required to process a ~3 GB BAM file in RStudio? I am on a 64-bit Windows 8.1 machine with 16 GB RAM.

Thanks!
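A note on step 4: scanBam() with no param extracts every field of every record, including the full sequence and quality strings, which is the most memory-hungry way to read a BAM. Below is a minimal sketch of restricting the query to only the fields needed downstream; the field selection here is illustrative, and scanBamWhat() lists all valid names:

library(Rsamtools)

## Sketch: request only the fields actually used later; seq and qual
## are the largest, so omitting them saves the most memory.
param <- ScanBamParam(what = c("rname", "pos", "mapq", "cigar"))
details <- scanBam("1.bam", param = param)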

> sessionInfo()

R version 3.3.2 (2016-10-31)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows >= 8 x64 (build 9200)

locale:

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:

[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):

[1] zlibbioc_1.16.0 IRanges_2.4.8 XVector_0.10.0 futile.logger_1.4.3 parallel_3.3.2
[6] tools_3.3.2 GenomicRanges_1.22.4 lambda.r_1.1.9 futile.options_1.0.0 Biostrings_2.38.4
[11] S4Vectors_0.8.11 BiocGenerics_0.16.1 BiocParallel_1.4.3 Rsamtools_1.26.1 GenomeInfoDb_1.6.3
[16] stats4_3.3.2 bitops_1.0-6

Tags: whole exome • bam • rsamtools • ram • R

Benn replied:

Check in R how much memory you can use:

memory.limit()
bioinfo8 replied:

> memory.limit()
    [1] 16365

But if I wanted to check the requirement for a specific file, how would I do that?

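One way to check the requirement for a specific file is to scan a small sample of records and extrapolate; a sketch, where the sample size of 10,000 is an arbitrary choice:

library(Rsamtools)

## Sketch: read a small sample, measure its size in memory, and scale up.
bf <- BamFile("1.bam", yieldSize = 10000)
open(bf)
sample_chunk <- scanBam(bf)[[1]]
close(bf)

print(object.size(sample_chunk), units = "Mb")   # size of 10,000 records

## Total records in the file (uses the .bai index):
total <- countBam("1.bam")$records

## Rough projection for the whole file, in GiB:
as.numeric(object.size(sample_chunk)) / 10000 * total / 1024^3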

Benn replied:

I think memory is not the problem, then. Do you get an error? How long did you let it run?

Maybe traceback() can give you some clues:

traceback()

bioinfo8 replied:

It ran for 31 minutes and then showed the following error:

> details <- scanBam(bam) 
Error in value[[3L]](cond) : cannot allocate vector of size 1024.0 Mb
  file: 1.bam
  index: 1.bam.bai

> traceback()
8: stop(conditionMessage(err), "\n  file: ", path(file), "\n  index: ", 
       index(file))
7: value[[3L]](cond)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
4: tryCatch({
       .Call(func, .extptr(file), space, flag, simpleCigar, tagFilter, 
           mapqFilter, ...)
   }, error = function(err) {
       stop(conditionMessage(err), "\n  file: ", path(file), "\n  index: ", 
           index(file))
   })
3: .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), 
       tmpl, obeyQname(file), asMates(file), qnamePrefix, qnameSuffix, 
       param = param)
2: scanBam(bam)
1: scanBam(bam)
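The allocation failure is the expected outcome of an unrestricted whole-file scanBam(): BAM files are BGZF-compressed on disk, so the in-memory representation of all fields can be several times larger than the 2.7 GB file. Below is a sketch of the standard Rsamtools streaming idiom, which holds only one chunk in memory at a time; the chunk size and the per-chunk work are placeholders:

library(Rsamtools)

## Sketch: process the file in fixed-size chunks instead of one pass.
bf <- BamFile("1.bam", yieldSize = 500000)
open(bf)
param <- ScanBamParam(what = c("rname", "pos", "mapq"))

repeat {
    chunk <- scanBam(bf, param = param)[[1]]
    if (length(chunk$pos) == 0L)
        break
    ## ... summarise or filter the chunk here, keeping only the results ...
}
close(bf)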

Benn replied:

Since you are using Windows, maybe this answer on Stack Overflow can help:

http://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb/24754706#24754706


bioinfo8 replied:

Thanks, but I did not find it useful.
