Question: Errors When Using The Easyrnaseq Package From Bioconductor
2
gravatar for e.karasmani
7.1 years ago by
e.karasmani120
e.karasmani120 wrote:

Dear All,

I am using easyRNAseq to analyze some RNA seq data.....

and this is what i am doing.....

 library(easyRNASeq)
 library(BSgenome.Mmusculus.UCSC.mm9)
chr.sizes=seqlengths(Mmusculus)
bamfiles=dir(getwd(),pattern="*\\.bam$")

rnaSeq <- easyRNASeq(filesDirectory=getwd(),
,organism="Mmusculus"
,chr.sizes="auto"
,readLength=36L
,annotationMethod="biomaRt"
,count="exons"
,filenames=bamfiles[1]
,outputFormat="RNAseq"
)

and that is what i get.....

Checking arguments...
Fetching annotations...
Summarizing counts...
Processing s_7_1_I12-1147-01.rnaseq.mm9.srt.bam
 Warning messages:
1: In easyRNASeq(filesDirectory = getwd(), , organism = "Mmusculus",  :
  There are 366088 features/exons defined in your annotation that overlap! This implies that some reads will be counted more     than once! Is that really what you want?
2: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
3: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
  Not all the chromosome names in your chromosome size list 'chr.sizes' are present in your read file(s) (aln or bam).
4: In fetchCoverage(rnaSeq, format = format, filename = filename, filter = filter,  :
 The available chromosomes in both your read file(s) (aln or bam) and 'chr.sizes' list were restricted to their common term.
These are: chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6, chr7, chr8,     chr9, chrM, chrX, chrY.
Error in unlist(aggregate(readCoverage(obj)[match(names(genomicAnnotation(obj)),  :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in .Call2("Rle_getStartEndRunAndOffset", x,      start, end, PACKAGE = "IRanges") :
    ' x' values larger than vector length 'sum(width)'

What I am doing wrong????

How can I use the program to analyze my RNA seq data

Could you please help me?

Thank you in advance

Best regards Lena

bioconductor rna • 2.4k views
ADD COMMENTlink modified 7.1 years ago by Sukhdeep Singh9.9k • written 7.1 years ago by e.karasmani120

Please update your question with the output of sessionInfo().

Also, you may have better luck sending this question to the bioconductor mailing list, since I know the author of this package watches that list closely, and I'm not sure if he's on biostars.

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by Steve Lianoglou5.0k
0
gravatar for Sukhdeep Singh
7.1 years ago by
Sukhdeep Singh9.9k
Netherlands
Sukhdeep Singh9.9k wrote:

This post should answer it easyRNASeq package: error when summarizing per gene. According to the developer (I am copying the answer for future reference for someone)

The error is probably indeed due to your annotation file in combination with you chromosome size list. Every gene loci should be fully contained in the chromosomes, i.e. their coordinates should not be larger than the chromosome size you provide for the chromosome they're located on. I believe I'm checking this in the code, but you might have just found a configuration of providing the annotation and chromosome sizes arguments that I overlooked. The first use case in the "Use Cases" section of the vignette explains why the chromosome sizes are so important to the process. The development version of easyRNASeq (available around October 2nd (PDT time)), makes this a lot easier.

You can easily check if this is the case by running the following command:

lapply(names(seqlengths(Mmusculus)),function(chr,rngs,sizes){
    cat("processing chromosome",chr,"\n")
sel <- space(exon_range) == chr
stopifnot(all(start(exon_range[sel,]) <= sizes[chr]))
stopifnot(all(end(exon_range[sel,]) <= sizes[chr]))
},exon_range,seqlengths(Mmusculus))

It should stop if any of your gene coordinate is larger that the chromosome size it's on. Note that I've just written this bit of code up in this email, so there might be typos. If there are and you can't figure them out, just let me know. If there are no typos and it stops, then you would have to modify either your exon_range object or more easily your chromosome sizes one. For this you could simply use the scanBamHeader (Rsamtools package) function to extract the chromosome sizes from your bam file as follow (change the first list to match your settings):

filesList <- dir("[your bam directory]",pattern="*\\.bam$",full.name=TRUE)

## read the headers
            headers <- scanBamHeader(filesList)

            ## Two sanity checks
            if(!all(sapply(headers,
                           function(header,expected){
                             identical(sort(names(header$targets)),expected)
                           },sort(names(headers[[1]]$targets))))){
              stop("Not all BAM files use the same chromosome names.")
            }

            chr.sizes <- headers[[1]]$targets
            if(!all(sapply(headers, function(header,chr.sizes){
              identical(header$targets[match(names(chr.sizes),
                                             names(header$targets))],chr.sizes)
            },chr.sizes))){
              stop("The chromosome lengths differ between BAM files.")
            }

            ## check if we got some chr sizes at all
            if(length(chr.sizes)==0){
              stop("No chromosome sizes could be determined from your BAM file(s).Is the BAM header present?\nIf not, you can use the 'chr.sizes' argument to provided the chromosome sizes information.")
            }

That's actually an excerpt of the easyRNASeq development version code to extract the chromosome size from the BAM files. I think that this is the reason for the error. If not, we can take this discussion off the list to figure it out, provided you can give me access to your annotation object and to an excerpt of your data (that you could upload to my dropbox). This error should actually not occur in the development version anymore, but it would be great to give it a try after October 3rd again. Let me know how it goes, Cheers


Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310

Email: nicolas.delhomme at embl.de

So, either try

1) using the development version

or 2) contact him

ADD COMMENTlink modified 7.1 years ago • written 7.1 years ago by Sukhdeep Singh9.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1908 users visited in the last hour