Question: Error when counting reads in genes with summarizeOverlaps (GenomicAlignments package)
alejandro.colaneri10 wrote:

Hello,
I'm following the "RNA-seq workflow for differential gene expression" white paper by Michael Love, Simon Anders, and Wolfgang Huber (http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=10&ved=0CFcQFjAJ&url=http%3A%2F%2Fwww.bioconductor.org%2Fhelp%2Fcourse-materials%2F2014%2FBioC2014%2FRNA-Seq-Analysis-Lab.pdf&ei=_ihmVK3nMoSrNsb9gsAH&usg=AFQjCNH5FkLy2MQwoJCUSMdfb3KrrP45Yw&sig2=iUkasIDbf7SyjBSOl0BKHQ&bvm=bv.79142246,d.eXY&cad=rja).

 

This is what I'm doing:

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);

 

However, I got this error and I have no idea how to fix it:

Error in .summarizeOverlaps_BamFileList(features, reads, mode, ignore.strand = ignore.strand, :
duplicate 'names(reads)' not allowed


Can someone help please!!

 

Here are all the previous steps before trying to create the object "se":

### read the table: sampleTable.csv

sampleTable <- read.csv("sampleTable.csv", header=TRUE);

### build the full path to the tophat produced bam files

bamFiles <- file.path(".", sampleTable$dirName, sampleTable$fileName);

### see the created vector with paths

bamFiles

##### Use the BamFile function from Rsamtools to see if these paths are functional

library ("Rsamtools");
seqinfo(BamFile(bamFiles[1]));

#Counting reads in genes

library("GenomicFeatures");

hse <-makeTranscriptDbFromGFF("/proj/seq/data/TAIR10_Ensembl/Annotation/Genes/genes.gtf", format="gtf")
exonsByGene <- exonsBy(hse, by="gene");

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);
modified 2.9 years ago by SmallChess430 • written 3.6 years ago by alejandro.colaneri10

It would seem that your BAM files have overlaps in the reads that they contain.

Also, your link is broken - you might wanna update your question with the corrected link.

written 3.6 years ago by Ram15k

It's discouraged to cross-post on multiple forum sites (Bioc support site, seqanswers, biostars), as it makes multiple people answer your question and makes it difficult to track down an answer that might have been given on another site.

One of the GenomicAlignments maintainers answered here:

https://support.bioconductor.org/p/62966/

modified 3.6 years ago • written 3.6 years ago by Michael Love1.6k
Chris S.290 wrote:

Can you paste the output from

> names(BamFileList(bamFiles))
[1] "SRR479052.bam" "SRR479053.bam" "SRR479054.bam"
> any(duplicated(names(BamFileList(bamFiles))))
[1] FALSE

I think that error should only be caused by duplicate BAM file names.
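
A minimal sketch of the same check in base R, without opening any BAM files. It assumes the list names are taken from the file base names rather than the full paths, and the paths below are made up for illustration:

```r
# Hypothetical paths (not from the original post); two share a base name.
bamFiles <- c("./sampleA/accepted_hits.bam",
              "./sampleB/accepted_hits.bam",
              "./sampleC/SRR479054.bam")

# Assumption: names are derived from the file name only, not the directory.
bamNames <- basename(bamFiles)

# Collect every base name that occurs more than once.
dupNames <- unique(bamNames[duplicated(bamNames)])

if (length(dupNames) > 0) {
  message("Duplicate BAM file names: ", paste(dupNames, collapse = ", "))
}
```

If dupNames is non-empty, the corresponding files would need distinct names before counting.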

 

modified 3.6 years ago • written 3.6 years ago by Chris S.290
SmallChess430 wrote:

The error was caused by duplicate BAM file names. For example, in my script:


files <- c('/Volumes/SSHFS/Sources/QA/scripts/RNA_A1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A3/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B3/aligned/accepted_hits.bam')

.... load the files into BAM objects ....

> names(bams)
[1] "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam"


Even though the files are physically different, they all get the same name, because the name is taken from the file name only and not from the full path. We can rename the files, or we can do:

names(bams) <- c("A1.bam", "A2.bam", "A3.bam", "B1.bam", "B2.bam", "B3.bam")

This will fix the problem.
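
Instead of typing the names by hand, they could also be derived from the paths. This is only a sketch: it assumes the sample directory sits two levels above each BAM file, as in the paths listed above, and sampleNames is a name introduced here for illustration:

```r
files <- c('/Volumes/SSHFS/Sources/QA/scripts/RNA_A1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A3/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B3/aligned/accepted_hits.bam')

# Assumption: layout is <sample>/aligned/accepted_hits.bam, so the sample
# directory is the grandparent of each file.
sampleNames <- basename(dirname(dirname(files)))

# Stop early if the derived names are still not unique.
stopifnot(anyDuplicated(sampleNames) == 0)

# Then, with 'bams' being the BamFileList from the post above:
# names(bams) <- sampleNames
```

The last line is left commented out because it needs the bams object from the script above.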

 

 

modified 2.9 years ago • written 2.9 years ago by SmallChess430