chromosome name in summarizeOverlaps for RNASeq data
8.3 years ago
BioProg • 0

For a while I used quantile normalization followed by a t-test, or cuffdiff, to analyze RNA-Seq data. I would like to try DESeq2, given its better normalization method based on the read counts. I am using the summarizeOverlaps function to create count tables, as I found it to be the easiest. However, I am encountering a couple of issues. First, the Ensembl-based GTF file that I used to align all my mouse RNA-Seq data names the chromosomes "1, 2, 3, ..." rather than "chr1, chr2, ...". My test R code for summarizeOverlaps is as follows:

library(GenomicFeatures)    # exonsBy()
library(GenomicAlignments)  # summarizeOverlaps()
library(Rsamtools)          # BamFileList()
library(AnnotationDbi)      # saveDb(), loadDb()
library(TxDb.Mmusculus.UCSC.mm10.ensGene)

TxDb_mm10 <- TxDb.Mmusculus.UCSC.mm10.ensGene
saveDb(TxDb_mm10, file="mm10.sqlite")
mm10 <- loadDb("mm10.sqlite")
exbyGene <- exonsBy(mm10, by="gene") # I assume this groups exons by gene ID (ENSMUSG...)
exbyTx <- exonsBy(mm10, by="tx") # I assume this groups exons by transcript ID (ENSMUST...)
# Anchor the pattern so that index files (.bam.bai) are not picked up
fls <- list.files("Path/To/BamFiles", pattern="accepted_hits\\.bam$", full.names=TRUE)
grp1 <- fls[1]
grp2 <- fls[c(2,3)]
bamLst_grp1 <- BamFileList(grp1, yieldSize=100000)
bamLst_grp2 <- BamFileList(grp2, yieldSize=100000)
head(seqlevels(TxDb_mm10)) # this shows that the mm10 sqlite I am using has the chr1, chr2, ... format
seqinfo(bamLst_grp1) # this shows that the chromosome names in my test BAM files are 1, 2, 3, ...
se_grp1 <- summarizeOverlaps(exbyGene, bamLst_grp1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)
se_grp2 <- summarizeOverlaps(exbyGene, bamLst_grp2, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)
## Saving the result objects (one object per file, written with saveRDS)
saveRDS(se_grp1, file="se_grp1.Rdata")
saveRDS(se_grp2, file="se_grp2.Rdata")

While the script runs without errors (and please advise me if I should optimize anything, as I have only a basic understanding of each command), when I tried to load the saved files I got the following error:

Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'se_grp2.Rdata' has magic number 'X'
  Use of save versions prior to 2 is deprecated

Also, I am not sure whether the error arises because the chromosome names are incompatible between the reference annotation and the BAM files. If that is the case, how can I correct this?

RNA-Seq summarizeOverlaps annotation DESeq2 • 2.2k views
8.3 years ago

The error is unrelated to any chromosome names and has to do with R not being able to load the file it saved. Presumably there was a (hopefully transient) storage problem.
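On the separate question of chromosome naming, here is a minimal sketch of one way to bring the two naming schemes in line, assuming the GenomeInfoDb `seqlevelsStyle<-` setter is acceptable for your data. The `features` object and the `bam_seqnames` vector are toy stand-ins (not from the original script) for the annotation and for the names reported by the BAM files. The "NCBI" style is used here because it gives the plain "1", "2", "X" names for the primary chromosomes.

```r
library(GenomicRanges)
library(GenomeInfoDb)

# Toy annotation with UCSC-style names (hypothetical stand-in for exbyGene)
features <- GRanges(
  seqnames = c("chr1", "chr2", "chrX"),
  ranges   = IRanges(start = c(100, 200, 300), width = 50)
)

# Toy stand-in for the chromosome names found in the BAM header
bam_seqnames <- c("1", "2", "X")

# Before renaming, the two sides share no sequence names
shared_before <- intersect(seqlevels(features), bam_seqnames)

# Switch the annotation to the un-prefixed naming style
seqlevelsStyle(features) <- "NCBI"

# After renaming, the names can be compared directly
shared_after <- intersect(seqlevels(features), bam_seqnames)
```

In the real script the same assignment would presumably be applied to `exbyGene` before calling summarizeOverlaps. It is worth checking the intersection of the two sets of names afterwards, since a mismatch tends to show up as empty counts rather than as an error.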


You are right, I should have used readRDS, as I saved the file as RDS.
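For anyone hitting the same message, a small sketch of the difference, using a toy object and a temporary file (both are illustrative, not from the script above): a file written with saveRDS is read back with readRDS, while load expects the format written by save and fails on it.

```r
# Toy object standing in for a SummarizedExperiment
obj <- list(counts = matrix(1:4, nrow = 2))

path <- tempfile(fileext = ".rds")
saveRDS(obj, file = path)

# readRDS() returns the object, so it has to be assigned to a name
restored <- readRDS(path)
round_trip_ok <- identical(obj, restored)

# load() is the counterpart of save(), not saveRDS(), and errors here
load_result <- tryCatch(
  suppressWarnings(load(path)),
  error = function(e) e
)
load_failed <- inherits(load_result, "error")
```

Naming such files with an .rds extension instead of .Rdata makes it easier to remember which reader to use.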

Thank you
