Question: geneCounts from sorted bam all 0?
1
gravatar for kathryn.eckartt
19 months ago by
kathryn.eckartt30 wrote:

I appreciate any advice you could give! I'm using R version 3.4.4, platform 64-bit Linux-gnu.

I've aligned RNAseq reads to a reference genome using QuasR and created a gene model from a GFF file:

qAlign("sampleTable.txt","refGenome.fa,splicedAlignment=TRUE)

library(GenomicFeatures)

geneModel<-("path/geneModel.gff3.gz","gff3")
genes<-genes(geneModel)

I've confirmed my sorted bam file has alignments, but when I try to obtain gene counts using GenomicAlignments summarizeOverlaps() all counts are listed as 0 in my final matrix.

library(GenomicAlignments)

sortBam("alignment.bam","alignment.bam.bai",destination="Sorted") 
indexBam("Sorted.bam")
myBam<-BamFile("Sorted.bam",yieldSize=10000)
geneCounts<-summarizeOverlaps(genes,myBam,singleEnd=FALSE)

myGeneGR<-rowRanges(geneCounts)
myGeneCounts<-genecounts[genes$gene_id==myGeneGR$gene_id,]
library(SummarizedExperiments)
myGeneCounts_Matrix<-assay(myGeneCounts)
ADD COMMENTlink modified 19 months ago by zx87549.7k • written 19 months ago by kathryn.eckartt30
1
gravatar for swbarnes2
19 months ago by
swbarnes28.9k
United States
swbarnes28.9k wrote:

The first obvious thing to check; are you sure that your gff and your reference genome came from the same place?

Specifically, do the chromosome names match between your reference and your gff?

ADD COMMENTlink written 19 months ago by swbarnes28.9k

Yuhp, they are both of the same organism from the same assembly. Since the organism is a bacteria and there is only 1 chromosome the gene_id is the common identifier between files.

ADD REPLYlink written 19 months ago by kathryn.eckartt30

The gene id is in common between your gff and your reference?? Not the chromosome name?

ADD REPLYlink written 19 months ago by swbarnes28.9k

If I understand what you're asking, it's the gene_id that's common. The gff lists

Chromosome     name     feature     rank     position     strand     ID

and the reference genome for the single chromosome lists

>Assembly_species_complete_genome
TTAAGGGCCC

After aligned, sorted bam file has gene_ids and so does the gene model made from the gff with makeTxDbFromGFF.

ADD REPLYlink modified 19 months ago • written 19 months ago by kathryn.eckartt30

So your reference does not have gene_IDs? The entry under chromosome in your gff should be "Assembly_species_complete_genome". Is that what it is?

ADD REPLYlink written 19 months ago by swbarnes28.9k

Yuhp, that's correct- the reference doesn't have a gene-id. There isn't a Chromosome heading in the gff file, that is just what each line starts with.

ADD REPLYlink modified 19 months ago • written 19 months ago by kathryn.eckartt30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1588 users visited in the last hour