Question: GAPDH missing from htseq-counts file
0
gravatar for melia_g
11 months ago by
melia_g0
UK
melia_g0 wrote:

I have some RNAseq data from rat neurons which was processed using STAR and HTseq. I noticed earlier today that for all samples, there were fewer than 20 copies of GAPDH, which should be highly expressed, in the HTSeq count files. There may be other high count genes missing, but I haven't been able to find any other anomalies so far and there are genes such as Map2, Calm1/2 and Nrgn with counts in the tens of thousands range.

I've been trying to trace back the source of this error, starting with one of the samples as a test case, and found in the STAR BAM file (converted from SAM) that there were ~15000 alignments to the GAPDH co-ordinates, trying two slightly different sets of co-ordinates for GAPDH in the rat genome from Ensembl and NCBI.

samtools view s10.bam 4:157676336-157680322 | wc -l

Yet for the same sample, the count of uniquely aligned reads in HTSeq is just 9.

python -m HTSeq.scripts.count -s no -a 10 ~/Star_cleanreads/S10.sam Rattus_norvegicus.Rnor_6.0.89.gtf > s10.count grep ENSRNOG00000018630 s10.count

I'm assuming that the issue is the GAPDH transcripts are for some reason aligning to multiple locations and have tried to confirm this by running HTSeq with the "--nonunique all" option, but get the error that this option doesn't exist. I can't find the version number for HTSeq but assume I must be running an old version. I've also had a look at GAPDH in the gtf file and can't see an obvious problem there.

Has anyone else run into this issue and/or can advise how to proceed?

Many thanks

nonunique gapdh htseq • 547 views
ADD COMMENTlink written 11 months ago by melia_g0

Are you sure that GAPDH is expressed in high quantities in the tissue that you're researching?

Far from what people believe, the traditional housekeeping genes (including GAPDH) don't in fact show ubiquitous and equal expression patterns across tissues - far from it.

Kevin

ADD REPLYlink written 11 months ago by Kevin Blighe30k

Hi, thanks for taking the time to respond. I did have a quick look at the literature first and yes GAPDH does seem to be expressed in CA1 neurons. Regardless, there are many alignments to the GAPDH co-ordinates in the SAM file which is what makes me think there's an issue with the counting.

In the history I found a post about a similar issue with Cufflinks, apparently due to the max number of fragments per locus parameter and wondered if it could be something similar, but I've had no luck so far ( https://biostar.usegalaxy.org/p/12372/ ).

ADD REPLYlink modified 11 months ago by Kevin Blighe30k • written 11 months ago by melia_g0

Thanks for getting back. That could certainly be the cause, and it would be exacerbated by the fact that GAPDH is a relatively small gene. It also sits in very close proximity to other genes, which may prove probematic for HTSeq, I read:

We find that if one uses HTSeq counts as input, the maximum achievable sensitivity is lower than for the other methods. This is due to the fact that for overlapping gene symbols HTSeq does not count the reads and for those genes differential expression cannot be positive.

source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879183/

I can neither confirm that that is the issue though.

Might I recommend an alignment-free RNA-seq quantification program like Kallisto? The reference is the genomic transcriptomic sequences for each transcript (FASTA), over which abundance is calculated directly from the FASTQ/A reads. You may see improvement in the problem that way.

ADD REPLYlink modified 11 months ago • written 11 months ago by Kevin Blighe30k

Thank you so much! I'm getting a compilation error with Kallisto, but quantifying with Salmon recovers GAPDH. For the sample above: ENSRNOT00000050443.4, TPM: 2302.93; Counts: 10686.1

ADD REPLYlink written 11 months ago by melia_g0

Salmon is just as good! Great that you could recover the expected counts.

ADD REPLYlink written 11 months ago by Kevin Blighe30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1734 users visited in the last hour