Entering edit mode
3.9 years ago
Genomics
▴
20
I have feature counts with zero for most of genes in most of the samples. I am worrying as Alignment statistics shows 90% (12M reads aligned) so if not exon then where it is going and aligning.
Not all counts are zero, but for some specific genes i would expect counts to be higher in samples Following command i have used
featureCounts -p -t exon -g gene_id -a Refseq.gtf -o output.txt sample.bam -M -O
For Example this is feature count of CXCL2 gene in different samples
CXCL2 1 0 0 0 0 0 0 0
Have you visually verified alignments using IGV? You may have uneven coverage and/or over-amplified libraries. You are even counting multi-mapping reads (which is odd) and still getting zeros?
I have visualized in IGV , it shows the same as what i get in feature Count.
As in there are no reads mapping to that gene? If that is the case why are you surprised by featureCounts result? Perhaps there is something wrong with the experiment itself?
Agreed to this point. But why am I getting aligned reads (12-20 M) for all samples. I want to know where are they going and aligning.
What strandedness was your library prep kit?
What strandedness flag did you use? (-s 0, 1, 2)
Do those match?
Yes it is matching my library is F/R strand and I've used following in Hisat --fr -1
featureCounts
hasisStrandSpecific
flag also, default is0
indicating unstranded.See this helpful thread.
I did this. Doesn't help
What's the experimental design here? What material are samples from? Did you do the labwork, or can you get RNA integrity/RINs? What were duplicate rates like? What are levels of CXCL1, 3, 5 like?
My samples are from swab, I've not done any lab work, just analyzing the data.But I can get RNA integrity. All CXCls have no counts in samples.
I'd really want to know how long after swabs were collected was RNA extracted, what was the swab stored in (RNAlater, buffer from extraction kit)?
What library kit was used? Previously might have said to look into rRNA contamination if an rRNA-depletion method was used, but nearly always polyA-selection now.
My guess would be low complexity library based on sample degradation. There would be a lot of duplicates (50%+) if so.
If you want to know where reads align and were counted, sort your featureCounts output file on the sum of count columns
I would also test for exonic reads using
samtools view
with a relevant bed file of gene regions using-L
and-M
.I've removed rRNA by mapping them to rRNA databse Also how about doing a Denovo assembly and checking where the reads are aligning?
Could do de novo, presume this is human if swabs (buccal?), if so I'd be more inclined to see what the issue was with the data.
What level of rRNA reads were in the samples? Was it polyA selection kit or rRNA depletion?
Still interested in the duplication rate, think ruling out low complexity library is useful. That does assume some genes had relatively 'normal' levels of expression(?)