RNA Seq Analysis:Feature Counts are zero
0
0
Entering edit mode
3.9 years ago
Genomics ▴ 20

I have feature counts with zero for most of genes in most of the samples. I am worrying as Alignment statistics shows 90% (12M reads aligned) so if not exon then where it is going and aligning.

Not all counts are zero, but for some specific genes i would expect counts to be higher in samples Following command i have used

featureCounts -p -t exon -g gene_id -a Refseq.gtf -o output.txt sample.bam -M -O

For Example this is feature count of CXCL2 gene in different samples

CXCL2   1   0   0   0   0   0   0   0
RNA-Seq feature counts • 2.3k views
ADD COMMENT
1
Entering edit mode

Have you visually verified alignments using IGV? You may have uneven coverage and/or over-amplified libraries. You are even counting multi-mapping reads (which is odd) and still getting zeros?

ADD REPLY
0
Entering edit mode

I have visualized in IGV , it shows the same as what i get in feature Count.

ADD REPLY
2
Entering edit mode

As in there are no reads mapping to that gene? If that is the case why are you surprised by featureCounts result? Perhaps there is something wrong with the experiment itself?

ADD REPLY
0
Entering edit mode

Agreed to this point. But why am I getting aligned reads (12-20 M) for all samples. I want to know where are they going and aligning.

ADD REPLY
0
Entering edit mode

What strandedness was your library prep kit?

What strandedness flag did you use? (-s 0, 1, 2)

Do those match?

ADD REPLY
0
Entering edit mode

Yes it is matching my library is F/R strand and I've used following in Hisat --fr -1

ADD REPLY
0
Entering edit mode

featureCounts has isStrandSpecific flag also, default is 0 indicating unstranded.

See this helpful thread.

ADD REPLY
0
Entering edit mode

I did this. Doesn't help

ADD REPLY
0
Entering edit mode

What's the experimental design here? What material are samples from? Did you do the labwork, or can you get RNA integrity/RINs? What were duplicate rates like? What are levels of CXCL1, 3, 5 like?

ADD REPLY
0
Entering edit mode

My samples are from swab, I've not done any lab work, just analyzing the data.But I can get RNA integrity. All CXCls have no counts in samples.

ADD REPLY
0
Entering edit mode

I'd really want to know how long after swabs were collected was RNA extracted, what was the swab stored in (RNAlater, buffer from extraction kit)?

What library kit was used? Previously might have said to look into rRNA contamination if an rRNA-depletion method was used, but nearly always polyA-selection now.

My guess would be low complexity library based on sample degradation. There would be a lot of duplicates (50%+) if so.

If you want to know where reads align and were counted, sort your featureCounts output file on the sum of count columns

awk '{print $0, $1+$2+$3+$4+$5+$6+$7+$8}' output.txt | sort -k10rn

I would also test for exonic reads using samtools view with a relevant bed file of gene regions using -L and -M.

ADD REPLY
0
Entering edit mode

I've removed rRNA by mapping them to rRNA databse Also how about doing a Denovo assembly and checking where the reads are aligning?

ADD REPLY
0
Entering edit mode

Could do de novo, presume this is human if swabs (buccal?), if so I'd be more inclined to see what the issue was with the data.

What level of rRNA reads were in the samples? Was it polyA selection kit or rRNA depletion?

Still interested in the duplication rate, think ruling out low complexity library is useful. That does assume some genes had relatively 'normal' levels of expression(?)

ADD REPLY

Login before adding your answer.

Traffic: 1548 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6