Question

featureCounts not processing reads

10.6 years ago

amoltej ▴ 100

Dear All,

I am trying to use featureCounts for the first time, following the Rsubread manual exactly.

featureCounts(files='Aligned.nsortedByCoord.out.bam', annot='cuffcmp.combined.gtf', isGTFAnnotationFile=TRUE,isPairedEnd=TRUE)

After that, the output looks like this:

Processing Bcyt/Aligned.sortedByCoord.out.bam ...
Warning: the feature on the 4366-th line has zero coordinate or zero lengths
Warning: the feature on the 26925-th line has zero coordinate or zero lengths
Warning: the feature on the 36138-th line has zero coordinate or zero lengths
Warning: the feature on the 38896-th line has zero coordinate or zero lengths
Warning: the feature on the 49896-th line has zero coordinate or zero lengths
Warning: the feature on the 51284-th line has zero coordinate or zero lengths
Warning: the feature on the 54658-th line has zero coordinate or zero lengths
Warning: the feature on the 55995-th line has zero coordinate or zero lengths
There are 84237 features loaded from the annotation file.
The 84237 features are sorted.
Number of chromosomes included in the annotation is 4940
The 0-th thread processed 0 reads
Number of fragments mapped to the features is: 0
Time cost = 1.6 seconds

It seems nothing is happening, because zero reads were processed.

I am not sure what has gone wrong. Also, could you tell me more about the featureCounts output and how to read the file?

Thank you in advance.

Amol

featureCounts RNA-Seq Rsubread • 4.9k views

updated 3.8 years ago by Ram 45k • written 10.6 years ago by amoltej ▴ 100

Hi all, I also came across a similar problem, with the error message below:

Error: the feature on the 21091-th line has zero coordinate or zero lengths No counts were generated.

My annotation is: Annotation : GCF_000009705.1_ASM970v1_genomic.gtf.gz (GTF)

Apparently the 21091st line has a start position higher than its end position, and there are more places in this assembly where the start is higher than the end. It's a prokaryotic genome, so Tom's point is valid, but is there a way to get past this without losing any information?

4.3 years ago by Rajesh ▴ 10
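
To see how many records are affected before deciding what to do with them, a small scan of the GTF can help. The following is only a sketch in Python, not part of Rsubread: the function name `find_bad_features` and the example records are made up for illustration, and the line numbers count every line of the file (comments included), which may or may not match the numbering featureCounts uses in its messages.

```python
import io


def find_bad_features(gtf_lines):
    """Yield (line_number, reason, text) for GTF records with suspicious coordinates.

    GTF columns 4 and 5 (1-based) hold the start and end of a feature.
    """
    for line_number, line in enumerate(gtf_lines, start=1):
        text = line.rstrip("\n")
        # Skip comment/header lines and blank lines.
        if text.startswith("#") or not text.strip():
            continue
        fields = text.split("\t")
        if len(fields) < 9:
            yield line_number, "fewer than 9 columns", text
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            yield line_number, "non-integer coordinate", text
            continue
        if start <= 0 or end <= 0:
            yield line_number, "zero or negative coordinate", text
        elif start > end:
            yield line_number, "start greater than end", text


# Hypothetical in-memory GTF used only to demonstrate the check.
example = io.StringIO(
    "#!genome-build demo\n"
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'
    'chr1\tdemo\texon\t5000\t30\t.\t+\t.\tgene_id "g2";\n'
    'chr1\tdemo\texon\t0\t50\t.\t-\t.\tgene_id "g3";\n'
)
problems = list(find_bad_features(example))
for line_number, reason, _ in problems:
    print(f"line {line_number}: {reason}")
```

For a real annotation, the same function can be fed an open file handle; a gzipped GTF can be opened with `gzip.open(path, "rt")` from the standard library.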

score 0 · Answer 1 · 2014-12-13

10.6 years ago

GouthamAtla 12k

From the file name (cuffcmp.combined.gtf), I can see that you are using a Cufflinks-generated GTF file.

Does the Rsubread package work with a Cufflinks annotation file? I would suggest first trying the standard annotation (if available for your genome) using getInBuiltAnnotation to see if it works.

But I would like to know why you are using the Cufflinks GTF file to count the reads. What is your goal?

Hey Geek_y, thanks. The organism I am working on does not have a standard annotated genome. I made my own GTF file using the Scipio program and the protein sequences of a closely related organism. When I visualized my BAM file, genome scaffold file, and GTF file in SeqMonk, I found that many reads have no annotation in my Scipio GTF file. That's why I generated a combined GTF file using cuffcompare, which has all the annotations generated by Cufflinks. I have just tried the Scipio GTF file alone, and the error is still the same, with more warnings.

10.6 years ago by amoltej ▴ 100

score 0 · Answer 2 · 2020-02-27

Should anybody still stumble upon this thread (like I just did):

In my case the errors were caused by features where the start position was greater than the end position in the GTF file. I assume this happens because I'm looking at circular bacterial chromosomes and the assemblies start somewhere in the middle of a gene, so the last annotated feature loops back to the beginning of the assembly. It looks like featureCounts can't handle that situation. I googled for an easy solution but haven't found one yet. For now I have just cut the last feature from the annotation and hope I don't lose interesting data.
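
One possible alternative to cutting the feature, offered only as an untested sketch: split a wrap-around feature into two ordinary records, one running from its start to the end of the contig and one from position 1 to its end, both keeping the same attributes. This assumes the feature really does wrap around the origin of a circular contig and that the contig length is known; `split_wraparound`, `NC_demo`, and the length of 10000 are hypothetical names and values for illustration, and the frame column of CDS records is not adjusted here.

```python
def split_wraparound(fields, contig_length):
    """Return one or two GTF records (lists of 9 fields) for a single feature.

    A record whose start is greater than its end is assumed to wrap around
    the origin of a circular contig of length `contig_length`. It is split
    into [start, contig_length] and [1, end]. Other records are returned
    unchanged.
    """
    start, end = int(fields[3]), int(fields[4])
    if start <= end:
        return [fields]
    # Piece that runs up to the end of the linearised contig.
    first = fields[:3] + [str(start), str(contig_length)] + fields[5:]
    # Piece that continues from the beginning of the contig.
    second = fields[:3] + ["1", str(end)] + fields[5:]
    return [first, second]


# Hypothetical wrap-around record on a 10000 bp circular contig.
record = ["NC_demo", "demo", "exon", "9900", "150", ".", "+", ".", 'gene_id "geneA";']
pieces = split_wraparound(record, contig_length=10000)
for piece in pieces:
    print("\t".join(piece))
```

Because both pieces carry the same `gene_id`, counting at the gene level should in principle add their reads together, but it is worth checking the result on a gene you know before relying on it.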