featureCounts assignment low when mapping to "exon"
3.7 years ago
gt ▴ 30

Newbie here. I am trying to use featureCounts to assign reads to features (exons). My reference is E. coli, and I downloaded the GTF file from here: https://www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521. After running featureCounts, the percentage of assigned reads is very low. I also noticed that the GTF file lists only ~300 exons. Is this correct for E. coli? I can't find any resources online to help me with this. Here is the featureCounts command I run:

featureCounts -p -t exon -g gene -T 16 -s 2 -a GCF_000005845.2_ASM584v2_genomic.gtf -o counts.txt input_file.bam

Maybe I am not downloading the correct GTF file, but I need this exact strain.

RNA-seq E.coli featureCounts • 2.1k views
3.7 years ago

Biological data is annotated inconsistently: different groups may choose to annotate their organisms differently.

In this case, only some non-coding RNA regions are annotated as exons, while protein-coding genes are properly annotated only through their coding sequences (CDS).
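One way to check which feature types the GTF actually contains, without opening a genome browser, is to tally column 3 (the feature type). The following is a minimal Python sketch, assuming a standard tab-separated GTF with nine columns; the function name `count_feature_types` is hypothetical and not part of featureCounts or any library.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) of GTF/GFF lines.

    Hypothetical helper; assumes tab-separated rows with at least 9 columns.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_features.py GCF_000005845.2_ASM584v2_genomic.gtf
    for path in sys.argv[1:]:
        with open(path) as fh:
            for feature, n in count_feature_types(fh).most_common():
                print(f"{n}\t{feature}")
```

If the answer above is right, the output should show exon rows to be rare compared with gene and CDS rows, which would explain the low assignment rate with `-t exon`.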

In short, use CDS instead of exon when counting with featureCounts.
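The fix amounts to changing a single flag in the original command. As a sketch, here is the same call rebuilt in Python with `-t CDS`; the helper `featurecounts_cmd` is hypothetical, and every other option is copied unchanged from the question.

```python
import shlex


def featurecounts_cmd(annotation, bam, out="counts.txt", feature_type="CDS",
                      attribute="gene", threads=16, strand=2):
    """Rebuild the question's featureCounts call with a configurable -t value.

    Hypothetical helper; defaults mirror the original command except -t.
    """
    return [
        "featureCounts",
        "-p",                 # paired-end input, as in the original command
        "-t", feature_type,   # feature type to count: CDS instead of exon
        "-g", attribute,      # GTF attribute used to group features
        "-T", str(threads),   # number of threads
        "-s", str(strand),    # 2 = reversely stranded, as in the original
        "-a", annotation,
        "-o", out,
        bam,
    ]


if __name__ == "__main__":
    cmd = featurecounts_cmd("GCF_000005845.2_ASM584v2_genomic.gtf", "input_file.bam")
    # Prints the shell command; pass cmd to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Keeping the feature type as a parameter makes it easy to try `"gene"` as well, as suggested further down in the thread.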

Here is how to verify these statements. Using the bio package (see: https://www.bioinfo.help/), one can quickly generate the files needed to visualize this:

# Fetch the data from NCBI.
bio fetch NC_000913

# Get all features in GFF format.
bio convert NC_000913 --gff > all.gff

# Get exons only in GFF format.
bio convert NC_000913 --gff --type exon > exon.gff

# Get the reference file in FASTA format.
bio convert NC_000913 --fasta > ref.fa

Now load all three files into IGV and compare the tracks:

[Image: IGV view of the all.gff, exon.gff and ref.fa tracks]


Wow, this helps a lot, thanks. One more question: I am doing a differential expression analysis, and I have read that you should not specify "CDS" when doing DE. Is this true?


Technically true, but for prokaryotes in general, and in this particular case where exons are not even annotated and gene and CDS features share the same coordinates, it makes no difference.

You could also use gene instead of CDS or exon.
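To check the claim that gene and CDS rows cover essentially the same coordinates, one can compare, per `gene_id`, the gene length with the summed CDS length. This is a minimal sketch assuming each row carries a `gene_id` attribute; `gene_vs_cds_spans` is a hypothetical helper. Small differences (for example 3 bp) are possible if the GTF lists the stop codon as a separate feature, since GTF CDS rows conventionally exclude it.

```python
import re
from collections import Counter, defaultdict

# Assumes attributes look like: gene_id "b0001"; ...
GENE_ID = re.compile(r'gene_id "([^"]*)"')


def gene_vs_cds_spans(lines):
    """Return {gene_id: (gene_length, summed_cds_length)} for genes with CDS rows.

    Hypothetical helper; GTF coordinates are 1-based and inclusive.
    """
    gene_len = {}
    cds_len = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if not match:
            continue
        gid = match.group(1)
        length = int(fields[4]) - int(fields[3]) + 1
        if fields[2] == "gene":
            gene_len[gid] = length
        elif fields[2] == "CDS":
            cds_len[gid] += length
    return {gid: (gene_len[gid], cds_len[gid]) for gid in cds_len if gid in gene_len}


if __name__ == "__main__":
    import sys

    # Usage: python compare_spans.py GCF_000005845.2_ASM584v2_genomic.gtf
    for path in sys.argv[1:]:
        with open(path) as fh:
            spans = gene_vs_cds_spans(fh)
        # Histogram of (gene length - CDS length) across genes.
        diffs = Counter(g - c for g, c in spans.values())
        print(f"{len(spans)} genes with CDS rows")
        for diff, n in sorted(diffs.items()):
            print(f"{diff}\t{n}")
```

If nearly all differences are 0 or a few bases, counting by gene, CDS, or exon-equivalent features should give practically the same counts for DE.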


Okay great, thanks very much for your help!
