Question: No reads assigned to bacterial genome with featureCounts (Rsubread)
0
gravatar for Mathias_H
14 months ago by
Mathias_H20
Austria
Mathias_H20 wrote:

Hi everyone, I am programming an RNA-Seq Pipeline in R to process raw read data. I need to use multiple reference genomes from different organisms. For the mapping step I use STAR and everything worked great until the counting step with featureCounts (Rsubread). For one bacterial genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/445/GCA_000009445.1_ASM944v1) featureCounts is not able to assign any reads.

When I look at the count-matrix I see that there are no gene names, so I guess the problem is that featureCounts is not able to read the annotation file correctly.

I have tried to convert the .gff annotation file into a .gtf file with gffread and to use the .gtf file for counting, but I still do not get any assigned reads. (Also the .gtf file is much smaller (795 kB) than the .gff file (2.2 MB) and I don't know whether I lose important information here.)

As already mentioned in this post the problem may be the origin (NCBI) of the genome files and the ability of featureCounts to process them, but I do not use the build-in annotation ("mm10").

What do you guys think? Do I have to use a different program for the counting step for bacterial genomes?

tmp2 <- featureCounts(filesToCount, annot.ext = annGenbac, isGTFAnnotationFile = TRUE, genome = refGenbac, isPairedEnd = TRUE)

with .gff file:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
The attributes included in your GTF annotation are 'ID=id1;Parent=rna0;Note=ileT_1%2C len: 74 nt. First of two copies equivalent to ileT%2C len: 74 nt%2C from Mycobacterium tuberculosis strain H37Rv%2C (100.0%25 identity in 74 nt overlap). tRNA-Ile%2C anticodon gat;anticodon=(pos:10921..10923);gbkey=tRNA;gene=tRNA-Ile;product=tRNA-Ile' 

||    Features : 52                                                           ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Loading FASTA contigs : /data/mathias/dualRNA-Seq/M_tuber/Genomes/Myco ... ||
||    1 contigs were loaded                                                   ||
||                                                                            ||
|| Process BAM file /data/mathias/dualRNA-Seq/M_tuber/Output/Read_filteri ... ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 160572                                                    ||
||    Successfully assigned reads : 0 (0.0%)                                  ||
||    Running time : 0.01 minutes

with .gtf file:

Features : 52                                                           ||
||    Meta-features : 52                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Loading FASTA contigs : /data/mathias/dualRNA-Seq/M_tuber/Genomes/Myco ... ||
||    1 contigs were loaded                                                   ||
||                                                                            ||
|| Process BAM file /data/mathias/dualRNA-Seq/M_tuber/Output/Read_filteri ... ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 160572                                                    ||
||    Successfully assigned reads : 0 (0.0%)                                  ||
||    Running time : 0.00 minutes
rna-seq featurecounts R • 512 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by Mathias_H20
0
gravatar for genomax
14 months ago by
genomax92k
United States
genomax92k wrote:

Looks like your gene names are in Name=tRNA-Ala or gene= field. Have you tried to specify those fields with your -g option (you will need to check what the equivalent is for R command since that is what you are using) with featureCounts? Also make sure your reference and thus your alignment file contains AM408590.1 as the chromosome name.

ADD COMMENTlink modified 14 months ago • written 14 months ago by genomax92k

Thank you for your answer! Yes, I have tried to change the -g option (GTF.featureType in R) to "gene", "Name" and "ID". I still do not get any assigned reads. The headers of my alignment files (.bam) look like that:

    @HD VN:1.6  SO:coordinate
@SQ SN:ENA|AM408590|AM408590.1  LN:4374522
@PG ID:STAR PN:STAR VN:2.7.0e

Do you mean I have to change "ENA|AM408590|AM408590.1" to "AM408590.1"?

ADD REPLYlink written 14 months ago by Mathias_H20

That is correct. That identifier has to match exactly the one in annotation file. You can edit the header of the SAM file (if that is what you are using to count).

ADD REPLYlink written 14 months ago by genomax92k

Yes, that was the problem!! With changed BAM file headers...

@HD VN:1.6  SO:coordinate
@SQ SN:AM408590.1   LN:4374522
@PG ID:STAR PN:STAR VN:2.7.0e

...I finally got some assigned reads.

Thank you very much for your help!

ADD REPLYlink modified 14 months ago • written 14 months ago by Mathias_H20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1860 users visited in the last hour