Question: FeatureCounts, zero counts from Hisat2 alignment
0
gravatar for af1065
4 months ago by
af10650
af10650 wrote:

I am seeking a way to correct the formats for inputs to FeatureCounts.

I have sourced data from a resource for sweet potato genome annotations, and aligned my RNA-seq data to the available reference genome. However, it seems FeatureCounts cannot report on the SAM files I input.

The reference genome FASTA file does not have chromosomes clearly in the headers. I believe this to be the reason for FeatureCounts not working, however, I want to be sure before I go scripting in different directions to correct the files I have.

To be clear, I am providing the genome annotation file along with each of my aligned samples.

Example reference genome sequence header:

>itf11g12620.t1

The above gene identifier shows that it is on chromosome 11.

Example GFF:

##gff-version 3

Chr11 gt4sp_v3 mRNA 1467 2621 . + . ID=itf01g00010.t1;Name=itf01g00010.t1;Parent=itf01g00010

I am not getting errors from FeatureCounts, but it is possible that Chr11 should be in the FASTA header. Also, I do convert the GFF3 file to GTF so that FeatureCounts has a GTF as input.

rna-seq • 226 views
ADD COMMENTlink modified 8 weeks ago by Biostar ♦♦ 20 • written 4 months ago by af10650

The reference genome FASTA file does not have chromosomes clearly in the headers. I believe this to be the reason for FeatureCounts not working,

That is correct. Chromosome identifiers i.e. Chr11 need to match in your reference and the annotation.

ADD REPLYlink written 4 months ago by GenoMax92k

I have corrected my headers, yet still I am not getting features to be counted.

sequence header: >itb13g17640.t1|chr13

GTF: chr13 gt4sp_v3 CDS 26079446 26082424 . - 0 transcript_id "itb13g19070.t1"; gene_id "itb13g19070";

Do you have another suggestion for recourse?

ADD REPLYlink written 3 months ago by af10650

This entire string >itb13g17640.t1|chr13 needs to match first column in your GTF. If possible get rid of >itb13g17640.t1 and leave chr13 behind.

ADD REPLYlink modified 3 months ago • written 3 months ago by GenoMax92k

Thank you! Seems counter-intuitive that the sequences do not need gene ids.

ADD REPLYlink written 3 months ago by af10650

Gene ID's are provided by ID=itf01g00010.t1 in last field in your GTF. Be sure to use -g ID with featureCounts.

ADD REPLYlink modified 3 months ago • written 3 months ago by GenoMax92k

Just to make sure, is your reference sequence really the genome ? Or the transcriptome/cDNA ? Having headers like >itf11g12620.t1 makes it looks like it is a transcriptome sequence, for which renaming the header as chr13 does not make sense indeed.

In addition, if you aligned on the transcriptome, you do not need to use featureCounts because reads are already assigned to transcripts. Instead, you can simply use samtools idxstats to obtain the number of reads by sequence (after perhaps filtering against unwanted – multimappers, chimeric, etc... – reads).

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Carlo Yague5.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2156 users visited in the last hour