Question: Using a GFF3 file with featureCounts
4.9 years ago
bakeronit wrote:

I used TopHat for the mapping and got a .bam file. I am now trying to use featureCounts. Here are some questions about my data:

1. I used both paired-end and single-end reads in TopHat, so the mapping result contains both. Which arguments should I use?

2. My annotation file is GFF3, but when I checked the featureCounts parameters such as "-g" (GTF.attrType), I found that almost everyone uses -g gene_id, whereas GFF3 uses "ID=" and "Parent=". I couldn't get any result with my GFF3 file, so I converted it to GTF. Now I get read counts. Are they reliable? Or how should I deal with GFF3?

Here is the command I used:

featureCounts  -a my.gtf -t exon -g gene_id -o counts.txt accepted_hits.bam




written 4.9 years ago by bakeronit0

Hi bakeronit,

For your second point: as long as the format conversion is reliable, the information in your annotation files is the same, so your read counts are correct. There would only be a problem if the GFF3 and GTF files contained different information. A good format conversion does not modify the information in a file.
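One rough way to sanity-check a conversion is to compare how many exon features each file contains, since exon is the feature type counted in the command above. The sketch below is only a coarse check under stated assumptions: my.gff3 is a hypothetical name for the original annotation, count_exons is a helper made up for illustration, and equal counts do not prove the attributes were carried over correctly. A large mismatch, though, would suggest that features were dropped.

```shell
# Count exon features (column 3 == "exon") in a GFF3 or GTF file,
# skipping comment lines that start with '#'.
count_exons() {
  awk -F '\t' '!/^#/ && $3 == "exon" { n++ } END { print n + 0 }' "$1"
}

# my.gff3 is a hypothetical name for the original annotation;
# my.gtf is the converted file passed to featureCounts.
if [ -f my.gff3 ] && [ -f my.gtf ]; then
  echo "GFF3 exons: $(count_exons my.gff3)"
  echo "GTF exons:  $(count_exons my.gtf)"
fi
```

If the two numbers differ, it is worth looking at which lines the converter skipped before trusting the counts.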

I have never been in the situation described in your first point. I'll let someone else answer.

written 4.9 years ago by Thibault D.690
4.7 years ago
UNC Chapel Hill
Joseph Pearson wrote:

Hi bakeronit-

Regarding your first point, the TopHat manual page says:

TopHat can align reads that are up to 1024 bp long, and it handles paired-end reads and unpaired reads at once, but we do not recommend mixing different types of reads in the same TopHat run.

After this warning, the manual explains one way to combine your single-end and paired-end reads.

To your second point, the Cufflinks utility gffread can convert between GFF and GTF formats. I believe I've only gone GTF-to-GFF, but I bet it works well in the other direction. An additional advantage is that it removes extraneous lines (extraneous to Cufflinks), so you will end up with a much more compact annotation file.
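For the GFF3-to-GTF direction, a minimal sketch of the gffread call might look like the following. The -T option asks gffread for GTF output and -o names the output file. The wrapper name gff3_to_gtf and the input name my.gff3 are placeholders chosen for illustration, and this has not been run against your annotation.

```shell
# Convert a GFF3 annotation to GTF with gffread (Cufflinks suite).
# $1 = input GFF3 file, $2 = output GTF file.
# -T selects GTF output; -o sets the output file name.
gff3_to_gtf() {
  gffread "$1" -T -o "$2"
}

# Example call (my.gff3 is a hypothetical input name):
# gff3_to_gtf my.gff3 my.gtf
```

The resulting my.gtf can then be passed to featureCounts with -a, as in the command from the question.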

written 4.7 years ago by Joseph Pearson450
