Question: featureCounts/HTSeq-count problem with different annotation files
0
gravatar for monteiro.s.rita
3.9 years ago by
United Kingdom
monteiro.s.rita0 wrote:

I'm analysing RNA-Seq data ( I should add that my bioinformatic knowledge is quite limited) and after aligning using TopHat I want to count reads in order to perform differential analysis of expression. 

To achieve this I used both HTSeq-count and features counts and I have different GFF/GTF files that I use as annotation for the features. 

I'm going to show you the example of the annotation file for the same gene

One of the files look like this (lets called it gene.gtf):

scaffold_5    XENBASE_XT_7_1_GENE_MODEL_2    gene    72053579    72062232    .    +    .    Name=t;ID=4652532;GeneName=T,%20brachyury%20homolog;GeneFunction=T-box%20transcription%20factor;GeneSynonyms=ntl,brachyury,Xbrachyury,bra,X-bra,Xbra

And the other like this (exon.gtf):

scaffold_5    xenTro7_1_genome_v7_2    start_codon    72055079    72055081    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    CDS    72055079    72055284    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    exon    72053579    72055284    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    CDS    72057175    72057439    0.000000    +    1    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    exon    72057175    72057439    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    CDS    72058300    72058434    0.000000    +    0    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

scaffold_5    xenTro7_1_genome_v7_2    exon    72058300    72058434    0.000000    +    .    gene_id "Xetrov72001125|Xetrov72001125|t"; transcript_id "Xetrov72001125|Xetrov72001125|t"; 

 

Basically the first file only has the whole gene feature while the other is divided by exons.

When I run HTSeq-count or featureCounts with the same bam file using either one or the other annotation file I get completely different results. With the gene.gtf I get around 5000 reads for this gene while I get 0 reads for any feature using the exon.gtf file. 

Am I missing something really obvious?

 

 

 

ADD COMMENTlink modified 18 months ago by Biostar ♦♦ 20 • written 3.9 years ago by monteiro.s.rita0

Can anyone help me ? I want to count my miRNA by using featureCounts. i have a .bam filee after mapped with the hg38 by bowtie2. I tried a lot but i couldn't. Thanks in advance

ADD REPLYlink written 2.2 years ago by k.kathirvel93200

I tried a lot but i couldn't

You need to specify what exactly you tried to get meaningful help. Also don't post new questions as ANSWERS on existing threads.

ADD REPLYlink written 2.2 years ago by genomax71k
0
gravatar for Kamil
3.9 years ago by
Kamil1.9k
Boston
Kamil1.9k wrote:

Could I ask you to post the commands that you are executing? It would be helpful to see all of the options you use with each command. My guess is that you may be missing some command line option.

On the other hand, you might consider the possibility that you have reads outside of exons. The exon intervals span 2715 nucleotides while the one gene interval spans 8653 nucleotides.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Kamil1.9k

This is the command I used with featureCounts

featureCounts -p -a ../gene.gtf -t gene -g Name -o counts_name.txt accepted_hits.bam
featureCounts -p -a ../exon.gtf -t exon -g gene_id -o counts_exon.txt accepted_hits.bam

and with HTSeq-count

htseq-count -s no -m intersection-nonempty -t gene -i Name -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts

htseq-count -s no -m intersection-nonempty -t exon -i gene_id -f bam accepted_hits_nsorted_paired_end.bam ../gene.gtf  > filename.counts

 

I did not include the all the exon present in the second file, otherwise it would be too long. But the first coordinate of the first exon correspond to the first coordinate of the gene and the last coordinate of the last exon corresponds to the last coordinate of the gene (when comparing the two files).

ADD REPLYlink written 3.9 years ago by monteiro.s.rita0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2090 users visited in the last hour