Question: How to get gene counts with HTseq
gravatar for Michel Edwar
7 days ago by
Michel Edwar60
Michel Edwar60 wrote:

Ihave two conditions to compare. I am using HISAT2, sam tools and HTSEQ pipeline. However, I end up getting zero values for well known active genes in both WT and treated. I have viewed the output of the .bam files using IGV and it shows reads for these particular genes. Here is my code

    hisat2-build -p 4 mus-genome.fa  mkr
    # run the following command to produce sam file and convert to bam file sort 
    hisat2 -x ${hisat2Ref} \
           -U RORgt24h_S9_L001_R1_001.fastq |
           samtools view -bh - |
           samtools sort - > mkr1_file.bam
    samtools index mkr1_file.bam
   # finally i use HTseq-count
    htseq-count -f bam mkr1_file.bam mkr-mus-genome.gtf > 3adera1.txt

I wonder if you can help me pinpoint which step I am missing here. thank you

rna-seq counts htseq reads • 85 views
ADD COMMENTlink written 7 days ago by Michel Edwar60

Did you check if your sequences are named the same in your gff file as in your fasta file? (that's a typical cause of these kind of issues)

ADD REPLYlink written 7 days ago by lieven.sterck4.2k

yes i checked that's what i get mus-genome.fa


for mkr-mus-genome.gtf, i get

  1       havana  exon    3073253 3074322 .       +       .       exon_id "ENSMUSE00001343744"; exon_number "1"; exon_version "1"; gene_biotype "TEC"; gene_id "ENSMUSG00000102693"; gene_name "4933401J01Rik"; gene_source "havana"; gene_version "1"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1"; havana_transcript "OTTMUST00000127109"; havana_transcript_version "1"; tag "basic"; transcript_biotype "TEC"; transcript_id "ENSMUST00000193812"; transcript_name "4933401J01Rik-001"; transcript_source "havana"; transcript_support_level "NA"; transcript_version "1"; tss_id "TSS81250";

so what should i change?

ADD REPLYlink modified 4 days ago • written 4 days ago by Michel Edwar60

Is it that all counts are zero (supporting the suggestion of lieven.sterck ) or are there some values > 0?

ADD REPLYlink modified 6 days ago • written 6 days ago by ATpoint14k

Not all counts are zero. i checked some of the known genes and they are zeros, but the volcano plot looks normal

ADD REPLYlink written 4 days ago by Michel Edwar60

right, since not all counts are zero the cmdline is technically fine it seems, so it will be likely something else then.

You said there are reads aligned in those regions when inspecting it in IGV. do they 100% coincide with the given gene-structures in your gtf file (== are all intron/exon boundaries respected)?

ADD REPLYlink written 4 days ago by lieven.sterck4.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2189 users visited in the last hour