Question: How to get gene counts with HTseq
0
gravatar for Michel Edwar
4 months ago by
Michel Edwar70
Sweden
Michel Edwar70 wrote:

Ihave two conditions to compare. I am using HISAT2, sam tools and HTSEQ pipeline. However, I end up getting zero values for well known active genes in both WT and treated. I have viewed the output of the .bam files using IGV and it shows reads for these particular genes. Here is my code

    hisat2-build -p 4 mus-genome.fa  mkr
    # run the following command to produce sam file and convert to bam file sort 
    hisat2Ref=/mnt/c/Users/Admin/Desktop/3adera-rna/mkr-index/mkr1
    hisat2 -x ${hisat2Ref} \
           -U RORgt24h_S9_L001_R1_001.fastq |
           samtools view -bh - |
           samtools sort - > mkr1_file.bam
    samtools index mkr1_file.bam
   # finally i use HTseq-count
    htseq-count -f bam mkr1_file.bam mkr-mus-genome.gtf > 3adera1.txt

I wonder if you can help me pinpoint which step I am missing here. thank you

rna-seq counts htseq reads • 296 views
ADD COMMENTlink written 4 months ago by Michel Edwar70

Did you check if your sequences are named the same in your gff file as in your fasta file? (that's a typical cause of these kind of issues)

ADD REPLYlink written 4 months ago by lieven.sterck5.5k

yes i checked that's what i get mus-genome.fa

>10
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

for mkr-mus-genome.gtf, i get

  1       havana  exon    3073253 3074322 .       +       .       exon_id "ENSMUSE00001343744"; exon_number "1"; exon_version "1"; gene_biotype "TEC"; gene_id "ENSMUSG00000102693"; gene_name "4933401J01Rik"; gene_source "havana"; gene_version "1"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1"; havana_transcript "OTTMUST00000127109"; havana_transcript_version "1"; tag "basic"; transcript_biotype "TEC"; transcript_id "ENSMUST00000193812"; transcript_name "4933401J01Rik-001"; transcript_source "havana"; transcript_support_level "NA"; transcript_version "1"; tss_id "TSS81250";

so what should i change?

ADD REPLYlink modified 4 months ago • written 4 months ago by Michel Edwar70

Is it that all counts are zero (supporting the suggestion of lieven.sterck ) or are there some values > 0?

ADD REPLYlink modified 4 months ago • written 4 months ago by ATpoint19k

Not all counts are zero. i checked some of the known genes and they are zeros, but the volcano plot looks normal

ADD REPLYlink written 4 months ago by Michel Edwar70

right, since not all counts are zero the cmdline is technically fine it seems, so it will be likely something else then.

You said there are reads aligned in those regions when inspecting it in IGV. do they 100% coincide with the given gene-structures in your gtf file (== are all intron/exon boundaries respected)?

ADD REPLYlink written 4 months ago by lieven.sterck5.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1580 users visited in the last hour