How to get gene counts with HTseq
0
0
Entering edit mode
5.1 years ago
Michel Edwar ▴ 70

Ihave two conditions to compare. I am using HISAT2, sam tools and HTSEQ pipeline. However, I end up getting zero values for well known active genes in both WT and treated. I have viewed the output of the .bam files using IGV and it shows reads for these particular genes. Here is my code

    hisat2-build -p 4 mus-genome.fa  mkr
    # run the following command to produce sam file and convert to bam file sort 
    hisat2Ref=/mnt/c/Users/Admin/Desktop/3adera-rna/mkr-index/mkr1
    hisat2 -x ${hisat2Ref} \
           -U RORgt24h_S9_L001_R1_001.fastq |
           samtools view -bh - |
           samtools sort - > mkr1_file.bam
    samtools index mkr1_file.bam
   # finally i use HTseq-count
    htseq-count -f bam mkr1_file.bam mkr-mus-genome.gtf > 3adera1.txt

I wonder if you can help me pinpoint which step I am missing here. thank you

RNA-Seq counts reads htseq • 2.6k views
ADD COMMENT
0
Entering edit mode

Did you check if your sequences are named the same in your gff file as in your fasta file? (that's a typical cause of these kind of issues)

ADD REPLY
0
Entering edit mode

yes i checked that's what i get mus-genome.fa

>10
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

for mkr-mus-genome.gtf, i get

  1       havana  exon    3073253 3074322 .       +       .       exon_id "ENSMUSE00001343744"; exon_number "1"; exon_version "1"; gene_biotype "TEC"; gene_id "ENSMUSG00000102693"; gene_name "4933401J01Rik"; gene_source "havana"; gene_version "1"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1"; havana_transcript "OTTMUST00000127109"; havana_transcript_version "1"; tag "basic"; transcript_biotype "TEC"; transcript_id "ENSMUST00000193812"; transcript_name "4933401J01Rik-001"; transcript_source "havana"; transcript_support_level "NA"; transcript_version "1"; tss_id "TSS81250";

so what should i change?

ADD REPLY
0
Entering edit mode

Is it that all counts are zero (supporting the suggestion of lieven.sterck ) or are there some values > 0?

ADD REPLY
0
Entering edit mode

Not all counts are zero. i checked some of the known genes and they are zeros, but the volcano plot looks normal

ADD REPLY
0
Entering edit mode

right, since not all counts are zero the cmdline is technically fine it seems, so it will be likely something else then.

You said there are reads aligned in those regions when inspecting it in IGV. do they 100% coincide with the given gene-structures in your gtf file (== are all intron/exon boundaries respected)?

ADD REPLY

Login before adding your answer.

Traffic: 1556 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6