Question: see reads in .bam file, no counts after HTseq
gravatar for emilybowie2
5 days ago by
emilybowie20 wrote:

Hi! VERY, VERY (so please be nice to me!) new to the field here, but briefly: performed RNAseq (mouse tissue), got the sequences (fastq format), cleaned them up (trimgalore), aligned them (RNASTAR to mm10) - checked alignments on IGV and can see sufficient reads across all exons for GOI, however when I use HTseq to make count files, GOI has no counts? I've been going through forum after forum trying to figure out where something went wrong, but can't figure it out - any suggestions/insights for investigation into this issue?

rna-seq counts htseq • 94 views
ADD COMMENTlink written 5 days ago by emilybowie20

Please provide more detail:

  • copy / paste here the commands you used
  • genome and annotation versions were from same source and version?
  • could you provide a link to the GOI gene?

Read How To Ask Good Questions On Technical And Scientific Forums for guidance and reference for future questions.

ADD REPLYlink written 5 days ago by h.mon26k

Also is it just for 1 gene or is it for all genes you dont get any counts?

ADD REPLYlink written 5 days ago by kristoffer.vittingseerup1.8k

I appreciate the link for asking good questions - I'll elaborate the steps/commands:

  • Since I am a newbie, I uploaded all .fastq files to Galaxy - changed the file type to .fastqsanger before using Trim Galore. I used Trim Galore via Galaxy. My sequences were single-end, I chose automatic adaptor sequence trim, and left all other default things as default because...again, newbie. I ran FastQC on all samples, all samples looked good and had high quality scores.
    • I also aligned the reads using RNA STAR in Galaxy. I used the built-in Mus musculus mm10 genome. I selected to NOT count number of reads per gene since I did this part later. I set length of the genomic sequence around annotated junctions to 100 (per someone across the hall suggestion), I left all other default settings as default which I'm seeing is a trend, and possibly I should look further into these settings to make sure I'm not missing something there.
    • I downloaded these .bam/.bai files after alignment and checked on IGV viewer on the mm10 genome - this is where I can see my reads sufficiently for my GOI (Tautubulinkinase2: - using this as GOI as it is the center of my experimental question, so I wanted to make sure it was in my dataset at this point. It is a lowly expressed gene throughout the animal, except in the brain/testes (though this expression is still low even in these tissues comparatively speaking), and I submitted brain RNA).
    • Now for counts: initially I used FeatureCounts (again, on Galaxy...for someone like myself, this is a great tool!), selected the same Musmusculus mm10 genome from my history that I used for the annotation. Selected "stranded" for the input on strand specificity because that is what the core that gave me my samples did for library prep. Feature type: exon. Minimum bases of overlap: 1. Minimum mapping quality per read: 10. All other fancy options were default - seems the default is "no" for all additional things that featurecounts can do, again, I might need to revisit this. However the count output from featurecounts, did not have my GOI at this point, so I went to HTseq via that friend across the hall that mentioned it might be the better way to go about things. for HTseq, I followed the protcol I found on their website: I used my alignment files from RNA STAR, I downloaded this .gtf file for the genome: Mus_musculus.GRCm38.96.gtf.gz (this is essentially mm10 from ensembl), chose "union", and a minimum mapping quality per read of 10 (same as for featurecounts). Feature type: exon. Stranded: yes. These count files also do not have counts for my GOI. As kristoffer.vittingseerup asked below asked - I do have counts for 13000 other genes, just not my "control" gene in my control file (which this GOI has been manipulated in other samples that I want to compare to the it's kind of important I can see counts for it in my control).

Wow that's a novel.

ADD REPLYlink modified 5 days ago • written 5 days ago by emilybowie20

Are you using the same transcript file for alignment and HTseq? Also, I would recommend you to use FeatureCounts for transcript quantification.

ADD REPLYlink written 5 days ago by arup1.4k

The core functionality of HTSeq and featureCounts is identical so I would have to argue there is no need to switch. If you want more accurate quantification you would need to use pseudo-alligners such as Kallisto or Salmon instead. Their main advantages are: 1) they can handle multimappers (featureCounts and HTSeq cannot) which usually are a large fraction of your data you otherwise ignore. 2) They have sequence bias correction algorithms.

ADD REPLYlink written 5 days ago by kristoffer.vittingseerup1.8k

I agree that Kallisto/Salmon can provide more accurate transcript quantification. But I was recommending featureCounts as it takes less time and resources to run.

ADD REPLYlink written 4 days ago by arup1.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1939 users visited in the last hour