Question: see reads in .bam file, no counts after HTseq
0
gravatar for emilybowie2
5 days ago by
emilybowie20 wrote:

Hi! VERY, VERY (so please be nice to me!) new to the field here, but briefly: performed RNAseq (mouse tissue), got the sequences (fastq format), cleaned them up (trimgalore), aligned them (RNASTAR to mm10) - checked alignments on IGV and can see sufficient reads across all exons for GOI, however when I use HTseq to make count files, GOI has no counts? I've been going through forum after forum trying to figure out where something went wrong, but can't figure it out - any suggestions/insights for investigation into this issue?

rna-seq counts htseq • 94 views
ADD COMMENTlink written 5 days ago by emilybowie20
2

Please provide more detail:

  • copy / paste here the commands you used
  • genome and annotation versions were from same source and version?
  • could you provide a link to the GOI gene?

Read How To Ask Good Questions On Technical And Scientific Forums for guidance and reference for future questions.

ADD REPLYlink written 5 days ago by h.mon26k
1

Also is it just for 1 gene or is it for all genes you dont get any counts?

ADD REPLYlink written 5 days ago by kristoffer.vittingseerup1.8k

I appreciate the link for asking good questions - I'll elaborate the steps/commands:

  • Since I am a newbie, I uploaded all .fastq files to Galaxy - changed the file type to .fastqsanger before using Trim Galore. I used Trim Galore via Galaxy. My sequences were single-end, I chose automatic adaptor sequence trim, and left all other default things as default because...again, newbie. I ran FastQC on all samples, all samples looked good and had high quality scores.
    • I also aligned the reads using RNA STAR in Galaxy. I used the built-in Mus musculus mm10 genome. I selected to NOT count number of reads per gene since I did this part later. I set length of the genomic sequence around annotated junctions to 100 (per someone across the hall suggestion), I left all other default settings as default which I'm seeing is a trend, and possibly I should look further into these settings to make sure I'm not missing something there.
    • I downloaded these .bam/.bai files after alignment and checked on IGV viewer on the mm10 genome - this is where I can see my reads sufficiently for my GOI (Tautubulinkinase2: https://www.ncbi.nlm.nih.gov/gene/140810 - using this as GOI as it is the center of my experimental question, so I wanted to make sure it was in my dataset at this point. It is a lowly expressed gene throughout the animal, except in the brain/testes (though this expression is still low even in these tissues comparatively speaking), and I submitted brain RNA).
    • Now for counts: initially I used FeatureCounts (again, on Galaxy...for someone like myself, this is a great tool!), selected the same Musmusculus mm10 genome from my history that I used for the annotation. Selected "stranded" for the input on strand specificity because that is what the core that gave me my samples did for library prep. Feature type: exon. Minimum bases of overlap: 1. Minimum mapping quality per read: 10. All other fancy options were default - seems the default is "no" for all additional things that featurecounts can do, again, I might need to revisit this. However the count output from featurecounts, did not have my GOI at this point, so I went to HTseq via that friend across the hall that mentioned it might be the better way to go about things. for HTseq, I followed the protcol I found on their website: https://htseq.readthedocs.io/en/release_0.11.1/count.html. I used my alignment files from RNA STAR, I downloaded this .gtf file for the genome: Mus_musculus.GRCm38.96.gtf.gz (this is essentially mm10 from ensembl), chose "union", and a minimum mapping quality per read of 10 (same as for featurecounts). Feature type: exon. Stranded: yes. These count files also do not have counts for my GOI. As kristoffer.vittingseerup asked below asked - I do have counts for 13000 other genes, just not my "control" gene in my control file (which this GOI has been manipulated in other samples that I want to compare to the control...so it's kind of important I can see counts for it in my control).

Wow that's a novel.

ADD REPLYlink modified 5 days ago • written 5 days ago by emilybowie20
1

Are you using the same transcript file for alignment and HTseq? Also, I would recommend you to use FeatureCounts for transcript quantification.

ADD REPLYlink written 5 days ago by arup1.4k
1

The core functionality of HTSeq and featureCounts is identical so I would have to argue there is no need to switch. If you want more accurate quantification you would need to use pseudo-alligners such as Kallisto or Salmon instead. Their main advantages are: 1) they can handle multimappers (featureCounts and HTSeq cannot) which usually are a large fraction of your data you otherwise ignore. 2) They have sequence bias correction algorithms.

ADD REPLYlink written 5 days ago by kristoffer.vittingseerup1.8k

I agree that Kallisto/Salmon can provide more accurate transcript quantification. But I was recommending featureCounts as it takes less time and resources to run.

ADD REPLYlink written 4 days ago by arup1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1939 users visited in the last hour