Hi! VERY, VERY (so please be nice to me!) new to the field here, but briefly: I performed RNA-seq on mouse tissue, got the sequences (FASTQ format), cleaned them up (Trim Galore) and aligned them (RNA STAR to mm10). I checked the alignments in IGV and can see sufficient reads across all exons for my GOI; however, when I use HTSeq to make count files, the GOI has no counts. I've been going through forum after forum trying to figure out where something went wrong, but can't figure it out. Any suggestions/insights on how to investigate this issue?
I also aligned the reads using RNA STAR in Galaxy, with the built-in Mus musculus mm10 genome. I chose NOT to count the number of reads per gene, since I did this part later.
There is no reason not to count with STAR: it is essentially a free analysis, as it doesn't add significant run time, memory or disk usage compared to just mapping. And the results are equivalent to featureCounts and HTSeq.
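If you do let STAR count (the --quantMode GeneCounts option), it writes a ReadsPerGene.out.tab file. Below is a minimal Python sketch for reading it, assuming the standard layout: four N_* summary rows followed by one row per gene, each with three count columns (unstranded; read 1 on the gene's strand, like htseq-count -s yes; read 2 on the gene's strand, like htseq-count -s reverse). The gene ID and numbers in the usage example are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# The four summary rows STAR writes at the top of ReadsPerGene.out.tab
SUMMARY_KEYS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def parse_reads_per_gene(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Split ReadsPerGene.out.tab into summary rows and per-gene rows.

    Each value is [unstranded, read1_sense, read2_sense] counts.
    """
    summary: Dict[str, List[int]] = {}
    genes: Dict[str, List[int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        gene_id, *counts = line.split("\t")
        values = [int(c) for c in counts]
        if gene_id in SUMMARY_KEYS:
            summary[gene_id] = values
        else:
            genes[gene_id] = values
    return summary, genes


if __name__ == "__main__":
    # Toy data only; a real file would be opened with open(".../ReadsPerGene.out.tab")
    example = [
        "N_unmapped\t10\t10\t10",
        "N_multimapping\t5\t5\t5",
        "N_noFeature\t3\t90\t8",
        "N_ambiguous\t2\t0\t1",
        "ENSMUSG_EXAMPLE\t100\t4\t96",  # hypothetical gene ID
    ]
    summary, genes = parse_reads_per_gene(example)
    print(summary["N_multimapping"], genes["ENSMUSG_EXAMPLE"])
```

Looking up your gene of interest in all three columns at once shows immediately whether it has zero counts in every mode or only under one strandedness setting.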
I selected "stranded" for the strand-specificity input because that is what the core facility that gave me my samples used for library prep.
The most common library preparation (Illumina TruSeq stranded) would result in a reverse-stranded library, so the correct setting for featureCounts on the command line is -s 2, and for HTSeq is -s reverse. I guess there should be a "stranded: reverse" option for Galaxy. You have to check the library preparation kit manual or ask the sequencing center staff for details.
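If you have STAR gene counts, you can also check the strandedness empirically. The sketch below (Python, a heuristic rather than an official method) compares the summed counts of columns 3 and 4 of ReadsPerGene.out.tab: a reverse-stranded library puts most reads in column 4, a forward-stranded one in column 3, and an unstranded one splits them roughly evenly. The 4x ratio threshold is an arbitrary assumption; the table of equivalent settings follows the featureCounts, htseq-count and STAR conventions.

```python
from typing import Dict, Union

# Equivalent settings for the same library type in each tool.
# "STAR column" is the 1-based column in ReadsPerGene.out.tab (column 1 = gene ID).
SETTINGS: Dict[str, Dict[str, Union[str, int]]] = {
    "unstranded": {"featureCounts": "-s 0", "htseq-count": "-s no", "STAR column": 2},
    "stranded": {"featureCounts": "-s 1", "htseq-count": "-s yes", "STAR column": 3},
    "reverse": {"featureCounts": "-s 2", "htseq-count": "-s reverse", "STAR column": 4},
}


def infer_strandedness(col3_total: int, col4_total: int, ratio: float = 4.0) -> str:
    """Guess library strandedness from summed STAR gene counts.

    col3_total and col4_total are the sums of columns 3 and 4 over all gene rows.
    `ratio` is an assumed threshold, not a standard value.
    """
    if col4_total >= ratio * max(col3_total, 1):
        return "reverse"
    if col3_total >= ratio * max(col4_total, 1):
        return "stranded"
    return "unstranded"


if __name__ == "__main__":
    guess = infer_strandedness(col3_total=1_000, col4_total=20_000)  # toy numbers
    print(guess, SETTINGS[guess])
```

A gene with plenty of reads in IGV but zero counts is a classic sign of the wrong strandedness setting, so this is worth checking before anything else.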
I downloaded this .gtf file for the genome: Mus_musculus.GRCm38.96.gtf.gz (this is essentially mm10 from Ensembl)
As arup hinted, you have to use genome and annotations from the same source, as the chromosome naming convention may differ between them (some sources use 3 and so on for chromosome names, while others use chr3). Check the BAM header and the annotation GTF to see if the chromosome names match.
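A minimal Python sketch of that check, assuming you have the BAM header as text (for example from samtools view -H) and the GTF as lines (gzip.open(path, "rt") works for a .gtf.gz). It collects the SN: names from @SQ header lines and the first column of the GTF, then reports which names are shared and which appear on one side only. The toy input below is made up to show the chr3 vs 3 mismatch.

```python
from typing import Dict, Iterable, List, Set


def bam_header_contigs(header_lines: Iterable[str]) -> Set[str]:
    """Reference names from the SN: field of @SQ lines in a SAM/BAM header."""
    names: Set[str] = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_contigs(gtf_lines: Iterable[str]) -> Set[str]:
    """Sequence names from column 1 of a GTF, skipping comment and blank lines."""
    names: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def check_naming(bam_names: Set[str], gtf_names: Set[str]) -> Dict[str, List[str]]:
    """If 'shared' is empty, no read can ever overlap an annotated feature."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
    }


if __name__ == "__main__":
    header = ["@HD\tVN:1.4", "@SQ\tSN:chr3\tLN:1000", "@SQ\tSN:chrM\tLN:1000"]
    gtf = ["# toy GTF", '3\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "x";']
    print(check_naming(bam_header_contigs(header), gtf_contigs(gtf)))
```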
Did you check the mapping rates (from STAR) and feature assignment rates (from featureCounts)? Do they seem correct? What proportion of your reads is mapped, and what proportion is assigned to a feature? Did you check whether the mappings you see at the gene of interest are multi-mappers? By default, multi-mapping reads are not counted by STAR / featureCounts / HTSeq. If that is the case, then, as kristoffer.vittingseerup pointed out, you would need to use a different method (RSEM, Salmon or kallisto) to be able to quantify these reads.
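A minimal Python sketch for two of these checks, under stated assumptions. The first part reads a featureCounts .summary file for a single BAM (a Status header row, then one category per row, such as Assigned or Unassigned_MultiMapping) and computes the assigned fraction. The second part takes SAM text for the region around your gene (for example from samtools view on that region) and reports the fraction of alignment records whose NH tag is greater than 1, which STAR sets to the number of loci a read maps to. Note that it counts alignment records, not reads, so one multi-mapper can contribute several records. The input data in the usage example is made up.

```python
from typing import Dict, Iterable


def parse_featurecounts_summary(lines: Iterable[str]) -> Dict[str, int]:
    """Read a single-sample featureCounts .summary file into {category: count}."""
    it = iter(lines)
    next(it, None)  # skip header row: "Status<TAB>path/to/file.bam"
    counts: Dict[str, int] = {}
    for line in it:
        if line.strip():
            status, value = line.rstrip("\n").split("\t")[:2]
            counts[status] = int(value)
    return counts


def assignment_rate(counts: Dict[str, int]) -> float:
    """Fraction of all counted reads/fragments that were assigned to a feature."""
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


def multimapper_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of alignment records with NH:i:>1 (records without NH are skipped)."""
    unique = multi = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column (index 11)
        nh = next((int(f[5:]) for f in fields[11:] if f.startswith("NH:i:")), None)
        if nh is None:
            continue
        if nh > 1:
            multi += 1
        else:
            unique += 1
    total = unique + multi
    return multi / total if total else 0.0


if __name__ == "__main__":
    summary = ["Status\tsample.bam", "Assigned\t700",
               "Unassigned_MultiMapping\t200", "Unassigned_NoFeatures\t100"]
    print(assignment_rate(parse_featurecounts_summary(summary)))
```

If most records at your gene carry NH > 1, that would explain reads visible in IGV but no counts.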
Some additional comments:
Please define your abbreviations the first time you use them. I had no idea GOI meant gene of interest; I thought it was the name of a gene.
The issue of seeing mapped reads but getting no counts is a recurrent one, and there are several posts about it on Biostars and other forums. If you search for the words in your post title, you will find several of them; do read them to check whether a suitable solution has already been posted elsewhere.
As you have been using Galaxy for most of your analyses, maybe you can ask your question there. Before doing so, follow up carefully on the suggestions here, and if none of them solves your problem, go ahead and post there. If you do, please state both here (by editing your original question and adding the information there) and at GalaxyHelp that you cross-posted, providing links between the posts.