Hi guys, this is my first post here =) but something is troubling me. I'm running an RNA-Seq experiment to compare expression profiles between two groups of neurons from sheep brain. I've been trying out different databases to see which one gives the best results.

When I mapped my reads with STAR using the genome and annotations downloaded from Ensembl, I got the same mapping rates as with the genome and annotations downloaded from NCBI: 61.79% (67 million) uniquely mapped and 33.75% (36 million) multimapping reads, out of 108 million input reads. So far so good, and this made me think that the genome and annotations were consistent between the two sources (NCBI and Ensembl).
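To put the two STAR runs side by side, here is a minimal Python sketch that pulls the unique and multimapping counts out of STAR's Log.final.out. It assumes the usual "field name |<TAB>value" layout of that file and the three field names used in the code; parse_star_log and mapping_summary are hypothetical helper names, not part of STAR.

```python
def parse_star_log(text):
    """Collect the 'name | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no separator
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def mapping_summary(stats):
    """Recompute unique and multimapping percentages from the raw counts (assumed field names)."""
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    return {
        "input": total,
        "unique": unique,
        "multi": multi,
        "unique_pct": 100.0 * unique / total,
        "multi_pct": 100.0 * multi / total,
    }
```

Running mapping_summary(parse_star_log(text)) on the contents of each run's Log.final.out gives two dicts that can be compared key by key.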

However, I noticed something when I used featureCounts for read summarization. With the NCBI data I got a much higher percentage of successfully assigned reads (23.7%, 53 million) than with the Ensembl data (12.3%, 27 million), from the same total of 226 million reads. I also noticed that the two annotation files contain different numbers of features and meta-features: the NCBI annotation contains 29,300 genes, whereas the Ensembl annotation contains only 27,054.
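Because the assigned percentage is relative to everything featureCounts processed, one way to see where the missing Ensembl assignments went is to turn each .summary file into per-category percentages and diff the two runs. This is a sketch assuming the default tab-separated .summary layout (a Status header row, then one row per category) and, as I understand the format, that the category rows together account for every record processed; it only reads the first sample column, and the helper names are hypothetical.

```python
def parse_fc_summary(text):
    """Map each status row of a featureCounts .summary file to its count in the first sample column."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    # rows[0] is the "Status<TAB>sample.bam" header; the remaining rows are categories
    return {fields[0]: int(fields[1]) for fields in rows[1:]}


def status_percentages(counts):
    """Express every non-zero status as a percentage of all records counted."""
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items() if n > 0}


def diff_statuses(a, b):
    """Return b - a for every status whose count differs between two summaries."""
    keys = sorted(set(a) | set(b))
    return {k: b.get(k, 0) - a.get(k, 0) for k in keys if a.get(k, 0) != b.get(k, 0)}
```

If Unassigned_MultiMapping is the same in both runs, the diff should point at whichever other category (for example Unassigned_NoFeatures or Unassigned_Ambiguity) absorbed the reads that Ensembl did not assign.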

How can two annotation files with different numbers of features produce the same number of mapped reads but such dismayingly different numbers of assigned reads?
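To check where the feature and meta-feature numbers come from, the sketch below tallies them from a GTF the way featureCounts does with its defaults (exon rows as features, grouped into meta-features by gene_id, i.e. -t exon -g gene_id). It uses a simple regular expression for the attribute column, so it is a rough cross-check rather than a full GTF parser; count_gtf_features is a hypothetical helper.

```python
import re


def count_gtf_features(lines, feature_type="exon", attr="gene_id"):
    """Return (features, meta_features): rows of feature_type and distinct attr values among them."""
    pattern = re.compile(rf'(?:^|;)\s*{re.escape(attr)} "([^"]+)"')
    n_features = 0
    meta = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # not a well-formed GTF row of the requested type
        n_features += 1
        match = pattern.search(cols[8])
        if match:
            meta.add(match.group(1))
    return n_features, len(meta)
```

Passing an open file handle for each GTF (the file iterates line by line) gives numbers that should line up with the "Features" and "Meta-features" that featureCounts prints when it loads the annotation.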

According to the featureCounts summary, the vast majority of my unassigned reads fall under Unassigned_MultiMapping. The manual says this is because STAR flagged them as multimappers, and featureCounts just reads the tag that STAR writes (NH) to decide this. But the number of multimappers is approximately the same in both STAR run summaries.
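Since featureCounts classifies a read as multimapping from its NH tag, one sanity check is to tally NH values directly in both BAM files, for example by feeding the text output of samtools view into the sketch below. It counts only primary alignments (skipping unmapped, secondary and supplementary records via the SAM flag), so with paired-end data each mate is counted separately; nh_value and tally_nh are hypothetical helper names.

```python
def nh_value(fields):
    """Return the integer NH:i value from the SAM optional fields, or None if absent."""
    for tag in fields[11:]:  # columns 0-10 are the mandatory SAM fields
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def tally_nh(sam_lines):
    """Count primary alignments as unique (NH == 1), multi (NH > 1) or missing an NH tag."""
    unique = multi = missing = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # skip unmapped, secondary and supplementary records
        nh = nh_value(fields)
        if nh is None:
            missing += 1
        elif nh > 1:
            multi += 1
        else:
            unique += 1
    return {"unique": unique, "multi": multi, "no_NH": missing}
```

If the two BAMs give the same unique/multi split here, the multimapping category alone cannot explain the gap between the two featureCounts runs.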

I hope you can help me.

