Dear Community, I need help with some additional analyses that the reviewers requested before my manuscript can be published. I am a beginner in bioinformatics. The study design, my pipeline, and my questions are below.
I did an RNA-seq differential gene expression study using two yeast strains (A and B) and three media conditions (M1, M2, M3) in triplicate (I-III), resulting in 18 samples (2 strains x 3 media x 3 replicates).
My pipeline was:
Raw Reads -> [FastQC] -> [Trimmomatic] -> trimmed reads -> [SubRead_align] -> SAM/BAM -> [SubRead_featurecount] -> count_matrix.
I had to build an index for the reference genome prior to read mapping.
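For concreteness, the pipeline above could be written down as the following command lists. This is a minimal sketch, not necessarily the exact commands that were run: the file names, the index prefix, the trimming settings, single-end reads, and a `trimmomatic` wrapper script on the PATH (as installed by e.g. conda) are all assumptions made for illustration.

```python
import shlex


def index_command(reference_fasta, index_prefix):
    """Command that builds the Subread index; needed once per reference."""
    return ["subread-buildindex", "-o", index_prefix, reference_fasta]


def sample_commands(sample, index_prefix):
    """QC, trimming and mapping commands for one single-end sample."""
    raw = f"{sample}.fastq.gz"              # hypothetical file naming
    trimmed = f"{sample}.trimmed.fastq.gz"  # hypothetical file naming
    bam = f"{sample}.bam"
    return [
        ["fastqc", raw],
        # Trimming steps and thresholds are placeholders, not the real settings.
        ["trimmomatic", "SE", raw, trimmed, "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # -t 0 tells subread-align that the input is RNA-seq data.
        ["subread-align", "-t", "0", "-i", index_prefix, "-r", trimmed, "-o", bam],
    ]


def count_command(annotation_gtf, bam_files, output):
    """One featureCounts call over all BAM files gives one count matrix."""
    return ["featureCounts", "-a", annotation_gtf, "-o", output, *bam_files]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(index_command("reference.fa", "yeast_index")))
    for cmd in sample_commands("A_M1_I", "yeast_index"):
        print(shlex.join(cmd))
    print(shlex.join(count_command("annotation.gtf", ["A_M1_I.bam"], "count_matrix.txt")))
```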
My manuscript was accepted, but the reviewers want some additional analyses:
1.) Quantification of the number of SNPs identified by RNA-seq: how similar are the strains at the nucleotide level?
2.) Unmapped reads: Are there any transcripts that don't map to the reference?
3.) Which of the unmapped reads come from genes involved in the metabolic pathways I am investigating?
What would a pipeline for quantifying the number of SNPs identified by RNA-seq look like?
I heard that exactSNP from the Subread package could do the job.
Does it make sense to use all samples for this kind of analysis or only a subset?
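To make the SNP question concrete, the counting step might look roughly like the sketch below. It assumes the SNP caller (exactSNP or any other) writes standard VCF text and that one call set per strain is available. The function names are hypothetical, and this is only one possible way to summarize the calls, not a recommended method.

```python
def read_snps(vcf_lines):
    """Collect single-nucleotide variants as (chrom, pos, ref, alt) tuples."""
    snps = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines ("##..." and "#CHROM ...").
        if not line.strip() or line.startswith("#"):
            continue
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            # Keep only one-base substitutions; indels are ignored here.
            if len(ref) == 1 and len(allele) == 1 and allele != ".":
                snps.add((chrom, int(pos), ref.upper(), allele.upper()))
    return snps


def compare_strains(snps_a, snps_b):
    """Count SNPs private to each strain and SNPs shared by both."""
    return {
        "only_A": len(snps_a - snps_b),
        "only_B": len(snps_b - snps_a),
        "shared": len(snps_a & snps_b),
    }


if __name__ == "__main__":
    # Made-up records purely to show the input format; not real data.
    strain_a = ["#CHROM\tPOS\tID\tREF\tALT\n", "chrI\t100\t.\tA\tG\n"]
    strain_b = ["#CHROM\tPOS\tID\tREF\tALT\n", "chrI\t250\t.\tC\tT\n"]
    print(compare_strains(read_snps(strain_a), read_snps(strain_b)))
```

One caveat with this kind of comparison: RNA-seq only covers expressed regions, so a SNP that is missing in one strain may reflect missing coverage rather than an identical sequence.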
How can I identify unmapped reads?
I assume the results will differ for each sample.
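As a starting point for the unmapped-read question: in SAM output, a read is marked as unmapped when bit 0x4 of its FLAG field is set. The sketch below collects such read names per sample. It assumes the aligner kept unmapped reads in its output and that the alignments are available as SAM text. For BAM files, `samtools view -f 4` applies the same filter. The function name is hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return the names of all reads whose FLAG has the unmapped bit set."""
    names = []
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 1 is the read name (QNAME), column 2 is the FLAG.
        if int(fields[1]) & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    # "A_M1_I.sam" is a hypothetical file name for one sample.
    with open("A_M1_I.sam") as handle:
        print(len(unmapped_read_names(handle)), "unmapped reads")
```

Running this once per sample gives a per-sample count, which would show how much the unmapped fraction varies between strains and media.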
What is a good way to check whether unmapped reads belong to a subset of genes of interest?
For the analyses, I have the reference genome and the above-mentioned RNA-seq data for the samples.
Thanks a lot