I have searched extensively for how to analyse whole exomes. The literature points to Samtools, Bedtools and GATK, but I cannot find a clear, detailed tutorial on how to proceed from exome BAM files.
I want to analyse paired-end BAM files containing whole exomes that are already aligned to the reference with BWA and have duplicates marked (the @PG header line shows ID:bammarkduplicates2). There are two groups of 3 individuals each, so I have 6 BAM files in total.
I have done some initial analysis using Qualimap, and from how the individuals clustered in the PCA I could see the variation (polymorphism) between them.
However, I would like to find out more:
1) the total number of genes in each file, and the average number of genes across all 6 files
2) conserved / non-conserved regions in the exomes with respect to the reference
3) the location of my genes of interest in the exomes with respect to the reference (I have a gene list)
4) any other way to obtain PCA and polymorphism information
I would appreciate any guidance for the above.
P.S.: I am an R admirer, so R solutions would work best!