I have a question about what coverage at SNP sites means. I'm supposed to find genes with SNPs at ≥ 10X coverage in the samples, and they have also taken an average of 8 SNPs per transcript. Any idea what this could mean?
Identify the genes that house the variants from step 2
Count the variants per gene
Keep only the genes with more than 8 SNPs
You can refine the search further by annotating the VCF. Annotate the VCF and keep the variants with more than 10x read depth. If enough variants remain after that filter, filter them further by effect (non-synonymous), then by the number of variants per gene.
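As a rough illustration of the depth filter and the per-gene count, here is a minimal Python sketch. It assumes things that are not in your description: that each record carries the read depth in the standard `DP` INFO key, and that the annotation step wrote the gene name into a hypothetical `GENE` INFO key (real annotators use their own fields, so adapt the parsing to your tool). The function names and both thresholds are placeholders for illustration only.

```python
from collections import Counter

def parse_info(info_field):
    """Turn an INFO string such as 'DP=14;GENE=geneA' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info

def snps_per_gene(vcf_lines, min_depth=10):
    """Count SNPs per gene, keeping only records with DP >= min_depth.

    Assumes a hypothetical GENE=<name> INFO key added by annotation.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        # Keep simple SNPs only: one base in REF and one base in ALT.
        # Indels and multi-allelic records are skipped in this sketch.
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        info = parse_info(fields[7])
        gene = info.get("GENE")
        depth = int(info.get("DP") or 0)
        if gene and depth >= min_depth:
            counts[gene] += 1
    return counts

def genes_with_min_snps(counts, min_snps=8):
    """Return the genes that have at least min_snps SNPs, sorted by name."""
    return sorted(gene for gene, n in counts.items() if n >= min_snps)

if __name__ == "__main__":
    with open("annotated.vcf") as handle:  # hypothetical file name
        counts = snps_per_gene(handle, min_depth=10)
    print(genes_with_min_snps(counts, min_snps=8))
```

Whether the cut-offs should be "at least" or "more than" (10x depth, 8 SNPs) depends on how you read the original instructions, so treat the comparisons above as adjustable.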
'Coverage' and 'read depth' are commonly misused. Read depth is the number of reads that align to any given base position. Coverage ('depth of coverage') is a summary metric that states what proportion of bases in a particular region (or regions) achieved a minimum read depth. For example, we may say that 99% of our target regions achieved > 10x coverage, meaning that 99% of bases had an individual read depth > 10. We may also say that we performed whole genome sequencing with a target depth of coverage of 5x, meaning that we provided sufficient reagents and ran a sufficient number of sequencing cycles such that the expectation is that each base, genome-wide, will have a read depth >= 5.
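To make the distinction concrete, here is a small Python sketch that takes a list of per-base read depths for a region and summarises it as coverage at a threshold. The depth values are invented for illustration, and the function names are my own rather than anything from a particular tool.

```python
def fraction_covered(depths, min_depth=10):
    """Fraction of bases whose individual read depth is >= min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)

def mean_depth(depths):
    """Average read depth across the region."""
    if not depths:
        return 0.0
    return sum(depths) / len(depths)

# Made-up read depths for ten consecutive base positions.
depths = [12, 15, 9, 30, 0, 11, 10, 22, 18, 10]

print(fraction_covered(depths, min_depth=10))  # 0.8, i.e. 80% of bases at >= 10x
print(mean_depth(depths))                      # 13.7
```

Note that the mean depth here is above 10 even though two positions fall below 10x, which is why a per-site depth filter on SNPs is not the same thing as a region-wide coverage figure.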
Your wording is confusing, but the most likely interpretation is that you are expected to filter your data to include only those SNPs that have a read depth > 10.