4.6 years ago
tarek.mohamed ▴ 350
I have a BAM file containing the sequence of a single gene at very high coverage (> 5000X), from which I want to call SNPs. Several SNPs were previously identified in my sample, and I am going to use them as controls to test my pipeline. From my original BAM file, I want to generate several BAM files with different coverage levels. I will then call SNPs from each of them and compare the calls with the previously known genotypes, to see which coverage gives the highest concordance and what the coverage threshold is below which SNP calling is not reliable.
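To make the plan concrete, here is a minimal sketch of how the keep-fraction for each target coverage could be worked out before subsampling. The function name `subsample_fractions`, the starting depth of 5000, and the list of target depths are illustrative assumptions, not values taken from any tool; the real mean depth should be measured on the actual BAM file first.

```python
def subsample_fractions(current_depth, target_depths):
    """Return {target depth: fraction of reads to keep}.

    current_depth is the measured mean depth of the original BAM file
    (assumed here, not computed from a file).
    """
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    fractions = {}
    for target in target_depths:
        # Downsampling cannot add reads, so the fraction is capped at 1.0.
        fractions[target] = min(1.0, target / current_depth)
    return fractions


# Hypothetical example: ~5000X original depth, five target depths.
fractions = subsample_fractions(5000, [10, 50, 100, 500, 1000])
for target, frac in sorted(fractions.items()):
    print(f"target {target}X -> keep fraction {frac:.4f}")
```

Each fraction would then be passed to whichever subsampling tool is used, ideally with several different random seeds per target depth so that concordance can be averaged over replicates.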
Thanks for the reply!
This command will exclude 90% of the reads at a position. Which reads will be excluded? Is it a random process? Do I have any control over the basis on which reads are excluded, for example keeping the reads with the highest mapping quality or base quality scores?
Since the coverage is not uniform across my target region, can I downsample to a fixed number of reads per position rather than by percentage?
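To illustrate what capping to a fixed depth with a quality preference might look like, here is a rough pure-Python sketch. It is not how any particular tool works: the function `cap_depth`, the tuple layout `(name, start, end, mapq)`, and the greedy strategy are all assumptions made for illustration. Reads are considered in order of decreasing mapping quality (ties broken randomly but reproducibly via a seed), and a read is kept only if every position it covers is still below the cap.

```python
import random


def cap_depth(reads, region_length, max_depth, prefer_quality=True, seed=0):
    """Greedily keep reads so that no position exceeds max_depth.

    reads: list of (name, start, end, mapq) with 0-based, half-open
    coordinates relative to the target region (hypothetical layout).
    Returns (kept read names, per-position depth of the kept reads).
    """
    rng = random.Random(seed)
    order = list(reads)
    rng.shuffle(order)  # random order, reproducible for a given seed
    if prefer_quality:
        # Stable sort: reads with equal MAPQ stay in their shuffled order.
        order.sort(key=lambda r: r[3], reverse=True)

    depth = [0] * region_length
    kept = []
    for name, start, end, mapq in order:
        # Keep the read only if all covered positions are below the cap.
        if all(depth[p] < max_depth for p in range(start, end)):
            for p in range(start, end):
                depth[p] += 1
            kept.append(name)
    return kept, depth


# Toy example: five reads over the same 10 bp, cap the depth at 2.
reads = [("r%d" % i, 0, 10, q) for i, q in enumerate([10, 20, 30, 40, 50])]
kept, depth = cap_depth(reads, region_length=10, max_depth=2)
print(sorted(kept), max(depth))
```

With `prefer_quality=False` the selection is purely random. Note that this greedy pass can leave some positions below the cap, it ignores read pairing, and selecting on mapping quality rather than at random may bias allele frequencies, so it is worth checking the effect on the known control SNPs.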