How to generate several BAM files with different coverage from one BAM file with very high coverage
5.6 years ago
tarek.mohamed ▴ 360

Hi,

I have a BAM file containing reads for a single gene at very high coverage (> 5000X), from which I want to call SNPs. Several SNPs were previously identified in my sample, and I am going to use them as controls to test my pipeline. From my original BAM file, I want to generate several BAM files with different coverage. I will then call SNPs from each of them and compare the calls with the previously known genotypes, to see which coverage gives the highest concordance and what the coverage threshold is below which SNP calling is not reliable.

Thanks

bam filter SNPs caller • 1.1k views
5.6 years ago

Use downsampling in samtools:

samtools view -b -s 0.1 -o new_bam_at_0.1_fold_coverage_of_the_original.bam yourbam.bam
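
To produce several subsampled files in one go, the same command can be wrapped in a loop. The following is only a rough sketch: the mean coverage (ORIG_COV), the list of target coverages, the seed, and the output file names are assumptions for illustration, not values from this thread. It relies on the fact that the integer part of the -s argument is used as the random seed and the fractional part as the fraction of reads to keep. The loop only prints the commands, so they can be checked before running.

```shell
#!/usr/bin/env bash
# Sketch: build samtools -s arguments for several target coverages.
# ORIG_COV, TARGETS, SEED and the file names are illustrative assumptions.

ORIG_COV=5000                  # approximate mean coverage of the original BAM (assumed)
TARGETS="10 50 100 500 1000"   # target mean coverages to test (assumed)
SEED=42                        # integer part of -s: seed, fixed for reproducibility
IN_BAM="yourbam.bam"

# Print the -s argument (SEED.FRACTION) for one target coverage.
subsample_arg() {
    awk -v seed="$SEED" -v t="$1" -v c="$ORIG_COV" 'BEGIN {
        f = t / c
        # Fractions outside this range cannot be written with 4 decimals.
        if (f < 0.0001 || f > 0.9999) {
            print "target coverage out of range: " t > "/dev/stderr"
            exit 1
        }
        # "0.0200" -> ".0200", then prefix the seed: "42.0200"
        printf "%d%s\n", seed, substr(sprintf("%.4f", f), 2)
    }'
}

for cov in $TARGETS; do
    arg=$(subsample_arg "$cov") || continue
    # Remove the leading "echo" to actually run samtools.
    echo samtools view -b -s "$arg" -o "subsampled_${cov}x.bam" "$IN_BAM"
done
```

Note that the fraction scales the whole file, so the target values are mean coverages only: regions that are unevenly covered in the original stay unevenly covered, in the same proportions, after subsampling.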

Thanks for the reply!

This command will exclude 90% of the reads at a position. Which reads will be excluded? Is it a random process? Do I have any control over the basis on which reads are excluded, for example to keep the reads with the highest mapping quality or base quality scores?

Since the coverage is not uniform across my target region, can I downsample to a certain number of reads at a position rather than by a percentage?
