Question

How to generate several BAM files with different coverage from one BAM file with very high coverage

0

6.9 years ago

tarek.mohamed ▴ 370

Hi,

I have a BAM file containing reads for a single gene at very high coverage (> 5000X), from which I want to call SNPs. Several SNPs were previously identified in my sample, and I am going to use them as controls to test my pipeline. From my original BAM file, I want to generate several BAM files with different coverage. I will then call SNPs from each of them and compare the calls with the previously known genotypes, to see which coverage gives the highest concordance and what the coverage threshold is below which SNP calling is not reliable.

Thanks

bam filter SNPs caller • 1.4k views

score 1 · Answer 1 · 2018-09-03

1

6.9 years ago

WouterDeCoster 48k

Use downsampling in samtools:

samtools view -b -s 0.1 -o new_bam_at_0.1_fold_coverage_of_the_original.bam yourbam.bam
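To get several BAM files at different coverages, the same command can be run in a loop. The following is a minimal sketch, not a tested pipeline: the original coverage of 5000X, the list of target coverages, the seed, and the output file names are all assumptions chosen for illustration. It relies on the documented form of the option, `-s INT.FRAC`, where the integer part is the random seed and the fractional part is the proportion of reads to keep. Since the fraction is computed from an assumed mean coverage, the resulting coverage is only approximate and will still vary across the region.

```shell
#!/usr/bin/env bash
# Sketch: derive samtools -s values for several target coverages.
# ORIG_COV, TARGETS, SEED and the file names are placeholders (assumptions).
set -euo pipefail

ORIG_COV=5000                 # assumed mean coverage of the original BAM
TARGETS="10 20 50 100 500"    # hypothetical target coverages
SEED=42                       # integer part of -s is used as the random seed
IN_BAM="yourbam.bam"

# fraction to keep = target coverage / original coverage, kept below 1
subsample_fraction() {
    awk -v t="$1" -v o="$2" \
        'BEGIN { f = t / o; if (f >= 1) f = 0.999999; printf "%.6f", f }'
}

for cov in $TARGETS; do
    frac=$(subsample_fraction "$cov" "$ORIG_COV")
    # Build SEED.FRACTION, e.g. 42.020000 keeps about 2% of the reads
    sarg="${SEED}${frac#0}"
    cmd="samtools view -b -s ${sarg} -o subsampled_${cov}x.bam ${IN_BAM}"
    echo "$cmd"
    # Run the command only if samtools and the input file are available
    if command -v samtools >/dev/null 2>&1 && [ -f "$IN_BAM" ]; then
        $cmd
    fi
done
```

Using a fixed seed makes each subsample reproducible; changing the seed gives a different random subset at the same fraction, which can be useful for replicate runs.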

0

Thanks for the reply!

This command will exclude 90% of the reads at a position. Which reads will be excluded? Is it a random process? Do I have any control over the basis on which reads are excluded, for example keeping the reads with the highest mapping quality or base quality scores?

Since the coverage is not uniform across my target region, can I downsample to a certain number of reads at a position rather than using a percentage?

6.9 years ago by tarek.mohamed ▴ 370