Hi, I have a BAM file and a reference (let's say 1000 Genomes data). I would like to extract from the BAM file all the possible overlapping bases with my reference dataset, not only the variants. Basically, I would like to perform a base calling and not a variant calling. This extraction should be done according to a random selection of one read per base. Is there any way to do it?

You could pipe the output of samtools mpileup to a script that randomly selects a single base (no clue why you'd want to do that).
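A minimal sketch of such a script, assuming the standard single-sample mpileup text layout (chromosome, position, reference base, depth, read bases, qualities). The function names `read_bases` and `sample_pileup` are made up for illustration, and indels are simply skipped rather than called:

```python
import random
import sys


def read_bases(ref, pileup):
    """Expand one mpileup read-base string into one uppercase base per read."""
    bases = []
    i = 0
    while i < len(pileup):
        c = pileup[i]
        if c == "^":
            # Read start marker; the next character is a mapping quality.
            i += 2
        elif c == "$":
            # Read end marker.
            i += 1
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            i += 1
            digits = ""
            while i < len(pileup) and pileup[i].isdigit():
                digits += pileup[i]
                i += 1
            i += int(digits or 0)
        elif c in ".,":
            # Match to the reference on the forward/reverse strand.
            bases.append(ref.upper())
            i += 1
        elif c.upper() in "ACGTN":
            # Mismatch: the read's own base.
            bases.append(c.upper())
            i += 1
        else:
            # Deletion or reference-skip placeholders carry no base here.
            i += 1
    return bases


def sample_pileup(lines, rng):
    """Yield (chrom, pos, ref, sampled_base) for each covered position."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, ref, _depth, pileup = fields[:5]
        bases = read_bases(ref, pileup)
        if bases:
            # One read is drawn uniformly at random per position.
            yield chrom, pos, ref.upper(), rng.choice(bases)


# Only read stdin when something is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    rng = random.Random()  # pass a seed here for reproducible draws
    for record in sample_pileup(sys.stdin, rng):
        print("\t".join(record))
```

It could be run as something like `samtools mpileup -f ref.fa -l sites.bed sample.bam | python pick_random_base.py`, where the file names and the script name are placeholders and `-l` restricts the pileup to the positions in your reference dataset. Positions with no usable base are dropped from the output.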

all the possible overlapping bases with my reference dataset

How do you define indel overlap?

I would like to perform a base calling and not a variant calling

How does this differ from setting the flag on your favorite variant caller to also output homozygous reference calls?

This extraction should be done according to a random selection of one read per base.

Why are you downsampling? Why aren't you extracting all read bases that align to any given position and processing that?

• I do not take indels into account.

• I don't have any experience with this; that's why I was asking for help.

• I'm working with low-coverage aDNA.

