6.0 years ago
Simo ▴ 50
Hi, I have a BAM file and a reference dataset (say, 1000 Genomes data). I would like to extract from the BAM file the bases at every position that overlaps my reference dataset, not only the variant sites. In other words, I want to perform base calling rather than variant calling. The extraction should randomly select one read per position. Is there any way to do this?
You could pipe the output of samtools mpileup to a script that randomly selects a single base (no clue why you'd want to do that).
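For what it's worth, here is a rough sketch of what such a script might look like in Python. It assumes the standard tab-separated mpileup text output (chromosome, position, reference base, depth, read bases, qualities) and that mpileup was run with a reference FASTA so the reference column is filled in. The function names (read_bases, sample_one_base, main) are made up for this example. Indels are skipped, and no base or mapping quality filtering is done here.

```python
import random
import sys


def read_bases(ref, pileup):
    """Return the list of base calls found in one mpileup read-bases column."""
    calls = []
    i = 0
    while i < len(pileup):
        c = pileup[i]
        if c == "^":
            # Start of a read: the next character encodes mapping quality.
            i += 2
        elif c in "+-":
            # Indel: sign, length digits, then that many bases. Skip it all.
            j = i + 1
            while j < len(pileup) and pileup[j].isdigit():
                j += 1
            i = j + int(pileup[i + 1:j])
        else:
            if c in ".,":
                # Match to the reference on the forward/reverse strand.
                calls.append(ref.upper())
            elif c.upper() in "ACGT":
                # Mismatch: the read base itself is given.
                calls.append(c.upper())
            # Anything else ($, *, #, <, >, N) carries no usable base.
            i += 1
    return calls


def sample_one_base(lines, rng=random):
    """Yield (chrom, pos, ref, base) with one randomly chosen base per site."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, ref, _depth, pileup = fields[:5]
        calls = read_bases(ref, pileup)
        if calls:  # sites with no usable base are dropped
            yield chrom, int(pos), ref.upper(), rng.choice(calls)


def main():
    # Read mpileup text from stdin, write one sampled base per site to stdout.
    for chrom, pos, ref, base in sample_one_base(sys.stdin):
        print(chrom, pos, ref, base, sep="\t")
```

To use it in a pipe, add a call to main() at the bottom of the file and run something like `samtools mpileup -f ref.fa -l sites.bed in.bam | python pick_base.py`, where ref.fa, sites.bed, in.bam and pick_base.py are placeholder names. Picking a random element from the list of observed bases is equivalent to picking one read at random at that position.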
How do you define indel overlap?
How does this differ from setting the flag on your favorite variant caller to also output homozygous reference calls?
Why are you downsampling? Why aren't you extracting all read bases that align to any given position and processing that?
I do not take into account indels.
I don't have any experience with this, which is why I'm asking for help.
I'm working with low-coverage aDNA.