I am trying to add an indel realignment step to my processing pipeline. I use the myeloid amplicon panel from Illumina, which contains ~1000 amplicons covering ~100 kb of the genome in total. Before the indel realignment step I clean my BAMs so that no reads are present outside of my amplicons.
Unfortunately, when I construct an intervals file from a BAM file and some whole-genome indel databases, I get an intervals file that covers the whole genome. Why does it construct intervals in areas where there is zero coverage?
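To make clear what I expected instead, here is a minimal Python sketch of the filtering I have in mind: keep only the candidate intervals that overlap one of my amplicons. It assumes intervals are written as `chr:start-stop` strings with 1-based inclusive coordinates, which is a guess on my part, and the helper names (`parse_interval`, `overlaps`, `keep_on_target`) are made up for illustration, not part of any tool.

```python
from typing import List, Tuple

# (contig, start, end), assumed 1-based and inclusive
Interval = Tuple[str, int, int]


def parse_interval(text: str) -> Interval:
    """Parse 'chr1:2-4' or 'chr1:2' (assumed format) into a tuple."""
    contig, _, span = text.strip().rpartition(":")
    if not contig:
        raise ValueError(f"not an interval string: {text!r}")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    # A bare position such as 'chr1:2' is treated as a 1-bp interval.
    end = int(end_text) if end_text else start
    return contig, start, end


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same contig and share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def keep_on_target(intervals: List[Interval],
                   amplicons: List[Interval]) -> List[Interval]:
    """Drop every interval that does not touch any amplicon."""
    return [iv for iv in intervals
            if any(overlaps(iv, amp) for amp in amplicons)]


if __name__ == "__main__":
    amplicons = [parse_interval("chr1:100-250")]
    candidates = [parse_interval(s) for s in
                  ("chr1:120-130", "chr1:5000-5010", "chr2:120-130")]
    print(keep_on_target(candidates, amplicons))
```

With ~1000 amplicons the quadratic loop is cheap enough; a sorted sweep would only matter for much larger target lists.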
There is no documentation of the intervals file. Since I have a relatively small genomic area, IndelRealigner should be able to do more work there than in a whole-genome project. I guess I could pool the intervals from all my BAM files to create a list of all possible indels (including those present in all my files) and then run RealignerTargetCreator with this file.
Does anybody know the correct format of the intervals file? I mean, if there might be two indels, chr1:2-3 and chr1:2-4, should I have an intervals file with
chr1:2 chr1:3 chr1:4
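For concreteness, this is roughly the pooling I have in mind, as a small Python sketch. It assumes the file holds one `chr:start-stop` string per line (that format is my assumption, which is exactly what I am asking about), and `merge_intervals` is a hypothetical helper of my own, not a GATK function. Overlapping intervals collected from several BAM files are collapsed into one.

```python
from typing import Iterable, List, Tuple


def _parse(text: str) -> Tuple[str, int, int]:
    """Parse 'chr1:2-4' or 'chr1:2' (assumed format), 1-based inclusive."""
    contig, _, span = text.strip().rpartition(":")
    if not contig:
        raise ValueError(f"not an interval string: {text!r}")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    return contig, start, end


def merge_intervals(lines: Iterable[str]) -> List[str]:
    """Pool interval strings and collapse the ones that overlap."""
    merged: List[List] = []
    for contig, start, end in sorted(_parse(line) for line in lines):
        last = merged[-1] if merged else None
        if last and last[0] == contig and start <= last[2]:
            # Overlaps the previous interval: extend it if needed.
            last[2] = max(last[2], end)
        else:
            merged.append([contig, start, end])
    # Write single positions as 'chr:pos', ranges as 'chr:start-stop'.
    return [f"{c}:{s}" if s == e else f"{c}:{s}-{e}" for c, s, e in merged]


if __name__ == "__main__":
    # The two indels from my example, as they might come from two files.
    print(merge_intervals(["chr1:2-3", "chr1:2-4"]))  # ['chr1:2-4']
```

Under this assumption my two example indels would end up as a single line, `chr1:2-4`, rather than three separate positions. Contigs are sorted as plain strings here, so the output order may differ from the reference order.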