Hello all,
I have a large number of sequencing samples that were aligned to a reference genome, and I want to use coverage to find duplications and deletions of specific genetic regions (preferably at the exon-intron level). Calculating the coverage itself has not been an issue: I have the coverage of the whole region from samtools bedcov and a more detailed calculation from bedtools coverage. However, I cannot connect the output to the question that I want to answer: "Is my gene duplicated/deleted in certain samples?"
The samples come from a few different species within the same genus, so some duplications/deletions are to be expected. The alignments are stored in one BAM file per sample, and I have both GFF and BED files covering my region of interest.
I have googled the question as well, but everything related that came up was along the lines of "how to delete duplicate reads from bam files?", which is not what I want to know. Methods that do not rely on coverage are fine as well; coverage was just the most straightforward approach for me.
Many thanks in advance.
https://github.com/GooglingTheCancerGenome/sv-callers/wiki
You should look at a tool like abra2 (LINK). Identifying deletions can be tricky business, and you may need to use a method that does local realignment of the data.
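If you want a quick coverage-based first pass before running a dedicated caller, one common idea is to turn the samtools bedcov output mentioned in the question into a per-region copy ratio: normalize each sample by its own typical depth, then compare each region to the median of the whole cohort. The following is only a rough sketch under several assumptions, not a validated method. It assumes bedcov was run on one BAM at a time (so the last column is the summed per-base depth for that sample), that the BED file also contains some regions expected to be single-copy (otherwise the per-sample median is not a meaningful baseline), and that most samples are not affected at any given region. The function names and the 0.6 / 1.4 cut-offs are illustrative choices, not values from any tool.

```python
from statistics import median


def mean_depth(bedcov_lines):
    """Parse `samtools bedcov` lines for ONE BAM file.

    Each line holds the BED columns followed by the summed per-base depth
    (last column). Returns {(chrom, start, end): mean depth per base}.
    """
    depths = {}
    for line in bedcov_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        total = float(fields[-1])
        depths[(chrom, start, end)] = total / (end - start)
    return depths


def copy_ratios(depth_by_sample):
    """Convert {sample: {region: mean depth}} into {sample: {region: ratio}}.

    Step 1: divide by the sample's median region depth (sequencing depth).
    Step 2: divide by the cohort median for that region (region-specific bias).
    """
    normalized = {}
    for sample, depths in depth_by_sample.items():
        scale = median(depths.values())
        if scale <= 0:
            raise ValueError(f"sample {sample} has no usable coverage")
        normalized[sample] = {r: d / scale for r, d in depths.items()}

    regions = list(next(iter(normalized.values())))
    ratios = {sample: {} for sample in normalized}
    for region in regions:
        reference = median(normalized[s][region] for s in normalized)
        for sample in normalized:
            value = normalized[sample][region]
            ratios[sample][region] = value / reference if reference > 0 else float("nan")
    return ratios


def classify(ratio, loss=0.6, gain=1.4):
    """Label a copy ratio; the thresholds are illustrative assumptions."""
    if ratio < loss:
        return "loss"
    if ratio > gain:
        return "gain"
    return "neutral"
```

To use it, you would build `depth_by_sample` by calling `mean_depth` on each sample's bedcov output and then apply `classify` to every entry of `copy_ratios(depth_by_sample)`. Because your samples span several species, mapping efficiency to the reference will differ between them, so a low ratio can also reflect sequence divergence rather than a true deletion; candidates found this way are worth confirming with split-read or realignment-based evidence.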