I was given aligned reads from liver cancer samples (without matched normals). Looking at the header, I noticed that it lists chr1-22, chrX, chrY and chrM (note LN:16571, not LN:16569), with no random or unlocalized contigs.
My worry is that reads belonging to those missing contigs were aligned elsewhere instead. Another issue is that I don't know whether I can sort these files against a generic ucsc.hg19.fasta (which includes the random and unlocalized contigs) so that I can call (all) variants using samtools and bcftools.
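To make the compatibility question concrete, here is a minimal sketch of the check I have in mind, assuming the header text comes from `samtools view -H` and the reference index from `samtools faidx` (a `.fai` file with the contig name and length in its first two tab-separated columns). The function names and the toy lengths in the example are hypothetical, for illustration only.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_fai(fai_text):
    """Return {contig_name: length} from the text of a .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def compare_contigs(bam_contigs, ref_contigs):
    """Report contigs missing from the reference, length mismatches,
    and contigs present only in the reference."""
    missing = sorted(n for n in bam_contigs if n not in ref_contigs)
    mismatched = {
        n: (bam_contigs[n], ref_contigs[n])
        for n in bam_contigs
        if n in ref_contigs and bam_contigs[n] != ref_contigs[n]
    }
    extra = sorted(n for n in ref_contigs if n not in bam_contigs)
    return missing, mismatched, extra


if __name__ == "__main__":
    # Toy inputs: two header lines and a three-line index (illustrative values).
    header = "@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
    fai = (
        "chr1\t249250621\t6\t50\t51\n"
        "chrM\t16569\t300\t50\t51\n"
        "chr1_gl000191_random\t1000\t400\t50\t51\n"
    )
    print(compare_contigs(parse_sq_lines(header), parse_fai(fai)))
```

If the first two results are empty, every contig in the header exists in the candidate reference with the same length. A non-empty third result would only mean the reference carries additional contigs that the alignments never used.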
Should I build my own reference with all the random/unlocalized contigs removed? Also, I am only interested in calling variants in very specific regions (TP53, PIWIL1-4, SETDB1). Is there a way to reduce the sorting time, and can I use a 'shortened' reference assembly restricted to my regions of interest? Thank you for your time and help.
EDIT: I should note that although I am working with cancer samples, I am not trying to call somatic variants. I just need a pileup of all (high-quality) variants.