I have a cohort of patients in whom we are deep sequencing integrated HIV provirus, and because of the technical methods we are using, we also get some host (human) genomic sequence. I'm estimating 2 billion bp, so ~1X coverage. It's not going to be enough coverage to capture any clinically relevant human SNPs (unless we get extremely lucky).
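As a rough sanity check on that coverage estimate, here is a quick sketch. It assumes the 2 billion bp is per patient, a haploid human genome of about 3.1 Gb, and uniform, Poisson-distributed read placement (the Lander-Waterman approximation), which ignores mapping and capture biases. The 10-read threshold is only an illustrative assumption for "enough depth to call a genotype", not a standard cutoff.

```python
import math

def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size

def fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Poisson estimate of the fraction of positions covered by >= min_reads reads."""
    # P(X >= k) = 1 - sum_{i<k} e^-depth * depth^i / i!
    p_below = sum(
        math.exp(-depth) * depth**i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - p_below

# Assumed inputs: 2e9 bp of human sequence per patient, ~3.1e9 bp haploid genome.
depth = mean_coverage(2e9, 3.1e9)
print(f"mean depth:  {depth:.2f}X")
print(f">=1 read:    {fraction_covered(depth, 1):.1%}")
# Illustrative (hypothetical) genotyping threshold of 10 reads.
print(f">=10 reads:  {fraction_covered(depth, 10):.2e}")
```

Under these assumptions roughly half the genome is never touched by a read, and essentially no position reaches the kind of depth you would want for confident genotyping, which is consistent with not expecting to recover clinically relevant SNPs.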
The data is a combination of Illumina NextSeq and PacBio reads (mostly Illumina) from what will ultimately be more than 500 patients.
I already know what I'm going to do with the HIV sequence but I'm looking to squeeze something interesting out of this "leftover" data.