I have paired-end data from normal and tumour samples for cervical cancer. I am trying to determine the presence of viral genomes [one or multiple] and their sites of integration.
Determining the presence is trivial, since you can just separate out the reads that are unmapped to the human genome and then realign them to a custom viral genome FASTA.
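Here is a minimal sketch of that separation step. It assumes the reads were first aligned to the human reference and are available as SAM text lines. The SAM FLAG bits 0x4 (segment unmapped) and 0x8 (mate unmapped) separate two groups. Pairs where both mates are unmapped are candidates for viral presence. Pairs where exactly one mate is unmapped have one end anchored in human sequence, and these are the usual starting point for locating integration sites. The function name `classify_sam_records` is hypothetical. On real BAM files, `samtools view -f 4` does the basic "unmapped" selection.

```python
PAIRED = 0x1          # template has multiple segments (paired-end)
UNMAPPED = 0x4        # this segment is unmapped
MATE_UNMAPPED = 0x8   # the other segment of the pair is unmapped
NOT_PRIMARY = 0x900   # secondary (0x100) or supplementary (0x800) record


def classify_sam_records(sam_lines):
    """Return (both_unmapped, one_end_anchored) sets of read names.

    both_unmapped:    neither mate aligned to the human reference.
    one_end_anchored: exactly one mate aligned to human (the other did not).
    """
    both_unmapped = set()
    one_end_anchored = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & PAIRED or flag & NOT_PRIMARY:
            continue  # only primary records of paired reads are considered
        self_unmapped = bool(flag & UNMAPPED)
        mate_unmapped = bool(flag & MATE_UNMAPPED)
        if self_unmapped and mate_unmapped:
            both_unmapped.add(qname)
        elif self_unmapped != mate_unmapped:
            one_end_anchored.add(qname)
    return both_unmapped, one_end_anchored
```

The reads in `both_unmapped` and the unmapped mates from `one_end_anchored` are what you would realign to the viral FASTA.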
I went through the question and the solutions here: "Method to identify viral integration site in human genome from NGS data?", but they do not solve my problem.
I came across another paper that addresses the same problem: http://jvi.asm.org/content/early/2013/05/30/JVI.00340-13.full.pdf. On page 10 they describe using a clustering method to determine the integration sites, but I didn't quite follow the approach.
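I cannot confirm that the paper does exactly this. One common, generic way to turn read pairs into candidate integration sites is to cluster discordant pairs. You take the human-mapped mates of one-end-anchored pairs whose partner aligned to the viral reference. You then group their positions when they fall close together on the same chromosome. Each cluster with enough supporting reads marks a candidate integration region. The sketch below is greedy single-linkage clustering by distance. The function name `cluster_anchor_positions` and the defaults `max_gap=500` (roughly one fragment length) and `min_support=3` are hypothetical choices, not values from the paper.

```python
from collections import defaultdict


def cluster_anchor_positions(anchors, max_gap=500, min_support=3):
    """Greedy single-linkage clustering of anchor positions.

    anchors: iterable of (chrom, pos) for human-mapped mates whose partner
             aligned to the viral reference.
    A position joins the current cluster if it lies within max_gap bp of the
    previous (sorted) position on the same chromosome.
    Returns a list of (chrom, start, end, n_reads) for clusters with at least
    min_support supporting reads.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in anchors:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                count += 1  # close enough: extend the current cluster
            else:
                if count >= min_support:
                    clusters.append((chrom, start, prev, count))
                start, count = pos, 1  # open a new cluster
            prev = pos
        if count >= min_support:  # flush the last cluster on this chromosome
            clusters.append((chrom, start, prev, count))
    return clusters
```

A cluster only narrows the integration site to roughly one fragment length. Reads that span the junction itself are needed for base-level resolution.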
Also, would bwa-sw [as mentioned in the other Biostar question] not help me determine the integration sites, or am I mistaken? Can someone guide me to a better approach or explain the paper's algorithm?
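A local aligner can still help in one specific way. A read that spans the human/viral junction may be aligned to the human part and soft-clipped (`S` in the CIGAR) where the viral sequence begins. The clip boundary then gives the junction at base-level resolution. The sketch below assumes your aligner and settings actually keep such reads with the clipped part reported as a soft clip, which you should check for bwa-sw specifically. The function name `clip_breakpoint` and the `min_clip=20` threshold are hypothetical. `pos` is the 1-based SAM POS of the leftmost aligned base.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoint(pos, cigar, min_clip=20):
    """Return candidate junction positions implied by long soft clips.

    Each result is (side, ref_pos):
      ("left", p):  clipped sequence precedes p, the first aligned base
      ("right", p): clipped sequence follows p, the last aligned base
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if not core or ref_len == 0:
        return []  # nothing aligned to the reference

    breakpoints = []
    lead_n, lead_op = core[0]
    tail_n, tail_op = core[-1]
    if lead_op == "S" and lead_n >= min_clip:
        breakpoints.append(("left", pos))
    if tail_op == "S" and tail_n >= min_clip:
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints
```

To confirm a candidate, realign the clipped bases to the viral FASTA. Several reads that clip at the same position strengthen it.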