I am faced with something I haven't tried before, and I am unsure how to proceed.
I am working with a Long-PCR product, which has been sequenced on a MiSeq instrument with a PCR-Free kit.
The amplicon is about 15 kb and has regions of high homology to other areas of the genome, so aligning to the whole genome resulted in a big mess.
I am currently aligning only to my 15 kb reference (a custom FASTA file of my region of interest) with Bowtie2. I have processed the alignments with Samtools/Picard and called variants with both GATK and Samtools to compare. There are a few differences, but I think my biggest problem is indels: GATK calls no deletions, while Samtools calls deletions but few insertions. So I tried GATK's local realignment, but I get a blank file. I could feed it the known indels file (golden_indels.vcf), but because I am only doing a targeted alignment, its genomic coordinates do not match my 15 kb reference. Is there a way to get around this? (I have been converting my coordinates before annotating VCF files, but I think there must be a better way.)
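To make the coordinate problem concrete, here is a minimal sketch of that kind of conversion, in both directions. It assumes the amplicon sits on the forward strand of a single chromosome. The contig names ("chr1", "amplicon"), the region start and the example record are hypothetical placeholders, not values from my data. It only rewrites CHROM and POS, so position-bearing INFO fields such as END would still need separate handling, and new ##contig header lines would have to be added for tools that require them.

```python
def shift_vcf_lines(lines, src_contig, dst_contig, offset, dst_length=None):
    """Yield VCF lines with CHROM renamed and POS shifted by `offset`.

    Records on other contigs, or that fall outside 1..dst_length after
    shifting, are dropped. Old ##contig header lines are dropped as well,
    because they describe the source reference.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig"):
            continue
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if fields[0] != src_contig:
            continue
        new_pos = int(fields[1]) + offset          # VCF POS is 1-based
        ref_end = new_pos + len(fields[3]) - 1     # last base covered by REF
        if new_pos < 1:
            continue
        if dst_length is not None and ref_end > dst_length:
            continue
        fields[0] = dst_contig
        fields[1] = str(new_pos)
        yield "\t".join(fields)


# Hypothetical values: the amplicon covers chr1:1000001-1015000.
REGION_START = 1_000_001
AMPLICON_LENGTH = 15_000

genomic = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000010\t.\tAT\tA\t.\t.\t.",
]

# Genome -> amplicon: known indels expressed in the 15 kb reference.
local = list(shift_vcf_lines(genomic, "chr1", "amplicon",
                             -(REGION_START - 1), AMPLICON_LENGTH))

# Amplicon -> genome: calls lifted back for annotation.
back = list(shift_vcf_lines(local, "amplicon", "chr1", REGION_START - 1))
```

With these placeholder values, the deletion at chr1:1000010 ends up at position 10 of the amplicon reference, and shifting it back restores the genomic position.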
I was thinking: could I align to the chromosome of interest and specify a region? I can't seem to find such an option (GATK has one for calling variants only in a certain region, but I see nothing equivalent for alignment with Bowtie2).
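One possible workaround, sketched here under assumptions rather than as a tested workflow: align to the whole chromosome, then keep only the records that start inside the region and pass a mapping-quality cutoff. The contig name, coordinates and MAPQ threshold below are hypothetical. The filter looks only at the leftmost mapped position, so a read that starts just before the region but overlaps it would be dropped.

```python
def in_region(sam_line, contig, start, end, min_mapq=0):
    """Return True if an alignment record starts inside contig:start-end."""
    fields = sam_line.rstrip("\n").split("\t")
    rname = fields[2]          # reference sequence name
    pos = int(fields[3])       # 1-based leftmost mapped position
    mapq = int(fields[4])      # mapping quality
    return rname == contig and start <= pos <= end and mapq >= min_mapq


def filter_sam(lines, contig, start, end, min_mapq=0):
    """Yield SAM header lines plus the records that pass in_region()."""
    for line in lines:
        if line.startswith("@"):
            yield line.rstrip("\n")   # keep the header untouched
        elif in_region(line, contig, start, end, min_mapq):
            yield line.rstrip("\n")


# Hypothetical example: keep confident alignments in chr1:1000001-1015000.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1000010\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t1000010\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t0\tchr1\t5\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_sam(sam, "chr1", 1_000_001, 1_015_000, min_mapq=20))
```

Here only the header and r1 survive: r2 fails the MAPQ cutoff, r3 lies outside the region, and r4 is unmapped. This keeps genomic coordinates, so a known-indels VCF would line up without conversion, at the cost of aligning against a much larger reference.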
I am also interested to know if anyone has tried analyzing data from a similar set-up:
What are the caveats?
Which workflow would you suggest?
What has been your experience with analyzing NGS results for Long-PCR products in general?
How did the mapping/coverage/quality/duplications look?
I know it is a lot to ask, so any partial answer is also very much appreciated.