6.2 years ago
mzaki
I have >100x HiSeq coverage for an organism, and I want to determine the sequence of each copy in a tandemly duplicated gene cluster.
So far I have done it as follows:
- mapping reads to the reference gene (BWA)
- estimating the copy number from sequencing depth (samtools)
- calling variants with GATK HaplotypeCaller -ploidy n
- physical (read-based) phasing by hand, like solving a Sudoku or logic puzzle
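
To make the depth step concrete, here is a minimal sketch of one way it could be done, not necessarily the exact procedure used above. It assumes the three-column text output of `samtools depth` (reference name, position, depth) and assumes a known single-copy region is available for comparison. The function names and the `base_ploidy` parameter are hypothetical, introduced only for illustration.

```python
from statistics import median


def parse_depths(depth_text):
    """Collect per-base depths from `samtools depth` text (chrom, pos, depth)."""
    depths = []
    for line in depth_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def estimate_total_copies(cluster_depths, single_copy_depths, base_ploidy=1):
    """Estimate the total number of gene copies in the sample.

    Assumption: depth scales linearly with copy number, so the ratio of
    median depths (gene vs. a single-copy region) times the organism's
    ploidy approximates the value to pass as the ploidy to the caller.
    """
    ratio = median(cluster_depths) / median(single_copy_depths)
    return max(1, round(ratio * base_ploidy))


# Toy input: three positions on the collapsed gene, three on a single-copy locus.
gene = parse_depths("gene\t1\t310\ngene\t2\t295\ngene\t3\t305\n")
control = parse_depths("ctrl\t1\t100\nctrl\t2\t98\nctrl\t3\t102\n")
n = estimate_total_copies(gene, control)
```

The median is used rather than the mean so that a few positions with extreme depth do not dominate the estimate.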
Yes, it worked. But I suspect there must be more sophisticated tools for the last step.
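
For reference, the manual phasing amounts to roughly the following sketch. It is a simplified illustration under stated assumptions, not an existing tool: each read is reduced to a hypothetical dict mapping variant position to the observed base, links between adjacent variant sites are counted, weakly supported links are discarded as likely errors, and every chain of surviving links is reported as a candidate haplotype. All names here (`link_counts`, `extend_haplotypes`, `min_support`) are made up for the example.

```python
from collections import Counter


def link_counts(reads, min_support=2):
    """Count reads that join an allele at one site to an allele at the next.

    reads: list of dicts {variant_position: observed_base}.
    Returns {(pos_a, base_a, pos_b, base_b): count} for links seen in at
    least `min_support` reads.
    """
    counts = Counter()
    for alleles in reads:
        sites = sorted(alleles)
        for a, b in zip(sites, sites[1:]):
            counts[(a, alleles[a], b, alleles[b])] += 1
    return {k: v for k, v in counts.items() if v >= min_support}


def extend_haplotypes(links, positions):
    """Enumerate every chain of alleles consistent with the supported links.

    Caveat: when two copies share an allele at a site, this returns all
    paths through it, so ambiguous regions yield extra candidates.
    """
    first = positions[0]
    haps = [[(first, base)] for base in sorted({k[1] for k in links if k[0] == first})]
    for a, b in zip(positions, positions[1:]):
        extended = []
        for hap in haps:
            last_base = hap[-1][1]
            next_bases = sorted(
                k[3] for k in links if k[0] == a and k[1] == last_base and k[2] == b
            )
            for base in next_bases:
                extended.append(hap + [(b, base)])
        haps = extended
    return haps


# Toy data: two copies, plus one read carrying a likely sequencing error.
reads = [
    {100: "A", 150: "C"}, {100: "A", 150: "C"},
    {150: "C", 220: "T"}, {150: "C", 220: "T"},
    {100: "T", 150: "G"}, {100: "T", 150: "G"},
    {150: "G", 220: "G"}, {150: "G", 220: "G"},
    {100: "A", 150: "G"},  # seen once only, dropped by min_support
]
haplotypes = extend_haplotypes(link_counts(reads), [100, 150, 220])
```

This only uses links between neighbouring sites, which is exactly where the by-hand puzzle solving comes in: longer reads or read pairs spanning several sites are needed to resolve the ambiguous chains.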
Any suggestions?