8 hours ago
scsc185
I have whole-genome Illumina sequencing data for one individual, and I would like to generate a sample-specific genome that I can later use for read simulation. After some discussion with ChatGPT, I’ve outlined the following workflow:
- Align reads and call variants against a reference genome.
- Phase variants to distinguish maternal and paternal haplotypes.
- Generate consensus FASTA sequences for each haplotype.
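
For concreteness, here is a rough sketch of how I imagine these steps could map onto commands. It only builds and prints the command lines and runs nothing. The tool choice (bwa, samtools, bcftools, WhatsHap) is my assumption rather than a settled decision, and all file names and the `build_commands` helper are hypothetical placeholders. As far as I understand, `bcftools consensus -H 1` and `-H 2` take the first and second allele of each genotype regardless of phasing, and read-based phasing from short reads alone yields phase blocks rather than true maternal/paternal assignments, so the two FASTA files would only approximate the real haplotypes.

```python
import shlex


def build_commands(ref, r1, r2, sample="sample", threads=4):
    """Return shell commands (as strings) for align -> call -> phase -> consensus.

    Nothing is executed here. All output file names are hypothetical placeholders
    derived from `sample`.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.calls.vcf.gz"
    phased_vcf = f"{sample}.phased.vcf"
    phased_gz = f"{sample}.phased.vcf.gz"
    # Read group so the BAM and the VCF carry the same sample name.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"

    cmds = [
        # 1. Index the reference and align the paired-end reads.
        f"bwa index {q(ref)}",
        f"samtools faidx {q(ref)}",
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(r1)} {q(r2)}"
        f" | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # 2. Call variants against the reference.
        f"bcftools mpileup -f {q(ref)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",
        # 3. Read-based phasing, then compress and index the phased VCF.
        f"whatshap phase --reference {q(ref)} -o {q(phased_vcf)} {q(vcf)} {q(bam)}",
        f"bcftools view -Oz -o {q(phased_gz)} {q(phased_vcf)}",
        f"bcftools index {q(phased_gz)}",
    ]
    # 4. One consensus FASTA per haplotype.
    for hap in (1, 2):
        out_fa = f"{sample}.hap{hap}.fa"
        cmds.append(
            f"bcftools consensus -f {q(ref)} -H {hap} -o {q(out_fa)} {q(phased_gz)}"
        )
    return cmds


for cmd in build_commands("ref.fa", "reads_1.fq.gz", "reads_2.fq.gz"):
    print(cmd)
```

The two resulting files (`sample.hap1.fa` and `sample.hap2.fa` in this sketch) would then be the inputs to whichever read simulator I end up using.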
My goal is to end up with two FASTA files (one per haplotype) that approximate this individual’s genome, and then use them to simulate Illumina reads. I am not familiar with this type of workflow, so I am wondering whether anyone has done something similar and could sanity-check the steps above. Any suggestions on best practices or improvements are appreciated.