Creating a personalized genome assembly for read simulation from Illumina WGS data
14 days ago
scsc185 ▴ 80

I have Illumina whole-genome sequencing data for one individual, and I would like to generate a sample-specific genome that I can later use for read simulation. After some discussion with ChatGPT, I’ve outlined the following workflow:

  1. Align reads and call variants against a reference genome.
  2. Phase variants to distinguish maternal and paternal haplotypes.
  3. Generate consensus FASTA sequences for each haplotype.

My goal is to end up with two FASTA files (one per haplotype) that approximate this individual’s genome, and then use them to simulate Illumina reads. I am not familiar with this type of workflow, so I am wondering whether anyone has done something similar and could sanity-check the steps above. Any suggestions on best practices and improvements are appreciated.
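To make step 3 concrete, here is a minimal Python sketch of what "generating a consensus per haplotype" means for phased SNPs. The function name, the tuple layout, and the toy sequence are all hypothetical and only for illustration. It handles single-base substitutions only: indels shift coordinates and need a dedicated tool, and unphased heterozygous calls need an explicit decision about which haplotype receives them.

```python
def apply_phased_snps(reference, snps):
    """Return (hap1, hap2) sequences after applying phased SNPs.

    reference: reference sequence for one contig, as a string.
    snps: iterable of (pos_1based, ref_base, alt_base, genotype),
          where genotype is a phased string such as "0|1".
    """
    hap1 = list(reference)
    hap2 = list(reference)
    for pos, ref_base, alt_base, genotype in snps:
        idx = pos - 1  # VCF-style positions are 1-based
        # Guard against coordinate or reference-build mismatches.
        if reference[idx].upper() != ref_base.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        left, right = genotype.split("|")
        if left == "1":
            hap1[idx] = alt_base
        if right == "1":
            hap2[idx] = alt_base
    return "".join(hap1), "".join(hap2)


# Toy example: three phased SNPs on a 10 bp "contig".
reference = "ACGTACGTAC"
snps = [
    (2, "C", "T", "0|1"),  # ALT on haplotype 2 only
    (5, "A", "G", "1|1"),  # homozygous ALT, both haplotypes
    (9, "A", "C", "1|0"),  # ALT on haplotype 1 only
]
hap1, hap2 = apply_phased_snps(reference, snps)
print(">hap1\n" + hap1)
print(">hap2\n" + hap2)
```

The REF check is worth keeping in any real version: a mismatch usually means the VCF and the FASTA come from different reference builds.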

illumina consensus wgs phasing simulation • 365 views
13 days ago

Why do you need a genome to use for read simulation when you have the reads from the required sample?

Anyway, you can use a tool like seqtk for this; the relevant command is mutfa, which applies point mutations to a FASTA at specified positions. https://github.com/lh3/seqtk

seqtk

Usage:   seqtk <command> <arguments>
Version: 1.3-r106

Command: seq       common transformation of FASTA/Q
         comp      get the nucleotide composition of FASTA/Q
         sample    subsample sequences
         subseq    extract subsequences from FASTA/Q
         fqchk     fastq QC (base/quality summary)
         mergepe   interleave two PE FASTA/Q files
         trimfq    trim FASTQ using the Phred algorithm

         hety      regional heterozygosity
         gc        identify high- or low-GC regions
         mutfa     point mutate FASTA at specified positions
         mergefa   merge two FASTA/Q files
         famask    apply a X-coded FASTA to a source FASTA
         dropse    drop unpaired from interleaved PE FASTA/Q
         rename    rename sequence names
         randbase  choose a random base from hets
         cutN      cut sequence at long N
         listhet   extract the position of each het
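Since mutfa works on point mutations, one way to get two haplotypes out of it would be to run it twice, once per haplotype, each time with only the SNPs carried by that haplotype. The Python sketch below splits phased, single-sample VCF lines into two such tables. The function name and the example records are hypothetical. The four-column layout (chromosome, 1-based position, an arbitrary third column, new base) is my reading of the mutfa usage message, so please check it against what `seqtk mutfa` prints when run without arguments.

```python
def split_phased_snps(vcf_lines):
    """Split phased biallelic SNPs into one table per haplotype.

    Returns (hap1_rows, hap2_rows); each row is a tab-separated string
    "chrom<TAB>pos<TAB>ref<TAB>alt". Indels, multi-allelic sites and
    unphased genotypes are skipped in this sketch.
    """
    tables = ([], [])
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # not a simple single-base substitution
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if "|" not in genotype:
            continue  # unphased call; haplotype assignment is unknown
        for hap_index, allele in enumerate(genotype.split("|")):
            if allele == "1":
                tables[hap_index].append(f"{chrom}\t{pos}\t{ref}\t{alt}")
    return tables


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|1",
    "chr1\t250\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1|1:30",
    "chr1\t300\t.\tG\tGA\t50\tPASS\t.\tGT\t1|0",  # insertion, skipped
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/1",   # unphased, skipped
]
hap1_rows, hap2_rows = split_phased_snps(example)
print("\n".join(hap1_rows))
print("\n".join(hap2_rows))
```

Each table would be written to its own file and passed to a separate mutfa run. Note that the skipped indels and unphased sites are lost with this approach; if those matter for the simulation, bcftools consensus with its haplotype option may be a better fit.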