I performed whole-genome sequencing (WGS) of a new transgenic mouse model, generated by random integration of a 10.316 kb transgene, with the goal of locating the insertion site(s) and determining the transgene copy number.
The PromethION flow cell produced 54.86 Gb of output from 2.37 million reads, and I received the data as 595 basecalled FASTQ files.
My workflow so far:
1) merged the 595 FASTQ files into a single merged.fastq.gz
2) assembled merged.fastq.gz de novo with flye to produce denovo_assembly.fasta:
flye --nano-raw merged.fastq.gz --genome-size 2.6g --threads 129 --out-dir ./flye_output
3) aligned the 10.316 kb insert.fasta to denovo_assembly.fasta with minimap2:
minimap2 -c -P -L --cs=long --frag=yes --rmq=yes -t129 denovo_assembly.fasta insert.fasta > insert_to_denovo.paf
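Step 1 can be sketched in Python as follows, assuming the basecalled files are plain or gzip-compressed FASTQ with 4-line records. The function name `merge_fastqs` is hypothetical. It streams every input into one gzip-compressed file and returns the read count, which can be compared with the 2.37 million reads reported for the run.

```python
import gzip


def merge_fastqs(paths, out_path):
    """Stream FASTQ files (plain or .gz) into one gzip-compressed FASTQ.

    Returns the number of reads written, assuming 4-line FASTQ records.
    """
    n_lines = 0
    with gzip.open(out_path, "wt") as out:
        for path in sorted(paths):
            # Choose the opener from the extension: gzip for .gz, plain text otherwise.
            opener = gzip.open if str(path).endswith(".gz") else open
            with opener(path, "rt") as handle:
                for line in handle:
                    # Guard against a missing final newline, which would
                    # otherwise fuse the last record with the next file's first.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
                    n_lines += 1
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4; an input may be truncated")
    return n_lines // 4
```

A call such as `merge_fastqs(glob.glob("fastq_pass/*.fastq.gz"), "merged.fastq.gz")` would produce the merged file (the `fastq_pass` directory is a placeholder). Plain `cat` of .gz files also works, because concatenated gzip members form a valid gzip stream; the sketch mainly adds the record-count check.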
First record of insert_to_denovo.paf:
Insert 10316 12 10316 + contig_1535 4057936 10162 20473 10289 10322 8 NM:i:33 ms:i:19639 AS:i:20448 nn:i:0 tp:A:P cm:i:1858 s1:i:10218 s2:i:10223 de:f:0.0017 rl:i:0 cg:Z:3911M2I15M4I261M1I248M1I980M1D27M1D1934M1I210M1I755M10D1M2D15M1D4M2D497M1D1185M1I250M cs:Z:=AGCTNNNNN......
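A small parser makes the columns of this record explicit. It is a sketch, not part of minimap2; the names `PafHit` and `parse_paf_line` are hypothetical. It separates the target span (columns 8 and 9) from the alignment block length (column 11, which includes gaps) and derives identity and query coverage from the mandatory columns.

```python
from dataclasses import dataclass, field


@dataclass
class PafHit:
    """The 12 mandatory PAF columns plus parsed SAM-style tags."""
    qname: str
    qlen: int
    qstart: int     # 0-based start on the query
    qend: int       # 0-based, end-exclusive
    strand: str     # '+' or '-'
    tname: str
    tlen: int
    tstart: int     # 0-based start on the target
    tend: int       # 0-based, end-exclusive
    matches: int    # column 10: number of matching bases
    block_len: int  # column 11: alignment block length, including gaps
    mapq: int
    tags: dict = field(default_factory=dict)

    @property
    def identity(self) -> float:
        return self.matches / self.block_len

    @property
    def query_coverage(self) -> float:
        return (self.qend - self.qstart) / self.qlen


def parse_paf_line(line: str) -> PafHit:
    # PAF is tab-separated; splitting on any whitespace should also cope with
    # a record pasted with spaces, since minimap2 emits no spaces inside fields.
    cols = line.split()
    tags = {}
    for tag in cols[12:]:
        key, typ, value = tag.split(":", 2)
        if typ == "i":
            tags[key] = int(value)
        elif typ == "f":
            tags[key] = float(value)
        else:
            tags[key] = value  # A, Z and other types kept as strings
    return PafHit(
        cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4],
        cols[5], int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), int(cols[10]), int(cols[11]), tags,
    )
```

Applied to the record above, `tend - tstart` is 10,311 bp on contig_1535, while the block length of 10,322 also counts the inserted bases.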
From insert_to_denovo.paf I can see that my 10.316 kb insert.fasta aligns to contig_1535 of denovo_assembly.fasta: the first alignment covers query positions 12-10,316 and contig positions 10,162-20,473, with an alignment block length of 10,322 bp. It is followed by a second alignment of 10,333 bp, a third of 10,206 bp, and a fourth, truncated alignment of 5,137 bp.
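The four alignments can be turned into a rough copy-number estimate by summing the aligned query bases and dividing by the insert length, so the truncated copy contributes a fraction. This is a sketch under assumptions: the function name is hypothetical, each non-overlapping hit is treated as a distinct copy, and copies that the assembler may have collapsed into one would not be counted. Overlapping hits (for example redundant chains retained by `-P`) are resolved greedily in favour of the hit with more matches.

```python
def summarize_copies(paf_lines, contig):
    """Return non-overlapping insert hits on `contig`, sorted by position,
    and a copy-number estimate = aligned query bases / insert length."""
    candidates = []
    insert_len = None
    for line in paf_lines:
        cols = line.split()
        if len(cols) < 12 or cols[5] != contig:
            continue
        insert_len = int(cols[1])
        # (tstart, tend, strand, qstart, qend, matches)
        candidates.append((int(cols[7]), int(cols[8]), cols[4],
                           int(cols[2]), int(cols[3]), int(cols[9])))
    # Keep the best-matching hits first; drop any hit whose target interval
    # overlaps one already kept, so the same copy is not counted twice.
    kept = []
    for hit in sorted(candidates, key=lambda h: h[5], reverse=True):
        if all(hit[1] <= k[0] or k[1] <= hit[0] for k in kept):
            kept.append(hit)
    kept.sort()
    if not insert_len:
        return kept, 0.0
    aligned = sum(qend - qstart for _, _, _, qstart, qend, _ in kept)
    return kept, aligned / insert_len
```

Read depth over the insert relative to flanking single-copy sequence would be an independent check, since it does not depend on the assembly having resolved every copy.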
I then extracted the contig_1535 sequence from denovo_assembly.fasta with awk to make contig_1535.fasta, uploaded contig_1535.fasta to Benchling, and auto-annotated it based on features generated from …
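For reference, the awk extraction step can also be sketched in Python, assuming the contig ID is the first word of each FASTA header. Both function names are hypothetical.

```python
def extract_contig(fasta_lines, name):
    """Return the sequence of the record whose ID (first header word) is `name`."""
    chunks = []
    in_record = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break  # reached the next record, so the target is complete
            header_words = line[1:].split()
            in_record = bool(header_words) and header_words[0] == name
        elif in_record:
            chunks.append(line)
    if not in_record:
        raise KeyError(f"{name} not found")
    return "".join(chunks)


def write_fasta(path, name, seq, width=60):
    """Write a single-record FASTA, wrapped at `width` bases per line."""
    with open(path, "w") as out:
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
```

Usage would look like `with open("denovo_assembly.fasta") as fh: write_fasta("contig_1535.fasta", "contig_1535", extract_contig(fh, "contig_1535"))`. The file handle is consumed line by line, so the 2.6 Gb assembly does not need to be loaded into memory.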
The annotation indicated three nearly complete insertions and one partial insertion of the transgene.
A screenshot of the annotation results (below) shows a head-to-head insert pair (Pink-Green), followed by three tail-to-head inserts (Green-Purple-Orange), with the partial insert appearing last (Orange).
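The orientation calls from the annotation can be cross-checked against the strand column of the PAF hits. This sketch assumes that "head" is the first base of insert.fasta and "tail" its last base; the function name is hypothetical. Under that convention a `+` copy exposes its tail on the right-hand side and a `-` copy exposes its head there.

```python
def classify_junctions(copies):
    """Describe each junction between neighbouring insert copies on a contig.

    copies: (tstart, tend, strand) tuples taken from PAF columns 8, 9 and 5.
    Returns (junction, kind, gap) per adjacent pair, ordered along the contig.
    """
    ordered = sorted(copies)
    junctions = []
    for (_, e1, st1), (s2, _, st2) in zip(ordered, ordered[1:]):
        left_end = "tail" if st1 == "+" else "head"   # right-hand end of the left copy
        right_end = "head" if st2 == "+" else "tail"  # left-hand end of the right copy
        kind = "direct tandem" if st1 == st2 else "inverted"
        # gap > 0: unaligned bases between copies; gap < 0: the hits overlap
        junctions.append((f"{left_end}-to-{right_end}", kind, s2 - e1))
    return junctions
```

A head-to-head pair followed by tail-to-head copies would correspond to a `-` hit followed by a run of `+` hits along the contig.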
NCBI BLAST of the Grey portion flanking the partial Orange insert indicates that this sequence corresponds to chr3 of mm39. However, there are no bases to the left of the Pink insert, so I cannot identify the leftmost insertion site.
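The flanking sequence used for BLAST can be cut out programmatically from the contig using the outermost insert coordinates. This is a sketch; the function name and the default flank size of 2,000 bp are arbitrary assumptions. An empty or short left flank is exactly the situation described here: the contig ends at, or very close to, the first insert copy.

```python
def flanks(contig_seq, array_start, array_end, size=2000):
    """Return (left, right) flanking sequences around the insert array.

    array_start / array_end are 0-based, end-exclusive contig coordinates of
    the first and last insert copies (e.g. min tstart and max tend in the PAF).
    Each flank is clipped at the contig ends, so it may be shorter than `size`.
    """
    left = contig_seq[max(0, array_start - size):array_start]
    right = contig_seq[array_end:array_end + size]
    return left, right
```

Each non-empty flank can then be written to FASTA and searched against mm39, as was done for the Grey region.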
Is there a method I can use to identify sequences overlapping the leftmost portion of contig_1535 that might have been discarded during assembly?
Alternatively, is there another workflow that might retain the leftmost bases and allow me to find the insertion site?
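One possible approach to the first question, sketched under assumptions rather than as a tested recipe: realign the reads to contig_1535 alone, for example with `minimap2 -x map-ont contig_1535.fasta merged.fastq.gz > reads_to_contig.paf` (the output name is a placeholder), then look for reads whose alignment starts near position 0 of the contig but that carry a long unaligned stretch extending past it. Those stretches are candidates for the missing left-hand junction and could be extracted and BLASTed against mm39. The function name and the `max_tstart` and `min_overhang` thresholds are hypothetical.

```python
def left_overhanging_reads(paf_lines, contig, max_tstart=500, min_overhang=1000):
    """Reads aligned near the contig's left end that extend beyond it.

    Returns (read name, strand, overhang length) tuples, longest overhang first.
    """
    hits = []
    for line in paf_lines:
        c = line.split()
        if len(c) < 12 or c[5] != contig:
            continue
        qname, qlen, qstart, qend, strand = c[0], int(c[1]), int(c[2]), int(c[3]), c[4]
        if int(c[7]) > max_tstart:
            continue  # alignment does not start near the contig's left end
        # PAF query coordinates are on the read's forward strand. For a '+' hit
        # the sequence left of tstart is read[:qstart]; for a '-' hit it is the
        # reverse complement of read[qend:].
        overhang = qstart if strand == "+" else qlen - qend
        if overhang >= min_overhang:
            hits.append((qname, strand, overhang))
    return sorted(hits, key=lambda h: -h[2])
```

Because the left end of contig_1535 lies inside a repeated insert, some junction reads may be placed on another copy. Checking where the same read names align in a whole-assembly PAF can help separate true junction reads from those.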