I am trying to identify the transgene insertion sites for the MHC-Cre transgene in mice. I have whole-genome sequencing data from Illumina, and I have constructed my "best guess" transgene sequence. I have already performed quality control and aligned the sequencing reads to the reference genome. I also added the "best guess" transgene sequence to the reference genome before running the alignment. After aligning, I ran BLASTN on the reads in the aligned file, using the "best guess" transgene sequence as my query, to make a list of matching reads. I then assembled that list of reads into contigs using Velvet.
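
As a rough illustration of the kind of evidence that points to an insertion site, the sketch below scans SAM-format alignment lines for read pairs where one read maps to a mouse chromosome and its mate maps to the transgene contig, then counts them per genomic window. It is not the exact procedure described above. The contig name `transgene`, the function names, and the 1 kb window size are all assumptions made for illustration.

```python
from collections import Counter

TRANSGENE = "transgene"  # hypothetical name of the best-guess transgene contig


def discordant_pairs(sam_lines, transgene=TRANSGENE):
    """Yield (read_name, chrom, pos) for genome-mapped reads whose mate maps to the transgene."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        rnext = fields[6]
        if flag & 0x4 or flag & 0x8:  # read or mate unmapped
            continue
        if flag & 0x900:  # secondary or supplementary alignment
            continue
        mate_chrom = rname if rnext == "=" else rnext
        if rname != transgene and mate_chrom == transgene:
            yield qname, rname, pos


def candidate_sites(sam_lines, transgene=TRANSGENE, bin_size=1000):
    """Count transgene-linked reads per genomic window, most supported windows first."""
    counts = Counter()
    for _, chrom, pos in discordant_pairs(sam_lines, transgene):
        counts[(chrom, (pos // bin_size) * bin_size)] += 1
    return counts.most_common()
```

Windows supported by many such pairs would be candidate insertion sites; soft-clipped reads spanning the junction would still need to be inspected to get the exact breakpoint.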

I am not sure whether the steps I have taken are correct for accomplishing my goal. I need to identify the insertion sites so that I can assemble the full transgene sequence. I will then compare the results with those from another group of mice carrying the MHC-Cre transgene.

Please let me know what I still need to do to accomplish my goal. Also, if there is a simpler approach, please let me know.

Thank you

