How to find the integration loci of a transgene from whole-genome sequencing data
5.7 years ago
shashwat36 • 0

Hello, I want to find where a randomly integrated gene landed in the genome. I am working with Pichia pastoris and have a random integration of a transgene. I sequenced the whole genome on an Illumina MiniSeq and have 80x coverage of the genome. It sounds pretty straightforward, but I have been struggling. Here is what I have tried:

  • align the paired-end reads to the wild-type Pichia genome using bwa
  • combine the *.sai files into a BAM file
  • sort and index the BAM file
  • generate a consensus FASTA sequence from the BAM file using samtools pileup | bcftools | vcfutils.pl
  • bwa index the resulting FASTA
  • align the transgene sequence against that index

When I do that, I end up with zero alignments. However, I am certain that the gene is there; its presence has been confirmed by qPCR. Can someone please help?

Thanks

next-gen genome alignment • 1.3k views

Never did that myself, but here are my thoughts (hope this is paired-end data):

  1. Add the transgene sequence as a new chromosome to the reference genome
  2. Index with BWA, then align against it with BWA mem
  3. Extract all reads that overlap the transgene sequence (samtools view -b -o overlap.bam in.bam chrTR) where chrTR is the name that you gave your "new chromosome"
  4. From those, extract all soft-clipped reads
  5. Align these reads against the original reference genome without the extra chromosome

This should probably give you an idea where your insertion site(s) is/are. How sure are you that you have a single integration event and not multiple ones?
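For step 4, here is a minimal sketch of how the soft-clipped ends could be pulled out of the reads that hit the transgene contig. It assumes the overlapping alignments are available as plain SAM text (for example from `samtools view overlap.bam`); the function names `soft_clips` and `clipped_fasta`, the contig name `chrTR`, and the 20-base minimum clip length are all illustrative choices, not something from the steps above.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar, seq, min_len=20):
    """Return (side, clipped_bases) for each soft clip of at least min_len bases.

    'left' and 'right' are relative to the reference, because SAM stores
    SEQ in reference orientation even for reverse-strand alignments.
    """
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not present in SEQ, so ignore them.
    ops = [(n, op) for n, op in ops if op != "H"]
    clips = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_len:
        clips.append(("left", seq[:ops[0][0]]))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_len:
        clips.append(("right", seq[-ops[-1][0]:]))
    return clips


def clipped_fasta(sam_lines, contig="chrTR", min_len=20):
    """Yield FASTA records for soft-clipped ends of reads aligned to contig."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        cigar, seq = fields[5], fields[9]
        if flag & 0x4 or rname != contig:  # unmapped, or mapped elsewhere
            continue
        # Tag the mate so both reads of a pair keep distinct names.
        mate = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
        for side, bases in soft_clips(cigar, seq, min_len):
            yield f">{name}{mate}_{side}\n{bases}"
```

The records this yields could be written to a FASTA file and used as the input for step 5. Very short clips tend to align ambiguously, so the minimum length is worth tuning for your read length.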


Thanks for your solution, it was really helpful. I ran the analysis as per your suggestion and I have two questions:

  • The vector contained a landing pad, so I know the expected site of integration, and I did see the soft-clipped reads aligning to that locus. However, I also saw reads aligning to another location on the chromosome. Does that mean that I have an additional random integration event, or is that just noise that should be ignored? I ask because we have other strains where we try random integration, and I would like to be able to tell false alignments from actual integration sites in those genomes.

  • I took the soft-clipped reads from the two ends of the linearized insertion vector and aligned them to the wild-type genome. Theoretically, reads from both ends should align to roughly the same location on the genome. However, at one alignment locus I see only the reads from one end of the vector. Does that mean that there is a partial integration there? And if so, is there any way to tell if the gene of interest is present there from this data alone, without having to run a PCR?

Thanks
