Question: Reference based assembly with transgene
shashwat360 wrote:

Hello, I am trying to sequence a Pichia genome with a transgene insertion. I have paired-end reads from an Illumina MiniSeq.

I tried using BWA with the wild-type genome as reference, followed by samtools mpileup and vcfutils, to get a consensus sequence. However, with that approach my insertion vector was not part of the final assembly. I aligned the reads against my vector sequence and saw over 40,000 reads aligning to it, so I am sure that my insertion vector is present in the genome.

I then used ABySS to do a de novo assembly and aligned the resulting scaffolds against the wild-type genome, and again I ended up with a final assembly with no vector sequence.

Is there a way to get a final assembly with four chromosomes (Pichia pastoris) and my vector sequences present?

Thanks, Shash

next-gen assembly • 445 views
ADD COMMENTlink modified 7 months ago by colindaven1.1k • written 8 months ago by shashwat360

I then used ABySS to do a de novo assembly and aligned the resulting scaffolds against the wild-type genome, and again I ended up with a final assembly with no vector sequence.

Seems odd to me. Given the coverage you said your transgene had when you mapped using it as reference, it should have been assembled. Did you try blasting the transgene against the abyss assembly?

ADD REPLYlink written 8 months ago by h.mon24k

Yes, I filtered the reads that aligned to my vector sequence and then aligned those reads against the contigs generated by ABySS, and they all align. My issue is that when I take those contig sequences and align them against the wild-type genome as reference using BWA, my vector disappears in the final assembly.

ADD REPLYlink written 8 months ago by shashwat360

Of course the vector "disappears", the reference genome doesn't contain it and my guess is it is soft clipped.
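One way to check the soft-clipping guess is to list the contig alignments whose CIGAR string carries a long `S` (soft clip) operation. The following is only a sketch: the function name `list_softclips` and the 50-base default threshold are made up for illustration, and it assumes the alignments are supplied as SAM text on stdin.

```shell
# Print alignments whose CIGAR contains a soft clip of at least MIN bases.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
# list_softclips and the default of 50 are illustrative, not from the thread.
list_softclips() {
    min="${1:-50}"
    awk -v min="$min" -F '\t' '
        /^@/ { next }
        {
            cigar = $6
            longest = 0
            # Walk through every <length>S operation in the CIGAR string.
            while (match(cigar, /[0-9]+S/)) {
                len = substr(cigar, RSTART, RLENGTH - 1) + 0
                if (len > longest) longest = len
                cigar = substr(cigar, RSTART + RLENGTH)
            }
            # Output: query name, reference name, position, CIGAR.
            if (longest >= min) print $1 "\t" $3 "\t" $4 "\t" $6
        }'
}

# Usage (needs samtools; output.bam is the contig alignment from this thread):
#   samtools view output.bam | list_softclips 100
```

A contig that maps to a chromosome with several kilobases soft clipped at one end is a candidate for carrying the vector at that end.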

You are not explaining in depth what you are doing, nor are you providing the commands used - details matter a lot here.

ADD REPLYlink written 8 months ago by h.mon24k

Okay, here is what I am trying to do. I am trying to get a fully annotated genome of my strain, and since Pichia is already fully sequenced and annotated, I was hoping to leverage that data instead of doing it from scratch. Here is what I have done so far:

#assemble reads with abyss

abyss-pe name=pp1 k=64 in='reads1.fq reads2.fq'

This yields ~200 contigs. Now, instead of finding ORFs and annotating them, I figured that I could use BWA to align these contigs to the Pichia pastoris genome and then manually annotate my vector sequences. This would also allow me to find the locations and copy number of my gene.

#use bwa to align to GS115 strain (fully sequenced from NCBI)

bwa index reference.fa

bwa mem reference.fa contigs.fa | samtools sort -o output.bam

samtools index output.bam

Next, I aligned my reads to my insertion vector using BWA and extracted the mapped reads with samtools. I then aligned those mapped reads against the assembled genome using BWA, but very few of them aligned. I am guessing these are the soft-clipped reads, so it seems my vector is not part of the final assembly.

My question is: is there a better way to get the final genome sequence containing the vector?

Thanks

ADD REPLYlink modified 8 months ago • written 8 months ago by shashwat360

You could map your reads against the available genome (which doesn't contain the transgene), then just assemble the non-mapping reads de novo.
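As a rough sketch of that suggestion, reusing the file names from earlier in the thread: the function name `assemble_unmapped` and the intermediate and output file names are hypothetical. `-f 12` keeps only pairs in which both mates are unmapped, so pairs that straddle the integration junction with one mapped mate are left out of this particular extraction.

```shell
# Map reads to the wild-type reference, pull out pairs where neither mate maps,
# and assemble only those pairs. File names other than the inputs are made up.
assemble_unmapped() {
    ref="$1"; r1="$2"; r2="$3"
    bwa index "$ref" &&
    # Name-sort so that samtools fastq can write mates in matching order.
    bwa mem "$ref" "$r1" "$r2" | samtools sort -n -o all.namesorted.bam - &&
    # -f 12 = read unmapped (4) AND mate unmapped (8).
    samtools fastq -f 12 -n -1 unmapped_1.fq -2 unmapped_2.fq \
        -0 /dev/null -s /dev/null all.namesorted.bam &&
    abyss-pe name=unmapped k=64 in='unmapped_1.fq unmapped_2.fq'
}

# Usage (needs bwa, samtools and ABySS on PATH):
#   assemble_unmapped reference.fa reads1.fq reads2.fq
```

The resulting contigs should be dominated by vector sequence and anything else absent from the reference.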

ADD REPLYlink written 7 months ago by cschu1811.5k
h.mon24k wrote:

Create a blast database with your abyss-assembled genome, and search the transgene against this database.

ADD COMMENTlink written 7 months ago by h.mon24k

I don't see how that is different from aligning the transgene against the abyss-assembled genome using BWA. It wouldn't give the integration locus on the genome and neither would it give me the final annotated consensus sequence. Any thoughts?

ADD REPLYlink written 7 months ago by shashwat360

If your transgene has been assembled, it will be part of a contig. Blasting the transgene against the abyss-assembled genome would return the contig and the position in the contig. Then you could examine this position using IGV.
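A minimal sketch of that workflow, assuming BLAST+ is installed and the vector is in a file called `vector.fa` (a made-up name). The helper `blast6_to_bed` is also hypothetical: it turns BLAST tabular output (`-outfmt 6`) into BED intervals on the contigs so the hits can be loaded as a track in IGV alongside `contigs.fa`.

```shell
# Convert BLAST tabular output (-outfmt 6) on stdin into BED intervals.
# Columns used: 1 = query id, 2 = subject (contig) id, 9/10 = subject start/end.
blast6_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    {
        s = $9 + 0; e = $10 + 0; strand = "+"
        # Minus-strand hits are reported with start > end; swap them.
        if (s > e) { t = s; s = e; e = t; strand = "-" }
        # BED is 0-based, half-open; the score column is set to 0.
        print $2, s - 1, e, $1, 0, strand
    }'
}

# Usage (needs makeblastdb and blastn; contigs_db is an arbitrary database name):
#   makeblastdb -in contigs.fa -dbtype nucl -out contigs_db
#   blastn -query vector.fa -db contigs_db -outfmt 6 | blast6_to_bed > vector_hits.bed
```

Each line of `vector_hits.bed` then names a contig and the interval on it that matches the vector.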

ADD REPLYlink written 7 months ago by h.mon24k

BLAST will be a lot more sensitive than BWA: it allows more mismatches and partial alignments.

ADD REPLYlink modified 7 months ago • written 7 months ago by colindaven1.1k
colindaven1.1k wrote:

Really difficult problem. It might make sense to search the raw reads for your insertion sequence - use python or grep.
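For the grep route, a sketch along these lines might do. The function names `revcomp` and `count_reads_with` are invented here, and the idea is to pass a short stretch (say 25 to 30 bases) taken from near one end of the vector, so that matching reads are likely to run across the junction into genomic sequence. Reads are checked on both strands.

```shell
# Reverse-complement a DNA string (uses rev and tr).
revcomp() { printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'; }

# Count reads in an uncompressed FASTQ file whose sequence line contains SEQ
# or its reverse complement. Note that grep -c exits non-zero when the count is 0.
count_reads_with() {
    seq_fwd="$1"; fastq="$2"
    seq_rev=$(revcomp "$seq_fwd")
    # Every 4th line starting at line 2 of a FASTQ record is the sequence.
    awk 'NR % 4 == 2' "$fastq" | grep -c -e "$seq_fwd" -e "$seq_rev"
}

# Usage, with a placeholder for a stretch copied from the end of the vector:
#   count_reads_with "<vector_end_sequence>" reads1.fq
```

Dropping `-c` from the grep call prints the matching reads themselves, which can then be searched against the wild-type genome to see where the non-vector part lands.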

In my experience, neither de novo nor alignment strategies work well for this problem. A long-read assembly would probably nail it, but no one seems to have the money for doing those with insertion experiments.

Some companies offer more advanced wetwork approaches, which seem promising.

ADD COMMENTlink written 7 months ago by colindaven1.1k