Question: BAC sequencing issues
6.2 years ago, int11ap1 wrote:


I have paired-end Illumina reads from several BACs of an unsequenced plant species. After assembling them with SOAPdenovo, I realized that the total assembled size for each BAC ranged from approximately 1 to 2 Mb (way too much). So I BLASTed some scaffolds and concluded that they sequenced the whole BACs (some scaffolds show 100% identity with E. coli).

I can think of two ways to handle this:

  • 1) Align the reads to the E. coli genome and to BAC sequences downloaded from NCBI(?), then keep only the unaligned reads for de novo assembly.
  • 2) Assemble all reads de novo into contigs and scaffolds. Then I would put the scaffolds from all BACs together into one FASTA file and remove redundancy with some tool (e.g. CAP3; do you know of a better one?) to reduce the time needed for BLAST. Scaffolds with hits would be discarded.
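Option 1 above can be sketched as follows, assuming the reads were already aligned to the E. coli genome plus the vector and the aligner wrote standard SAM output. The function name is hypothetical; the only facts relied on are the SAM flag bits 0x4 (read unmapped) and 0x8 (mate unmapped). This is a rough illustration, not a tested pipeline.

```python
UNMAPPED = 0x4       # SAM flag bit: this read did not align
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate did not align


def clean_pair_names(sam_lines):
    """Return names of read pairs in which neither mate aligned.

    Hypothetical helper: a pair is kept only if every record carrying
    its name has both the 0x4 and 0x8 bits set.
    """
    clean, dirty = set(), set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED and flag & MATE_UNMAPPED:
            clean.add(name)
        else:
            dirty.add(name)
    return clean - dirty
```

The returned names can then be used to pull the matching records out of the original FASTQ files before assembly.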

In your experience, what would you do?

6.2 years ago, SES (Vancouver, BC) wrote:

It is not likely that you sequenced the whole clone unless you skipped the step of the BAC prep where you isolate the insert with a digest. Though it is odd to see an assembly of that size from a BAC, you probably just sequenced a lot of the clone as well. In my experience, it is very common to see lots of contamination from the clone in your reads, so there is nothing to worry about here. My advice is to obtain the clone sequence used to construct the BAC library. This can be used for screening.

The next thing to do is screen your reads. I would personally not use an aligner for this; BLAST seems to be more sensitive. You can certainly screen reads with an alignment approach, but in my tests I always still ended up with large chunks of clone DNA in my assemblies.
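A minimal sketch of the screening step, assuming the reads were already BLASTed against the clone and E. coli sequences and the results saved in the standard 12-column tabular format (`-outfmt 6`). The function names, the identity and length cutoffs, and the `/1` `/2` read-name suffix convention are all assumptions for illustration, not recommended settings.

```python
def pair_name(read_id):
    """Strip a trailing /1 or /2 so both mates share one name (assumed convention)."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def contaminated_pairs(blast_tab_lines, min_identity=95.0, min_length=50):
    """Collect pair names with a contaminant hit in BLAST tabular output.

    Columns used: 1 = query id, 3 = percent identity, 4 = alignment length.
    The cutoffs are placeholder values.
    """
    hits = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[2]) >= min_identity and int(cols[3]) >= min_length:
            hits.add(pair_name(cols[0]))
    return hits


def keep_clean(fastq_lines, contaminated):
    """Yield 4-line FASTQ records whose pair has no contaminant hit.

    Assumes unwrapped FASTQ (exactly four lines per record).
    """
    it = iter(fastq_lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        read_id = header[1:].split()[0]
        if pair_name(read_id) not in contaminated:
            yield header, seq, plus, qual
```

Dropping a pair when either mate hits keeps the two FASTQ files in sync, which most assemblers expect for paired-end input.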

I would strongly advise against assembling the raw reads and then trying to remove the contaminants. You will not simply end up with contigs from the clone, which would be easy to remove. Instead, there will be large clone-derived stretches of DNA in the middle of contigs and this won't be easy or straightforward to fix. Better to just screen your reads, then assemble. 
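To illustrate why post-assembly cleanup is awkward, here is a rough sketch that classifies a contig by how much of its length is covered by contaminant hits. The function names and the 0.9 cutoff are hypothetical. A "chimeric" contig is the problem case described above: it cannot simply be discarded, because it carries insert sequence alongside clone sequence.

```python
def covered_fraction(hit_intervals, contig_length):
    """Fraction of a contig covered by hits (1-based, inclusive coordinates).

    Overlapping hits are counted once.
    """
    covered = 0
    end_of_last = 0
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hit_intervals):
        start = max(start, end_of_last + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            end_of_last = end
    return covered / contig_length


def classify_contig(hit_intervals, contig_length, whole=0.9):
    """Label a contig 'clean', 'contaminant', or 'chimeric' (cutoff is a guess)."""
    frac = covered_fraction(hit_intervals, contig_length)
    if frac == 0:
        return "clean"
    if frac >= whole:
        return "contaminant"
    return "chimeric"
```

Contigs labeled "contaminant" are easy to drop; each "chimeric" one would need to be split or trimmed by hand, which is the extra work that screening the reads first avoids.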

6.2 years ago, Shyam (United States) wrote:

You need to remove the reads coming from the BAC vector backbone and the E. coli genome. You can use a program like DeconSeq. Assemble only after removing these reads. I agree with @SES: assembling with the contamination still present gives misassemblies.
