Illumina reads mapped back onto contigs have gaps
6 months ago

I assembled a genome from PacBio reads with the Flye assembler. I then mapped my cleaned-up Illumina reads onto the longest contigs using bwa-mem, and there are gaps in the contigs. Why is this?

illumina mapping • 233 views

there are gaps in the contigs.

If that means the Illumina reads do not completely cover those contigs, then one of the following may apply:

1. You don't have enough Illumina sequence data to provide adequate coverage.
2. The Illumina data comes from RNA-seq, so it only covers the expressed part of the genome.
3. Your Flye assembly is incorrect.
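
To tell these cases apart, it helps to see exactly where coverage drops to zero. Below is a minimal sketch, assuming per-base depth has been exported as three tab-separated columns (contig, 1-based position, depth), sorted by position, which is the layout `samtools depth -a` prints. The function name `zero_coverage_runs` and the sample values are made up for illustration.

```python
def zero_coverage_runs(depth_lines, min_len=1):
    """Return (contig, start, end) runs of adjacent positions with depth 0.

    depth_lines: iterable of 'contig<TAB>pos<TAB>depth' strings, sorted by
    contig and position (assumed input format, not taken from the thread).
    """
    runs = []
    current = None  # [contig, start, end] of the zero-depth run being built

    def flush(run):
        # Keep the run only if it is at least min_len bases long.
        if run is not None and run[2] - run[1] + 1 >= min_len:
            runs.append(tuple(run))

    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        if depth != 0:
            continue
        if current and current[0] == contig and pos == current[2] + 1:
            current[2] = pos          # extend the open run by one base
        else:
            flush(current)            # close the previous run, if any
            current = [contig, pos, pos]
    flush(current)
    return runs


# Hypothetical depth records, for illustration only.
sample = [
    "contig_1\t1\t12",
    "contig_1\t2\t0",
    "contig_1\t3\t0",
    "contig_1\t4\t7",
    "contig_2\t1\t0",
]
print(zero_coverage_runs(sample))
# [('contig_1', 2, 3), ('contig_2', 1, 1)]
```

Many short, scattered runs would point more towards low depth or over-aggressive filtering, while a few long runs would be more consistent with a problem in the assembly itself.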

I trimmed the Illumina reads for quality and to remove the adapters. I used Trimmomatic with these parameters:

LEADING:10 TRAILING:10 SLIDINGWINDOW:5:30 MINLEN:150


The genome size is ~200 Mb. These are the results from trimming the Illumina reads:

Input Read Pairs: 128614698 Both Surviving: 49197716 (38.25%) Forward Only Surviving: 20232226 (15.73%) Reverse Only Surviving: 13258328 (10.31%) Dropped: 45926428 (35.71%)


This is not RNA-seq (but I do also have RNA-seq Illumina reads). Are there supposed to be zero gaps once I map the Illumina reads back onto the assembly contigs? Am I trimming the Illumina reads too much? Do I need to trim the reads at all? Thanks!


I don't know what read length you sequenced, but even with trimming you are setting MINLEN to 150, so any read trimmed to fewer than 150 bases is dropped. You could try aligning without any trimming at all and see if things improve.

Are there supposed to be zero gaps once I map illumina reads back onto assembly contigs?

Ideally, yes. One would expect no bases or regions to be left uncovered by at least a few reads, as long as the assembly is good and there is plenty of Illumina data to cover it. Did the Illumina reads go into the assembly, or was it a pure PacBio one?


The Illumina sequencing is 2 x 150 bp. The assembly used only PacBio reads. I could polish the PacBio assembly with the Illumina reads using Pilon and see if that helps. I will also try mapping the Illumina reads without trimming, or with different trimming parameters.
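
As a rough check on whether depth is the limiting factor, the numbers posted in this thread can be turned into an expected fold coverage. This is a back-of-the-envelope sketch that assumes a genome of exactly 200 Mb and that every surviving read is a full 150 bp (which follows from MINLEN:150 on 150 bp reads, if no read is longer than that). The helper name `fold_coverage` is made up for illustration.

```python
GENOME_SIZE_BP = 200_000_000  # "~200 Mb" from the thread, treated as exact here
READ_LEN_BP = 150             # 2 x 150 bp; survivors assumed to be full length

# Read-pair counts copied from the Trimmomatic summary above.
input_pairs = 128_614_698
both_surviving = 49_197_716
forward_only = 20_232_226
reverse_only = 13_258_328


def fold_coverage(n_reads, read_len=READ_LEN_BP, genome_size=GENOME_SIZE_BP):
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


raw = fold_coverage(2 * input_pairs)                      # untrimmed data
paired = fold_coverage(2 * both_surviving)                # intact pairs only
survivors = fold_coverage(2 * both_surviving + forward_only + reverse_only)

print(f"raw: {raw:.1f}x")                  # raw: 192.9x
print(f"paired after trimming: {paired:.1f}x")  # paired after trimming: 73.8x
print(f"all survivors: {survivors:.1f}x")  # all survivors: 98.9x
```

If these assumptions hold, even the intact pairs alone give roughly 74x, so a simple shortage of Illumina data looks unlikely to explain the gaps. The low survival rate (38.25% of pairs) is probably driven by MINLEN:150, which discards any 150 bp read that loses even one base to trimming.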


Would you expect contaminant contigs that have not yet been removed to have gaps in coverage, or lower coverage?


Difficult to say. Were the PacBio and Illumina libraries prepared from the same genomic DNA? If the "contaminant" contigs are in fact misassemblies, then Illumina reads could still map to them.
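
One way to look into this is to compare mean depth and breadth of coverage per contig and see whether any contigs stand out from the rest. Below is a minimal sketch, assuming per-base depth in three tab-separated columns (contig, position, depth) as printed by `samtools depth -a`. The function name `per_contig_summary` and the sample values are hypothetical.

```python
from collections import defaultdict


def per_contig_summary(depth_lines):
    """Return {contig: {"mean_depth": float, "breadth": float}}.

    breadth is the fraction of reported positions with depth > 0.
    """
    # Per contig: [positions seen, summed depth, positions with depth > 0]
    totals = defaultdict(lambda: [0, 0, 0])
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, depth = fields[0], int(fields[2])
        entry = totals[contig]
        entry[0] += 1
        entry[1] += depth
        entry[2] += 1 if depth > 0 else 0
    return {
        contig: {"mean_depth": total / n, "breadth": covered / n}
        for contig, (n, total, covered) in totals.items()
    }


# Hypothetical depth records, for illustration only.
sample_depth = [
    "contig_1\t1\t10", "contig_1\t2\t12", "contig_1\t3\t8", "contig_1\t4\t10",
    "contig_2\t1\t0", "contig_2\t2\t2", "contig_2\t3\t0", "contig_2\t4\t0",
]
for name, stats in per_contig_summary(sample_depth).items():
    print(name, stats)
```

A contig whose Illumina depth is far from the genome-wide level is worth a closer look, but depth alone cannot separate a contaminant from a misassembly or a repeat, so this is only a first filter.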