Newbler: Merged MiSeq Reads + Mate Pairs?
7.8 years ago

Hello Everyone,

I recently received MiSeq 250bp PE sequencing results from a highly heterozygous eukaryote with a small genome (30Mb-50Mb). After doing some assemblies I realized that the paired-end reads actually overlap and that the fragment size is around 300bp. What a waste: lots of redundant sequencing. So most of the reads can be merged. Those that cannot are of very poor quality; I tried mapping them back and they are full of chimeras, so discarding them seems like a reasonable idea.

I also received some HiSeq mate pairs with a big insert size of about 2500bp. After cleaning these up (removing adapters and contamination), however, their nucleotide coverage is very poor, around 5x, and the reads are 50bp in length. The MiSeq merged reads have a decent nucleotide coverage of around 300x to 450x.

Not sure any DBG assembler is going to be happy with inconsistent read lengths of the MiSeq merged library and the low coverage of the mate pairs. Not even sure what kmer size I would even pick.

That's why I was thinking of Newbler. I heard it is really good at scaffolding 454 jump libraries, and I am wondering how it would perform on Illumina jumps. I also think it will like the merged MiSeq reads, since they are much better in quality than 454 reads.

Does anyone have any idea how I could feed it Illumina mate pairs? The tutorial I found (1) shows how reads can be assembled, but does not show how to use Illumina mate pairs or how to specify the insert length. It seems like Newbler just guesses that?

Would be happy to hear any input, Adrian

illumina assembly • 4.8k views
7.8 years ago
lexnederbragt ★ 1.3k

A DBG assembler should work for your data; such assemblers do not necessarily expect uniform read lengths. Newbler is an option (BTW, I am the author of the tutorial you mention :-) ). In principle, Newbler should recognize the pairing of your Illumina mates. You will probably have to ensure they are oriented as 'innies' (--> <--), which means reverse complementing both files. (You should try both the original orientation and the reverse-complemented one and compare, to make sure.) Newbler will estimate the insert length based on the distance between mates that are mapped to the same contig. You cannot tell Newbler what the insert size is beforehand.

Good luck!
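For anyone who wants to try the reverse-complemented orientation, here is a minimal Python sketch of that step. It assumes plain 4-line FASTQ records with no line wrapping, and the function names are made up for illustration; they are not part of Newbler or any existing tool. Note that the quality string has to be reversed together with the sequence.

```python
# Complement table for the bases seen in this kind of data.
# Other IUPAC ambiguity codes are left unchanged (an assumption for brevity).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_fastq(src, dst):
    """Copy 4-line FASTQ records from src to dst, reverse complementing each read.

    src and dst are open text file handles. Returns the number of records written.
    """
    n = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of file
        seq = src.readline().rstrip("\n")
        src.readline()  # the '+' separator line, rewritten below
        qual = src.readline().rstrip("\n")
        dst.write(header.rstrip("\n") + "\n")
        dst.write(reverse_complement(seq) + "\n")
        dst.write("+\n")
        dst.write(qual[::-1] + "\n")  # qualities must follow the reversed bases
        n += 1
    return n
```

To use it, open the mate-pair FASTQ for reading and a new file for writing, pass both handles to `reverse_complement_fastq`, and then run one assembly with the original file and one with the reverse-complemented file to compare.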


Thank you for your help. The problem I am running into now is that my MiSeq reads are on average 400bp in size, while my mate pairs are 50bp in size. I have no choice but to set -ml to 40, and the assembly can't seem to finish.


I don't think -ml 40 is the cause of the long run time. You may have added too much data. Try downsampling the merged MiSeq reads to 100x or lower...
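As a rough illustration of the downsampling arithmetic, here is a minimal Python sketch. It assumes plain 4-line FASTQ records, and the function names are hypothetical. Coverage is estimated as reads times mean read length divided by genome size, and each read is then kept with a probability equal to target coverage over current coverage.

```python
import random


def coverage(n_reads, mean_read_len, genome_size):
    """Estimated nucleotide coverage: total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov down to target_cov."""
    return min(1.0, target_cov / current_cov)


def downsample_fastq(src, dst, fraction, seed=0):
    """Write each 4-line FASTQ record from src to dst with probability `fraction`.

    src and dst are open text file handles. A fixed seed makes the subsample
    reproducible. Returns the number of records kept.
    """
    rng = random.Random(seed)
    kept = 0
    while True:
        record = [src.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        if rng.random() < fraction:
            dst.writelines(record)
            kept += 1
    return kept


# Going from roughly 300x down to 100x means keeping about a third of the reads.
fraction = keep_fraction(300, 100)
```

Because the merged reads are single-ended, records can be sampled independently. Paired files would need the same records kept in both files, which this sketch does not handle.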


So I tried the assembly with the following command:

~/454/bin/runAssembly -m -ml 40 -mi 92 -cpu 8 -large GDR-16_R1_top6M.fastq GDR-17.RD30.NotEmpty.LinkerTrimmed-50bp-PR_RC.fastq

There are 6M reads providing about 50x coverage of merged MiSeq data (GDR16), plus about 4M 50bp mate-pair reads providing about 4x coverage (GDR17). Without the -large option, the assembly doesn't complete. With the -large option it does complete, but the stats are absolutely horrible...


Can you tell whether Newbler understood your mated reads as pairs? Check the 454NewblerMetrics.txt file (see https://contig.wordpress.com/2010/03/11/newbler-output-i-the-454newblermetrics-txt-file/ under statistics for the library).


So I guess it's a problem that I do not have a library {} section in my newbler metrics right? :) paste.fedoraproject.org/92412/96908166

MP are organized as such in one file:

@HWI-D00104:88:D1HV0ACXX:1:1101:1595:2177 1:N:0:AACGCG
TGTTTACATAATACTTTTTTATGGATTTACACCACTATTTTTACATTTAT
+
JJJJJJIJIIIHDIJJJJJJJJJIJJJIHIHIHFJJJHHHHHFFFFFCCC
@HWI-D00104:88:D1HV0ACXX:1:1101:1595:2177 2:N:0:AACGCG
TAAGAGGGCATCCAGGTGAATACAACTGTGCATACAGAATACAATTGGCC
+


Hmmm, I thought Newbler would recognise the pairing info for this format already. You could try the method listed here: https://contig.wordpress.com/2011/09/01/newbler-input-iii-a-quick-fix-for-the-new-illumina-fastq-header/. If that still doesn't work, maybe you'll have to split the fastq mates into two files, one for the forward and one for the reverse read. Also, I am uncertain whether you should reverse complement the mates as Newbler may expect the reads in the --> <-- orientation (yours may be <-- -->).
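If it comes to splitting the mates into two files, a minimal Python sketch could look like the following. It assumes 4-line FASTQ records and headers of the form shown in the thread (`@name 1:N:0:AACGCG`), where the read number is the first field after the space. The function names and the `old_style` switch are hypothetical. With `old_style=True` the header is rewritten to the older Illumina `/1` and `/2` naming; whether Newbler needs that form is an assumption to test, not something confirmed here.

```python
def read_number(header):
    """Return 1 or 2 from a header such as '@name 1:N:0:AACGCG'."""
    parts = header.split()
    if len(parts) < 2 or parts[1][:2] not in ("1:", "2:"):
        raise ValueError(f"no read number found in header: {header!r}")
    return int(parts[1][0])


def split_interleaved(src, fwd, rev, old_style=False):
    """Split a mixed mate-pair FASTQ into forward (fwd) and reverse (rev) files.

    src, fwd and rev are open text file handles. If old_style is True,
    headers are rewritten as '@name/1' or '@name/2'.
    Returns a tuple (forward_count, reverse_count).
    """
    counts = {1: 0, 2: 0}
    while True:
        record = [src.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        num = read_number(record[0])
        if old_style:
            name = record[0].split()[0]  # '@name' without the comment field
            record[0] = f"{name}/{num}\n"
        (fwd if num == 1 else rev).writelines(record)
        counts[num] += 1
    return counts[1], counts[2]
```

If the two returned counts differ, some mates have lost their partner (for example during trimming), and the files should be re-synchronised before they are given to an assembler.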