I am trying to align Illumina mate-pair (5.2 kb) and long mate-pair (10 kb) libraries to an assembly of contigs (obtained from CLC Workbench) for a reptilian genome (genome size comparable to human). I am using BBMap for the alignment step, after which I hope to extract the insert size distribution (average insert size and standard deviation) to create input files for ALLPATHS-LG and carry out a de novo assembly.
bbmap.sh rcomp=t rcs=f in=read1.fq in2=read2.fq out=mapped.sam
Is this the correct way to align Illumina mate-pair (5.2 kb) / long mate-pair (10 kb) libraries?
No matter what combination of the flags rcs=t/f and rcomp=t/f I use, the standard error file shows "Processing reads in paired-ended mode.". I can't work out how to use the flags correctly. The UsageGuide suggests using requirecorrectstrand=f (rcs=f) and rcomp=t for long mate pair. However, I am getting a mean insert size of 2611.48 bp when I expect something around 5200 bp (the sequencing center predicted a 5.2 kb Illumina mate-pair library). So I suspect the flags are being overridden by the default values, rcs=t and rcomp=f, which is why I presume bbmap is processing reads in paired-end mode.
Could you please help?
If you are interested in the full range, plot the histogram of insert sizes (ihist=<file>). Generally there will be a range of insert sizes, and a mean of 2.6 kb is what you have in that library. This is likely more accurate than the sequencing center's prediction, since it is based on actual alignments.
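For the ALLPATHS-LG inputs, the mean and standard deviation can be recomputed from the histogram file itself. The following is only a rough sketch: it assumes the ihist output is tab-separated "insert size, count" rows with header lines starting with "#", and the function name insert_size_stats is made up for this example. Check the layout of your own file before relying on it.

```python
import math

def insert_size_stats(lines):
    """Return (mean, std_dev, n_pairs) from 'size<TAB>count' histogram rows.

    Assumption: header/summary lines start with '#', and every other
    non-empty line holds an integer insert size followed by an integer count.
    """
    total = 0
    weighted_sum = 0.0
    weighted_sq_sum = 0.0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        size_text, count_text = line.split()[:2]
        size, count = int(size_text), int(count_text)
        total += count
        weighted_sum += size * count
        weighted_sq_sum += size * size * count
    if total == 0:
        raise ValueError("no histogram rows found")
    mean = weighted_sum / total
    # Population variance of the binned sizes; clamp tiny negative rounding error.
    variance = max(weighted_sq_sum / total - mean * mean, 0.0)
    return mean, math.sqrt(variance), total
```

To use it, pass an open file handle, e.g. `with open("ihist.txt") as fh: print(insert_size_stats(fh))`, where ihist.txt is whatever name you gave to ihist=. A long tail or a second peak in the histogram (for example from contaminating short-insert pairs) will shift both numbers, so it is worth looking at the plot before trusting a single mean.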
BTW: The processing message may be just an oversight in programming. @Brian will confirm.
Tagging: Brian Bushnell