I have several FASTQ files with paired-end reads from an amplicon panel. The forward and reverse primers contain matched inline barcodes that should form known pairs if the amplification step was free of cross-contamination.
To check for contamination, I need to reconstruct my target sequences by pairing the reads based on their overlap, while preserving the primers. In the downstream analyses I will evaluate the percentage of correct vs. incorrect pairings. I was planning to use pandaseq for the paired-end assembly of Illumina reads but, if I am understanding the documentation correctly, the sequences reported by pandaseq have their primers removed, which makes them useless for my downstream analyses. I checked the manual to find out if I could force the program to report sequences with their primers, but didn't find any mention of that. Is there alternative software that can assemble sequences from Illumina reads without removing the primers?
Sounds like you are simply looking for a read merging program. Take a look at bbmerge.sh or FLASH. It looks like PANDAseq uses the word "assembly" in their paper (which is from 2012, an eternity in the NGS world). Currently this usage would be unusual at best. Individual paired-end reads are "merged" rather than "assembled", if they overlap.
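To make the idea of merging concrete, here is a minimal Python sketch of what an overlap-based merger does and how the barcode pairs could then be tallied. It is only a toy illustration, not how bbmerge.sh or FLASH are implemented: the function names (`merge_pair`, `classify_pairs`), the mismatch threshold, and the assumption that each inline barcode sits at the very 5' end of its read are all hypothetical choices made for the example. For real data, use one of the dedicated tools.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Toy merge of one read pair by overlap; nothing is trimmed.

    The reverse read is reverse-complemented, then the longest overlap
    between the 3' end of the forward read and the 5' end of the
    reverse-complemented mate is accepted if its mismatch fraction is
    within the (assumed) threshold. Returns None if no overlap is found.
    """
    rev_rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rev_rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            # Keep the whole forward read, then append the non-overlapping
            # part of the mate, so both primers and barcodes are preserved.
            return fwd + rev_rc[olen:]
    return None


def classify_pairs(merged_seqs, known_pairs, bc_len):
    """Count merged reads whose (forward, reverse) barcodes form a known pair.

    Assumes (hypothetically) that each barcode occupies the first bc_len
    bases of its read, so the reverse barcode is the reverse complement
    of the last bc_len bases of the merged sequence.
    """
    counts = Counter()
    for seq in merged_seqs:
        pair = (seq[:bc_len], reverse_complement(seq[-bc_len:]))
        counts["correct" if pair in known_pairs else "incorrect"] += 1
    return counts
```

With the merged sequences in hand (from this sketch or, more sensibly, from the output of a real merger), the percentage of correct pairings is just `counts["correct"]` divided by the total number of merged reads.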