Hi all,
I would like to use Connor to de-duplicate a tagged BAM file and produce a BAM file with consensus alignment pairs. I used the ThruPlex Tag-Seq kit to prepare the samples for sequencing. The sequencing was done on one Illumina instrument but split across several lanes, resulting in 8 FASTQ files (4 R1 and 4 R2).
I'm wondering what the best approach to handle these files would be. Do I first need to concatenate all FASTQ files, align with BWA MEM and input the result to Connor? Or would it be better to align the FASTQ files (4 pairs) separately using BWA MEM, then merge the BAM files and input the merged BAM file to Connor?
Any experiences with Connor?
Thanks a lot, Lien
If you are only looking to de-duplicate the data then take a look at: "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates." Clumpify works on the FASTQ files directly, so you do not need to align the data first.
Thanks, but I used the ThruPlex Tag-Seq kit with UMIs and would also like to generate a BAM file with consensus alignment pairs that represent the original biological molecules. So I'm afraid Clumpify does not do everything I need.
Looking at the tech note on Connor, you could go either route as long as you feed Connor a BAM file that has not been manipulated as described in the note.
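To make the two routes concrete, here is a rough sketch in Python that only builds the shell commands for each route and prints them, without running anything. The file names, lane IDs, sample name and reference path are placeholders I made up. The read-group tagging via `bwa mem -R` is my own bookkeeping suggestion, not something the Connor note asks for. I am also assuming, from my reading of the Connor README, that it is called as `connor input.bam output.bam` on a paired, coordinate-sorted, indexed BAM; please check that against the version you install.

```python
import shlex


def per_lane_route(lanes, ref, sample, merged="merged.bam", out="deduped.bam"):
    """Route B: align each lane separately, merge the BAMs, then run Connor.

    lanes is a list of (lane_id, r1_fastq, r2_fastq) tuples (placeholder names).
    Returns shell command strings; nothing is executed here.
    """
    cmds, lane_bams = [], []
    for lane_id, r1, r2 in lanes:
        # bwa mem expects a literal backslash-t in the read-group string.
        rg = f"@RG\\tID:{lane_id}\\tSM:{sample}"
        bam = f"{lane_id}.sorted.bam"
        cmds.append(
            f"bwa mem -R {shlex.quote(rg)} {shlex.quote(ref)} "
            f"{shlex.quote(r1)} {shlex.quote(r2)} "
            f"| samtools sort -o {shlex.quote(bam)} -"
        )
        lane_bams.append(bam)
    cmds.append("samtools merge " + " ".join(shlex.quote(b) for b in [merged] + lane_bams))
    cmds.append(f"samtools index {shlex.quote(merged)}")
    # Assumed Connor invocation: connor <input.bam> <output.bam>
    cmds.append(f"connor {shlex.quote(merged)} {shlex.quote(out)}")
    return cmds


def concat_route(lanes, ref, sample, out="deduped.bam"):
    """Route A: concatenate FASTQs per read, align once, then run Connor.

    R1 and R2 files are concatenated in the same lane order so mates stay paired.
    """
    r1_all, r2_all, bam = "all_R1.fastq.gz", "all_R2.fastq.gz", "all.sorted.bam"
    r1s = " ".join(shlex.quote(r1) for _, r1, _ in lanes)
    r2s = " ".join(shlex.quote(r2) for _, _, r2 in lanes)
    rg = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # Gzipped files can be concatenated directly with cat.
        f"cat {r1s} > {r1_all}",
        f"cat {r2s} > {r2_all}",
        f"bwa mem -R {shlex.quote(rg)} {shlex.quote(ref)} {r1_all} {r2_all} "
        f"| samtools sort -o {bam} -",
        f"samtools index {bam}",
        f"connor {bam} {shlex.quote(out)}",
    ]


if __name__ == "__main__":
    # Placeholder lane and file names for a 4-lane run.
    lanes = [(f"L00{i}", f"L00{i}_R1.fastq.gz", f"L00{i}_R2.fastq.gz") for i in range(1, 5)]
    for name, route in (("concatenate first", concat_route), ("align per lane", per_lane_route)):
        print(f"# {name}")
        print("\n".join(route(lanes, "ref.fa", "sample1")))
```

Either way Connor ends up seeing one sorted, indexed BAM with all read pairs from the sample. The per-lane route just keeps the lane of origin in the read groups, which can help if you later want to check for lane effects.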