Input for Connor: FASTQ files from several lanes


6.7 years ago

lien ▴ 90

Hi all,

I would like to use Connor to de-duplicate a tagged BAM file and produce a BAM file with consensus alignment pairs. I prepared the samples with the ThruPLEX Tag-seq kit. The sequencing was done on a single Illumina instrument but split across several lanes, resulting in 8 FASTQ files (4 R1 and 4 R2).

What is the best way to handle these files? Should I first concatenate the FASTQ files, align with BWA-MEM, and pass the resulting BAM file to Connor? Or is it better to align each of the 4 FASTQ pairs separately with BWA-MEM, merge the BAM files, and pass the merged BAM file to Connor?

Any experiences with Connor?

Thanks a lot, Lien

Connor Fastq BAM • 1.7k views

ADD COMMENT • link 6.7 years ago by lien ▴ 90


If you only want to de-duplicate the data, take a look at Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files, which can also remove duplicates. Clumpify does not require the data to be aligned.

ADD REPLY • link 6.7 years ago by GenoMax 141k


Thanks, but I used the ThruPLEX Tag-seq kit with UMIs and would also like to generate a BAM file with consensus alignment pairs that represent the original biological molecules. So I'm afraid Clumpify does not do everything I need.

ADD REPLY • link 6.7 years ago by lien ▴ 90


Looking at the tech note on Connor, you could go either route, as long as the BAM file you feed Connor has not been manipulated in the ways described in the note.
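To make the two routes concrete, here is a rough sketch in Python that only builds the shell commands for each route (it does not run them). The sample name, lane IDs, file names, read-group fields and helper function names are all hypothetical placeholders, and the commands assume `bwa`, `samtools` and `connor` are on the PATH. Treat it as an illustration of the order of steps, not as a tested pipeline.

```python
def per_lane_route(ref, sample, lanes):
    """Align each lane separately, then merge. lanes: list of (lane_id, r1, r2)."""
    cmds, lane_bams = [], []
    for lane_id, r1, r2 in lanes:
        bam = f"{sample}.{lane_id}.sorted.bam"
        # Hypothetical read group: one ID per lane, same SM for the whole sample.
        # The doubled backslash keeps a literal "\t" in the string, as bwa expects.
        rg = f"@RG\\tID:{sample}.{lane_id}\\tSM:{sample}\\tPL:ILLUMINA"
        cmds.append(
            f"bwa mem -R '{rg}' {ref} {r1} {r2} | samtools sort -o {bam} -"
        )
        lane_bams.append(bam)
    merged = f"{sample}.merged.bam"
    cmds.append(f"samtools merge {merged} " + " ".join(lane_bams))
    cmds.append(f"samtools index {merged}")
    cmds.append(f"connor {merged} {sample}.consensus.bam")
    return cmds


def concat_route(ref, sample, lanes):
    """Concatenate the FASTQ files first, then align once."""
    r1_all = f"{sample}_R1.fastq.gz"
    r2_all = f"{sample}_R2.fastq.gz"
    bam = f"{sample}.sorted.bam"
    # Gzipped files can be concatenated directly; R1 and R2 must be joined
    # in the same lane order so that read pairs stay in sync.
    r1_files = " ".join(r1 for _, r1, _ in lanes)
    r2_files = " ".join(r2 for _, _, r2 in lanes)
    return [
        f"cat {r1_files} > {r1_all}",
        f"cat {r2_files} > {r2_all}",
        f"bwa mem {ref} {r1_all} {r2_all} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        f"connor {bam} {sample}.consensus.bam",
    ]


if __name__ == "__main__":
    # Hypothetical file names for two of the four lanes.
    example_lanes = [
        ("L001", "S1_L001_R1.fastq.gz", "S1_L001_R2.fastq.gz"),
        ("L002", "S1_L002_R1.fastq.gz", "S1_L002_R2.fastq.gz"),
    ]
    for cmd in per_lane_route("ref.fa", "S1", example_lanes):
        print(cmd)
```

The per-lane route keeps one read group per lane in the merged BAM, whereas the concatenation route loses that lane information. In both cases the reads themselves go to the aligner untouched.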

ADD REPLY • link 6.7 years ago by GenoMax 141k
