Hi all,
I am analyzing paired-end data from amplicon libraries of a target region of a viral gene. Briefly, I ordered a PCR product designed with degenerate codons at specific positions (let's say 4 different positions) that are flanked by conserved nucleotide sequences. I am interested in the dynamics of gene variants (haplotypes/motifs) in this region under different experimental conditions. I prepared multiple amplicon libraries from this target region, and the sequencing results look good. After adapter removal, I used the conserved sequences flanking my region of interest to trim the reads further and keep only reads of a defined length.
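For reference, flank-based trimming with a length filter can be sketched in Python roughly as follows. The flank sequences and expected length are hypothetical placeholders, not values from the actual construct. The sketch assumes the flanks occur exactly (no mismatches) in the orientation being searched, so for read2 the reverse complements of the flanks would be needed.

```python
from typing import Optional

# Hypothetical placeholders: replace with the real conserved flanks and the
# expected length of the variable region between them.
LEFT_FLANK = "GATTACA"
RIGHT_FLANK = "TGCATGC"
EXPECTED_LENGTH = 12


def extract_region(seq: str, left: str = LEFT_FLANK, right: str = RIGHT_FLANK,
                   expected_length: int = EXPECTED_LENGTH) -> Optional[str]:
    """Return the sequence between the two flanks, or None if either flank is
    missing or the region is not the expected length (exact matching only)."""
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)
    # Search for the right flank only after the end of the left flank.
    end = seq.find(right, start)
    if end == -1:
        return None
    region = seq[start:end]
    return region if len(region) == expected_length else None
```

Reads for which `extract_region` returns `None` would be discarded. Exact matching is the simplest choice; flanks with sequencing errors would need an approximate search instead.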
With paired-end amplicon sequencing, the two reads of a matching pair (read1/read2, or forward/reverse) should be complementary in sequence: read2 is sequenced from the opposite strand, so when both reads span the same region, read2 is the reverse complement of read1. So far, I have been working with read1 and read2 separately (two FASTQ files, read1.fastq and read2.fastq). Before proceeding with mapping and variant calling, I want to compare these two FASTQ files and output only the read pairs that are fully complementary. Could anyone offer some advice on how to do this? I do not have much programming experience, but I have seen some posts that use the Hamming distance to compare two strings. Could it be applied to compare two FASTQ files? Is there a more straightforward approach?
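The Hamming-distance idea can be applied per read pair. Below is a minimal Python sketch (standard library only) under these assumptions: after trimming, both reads of a pair cover exactly the same region, so reverse-complemented read2 should match read1 base for base, and the files are plain four-line FASTQ. All function names and output file names are hypothetical. Because read1.fastq and read2.fastq were trimmed and filtered separately, a read may have been dropped from one file but not its mate. For that reason, the sketch pairs reads by their header ID rather than by position in the file.

```python
from typing import Iterator, Tuple

# Complement table for A/C/G/T plus N (the ambiguous base maps to itself).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal lengths")
    return sum(x != y for x, y in zip(a, b))


def pair_matches(seq1: str, seq2: str, max_mismatches: int = 0) -> bool:
    """True if read2, reverse-complemented, matches read1 within max_mismatches."""
    if len(seq1) != len(seq2):
        return False
    return hamming(seq1, reverse_complement(seq2)) <= max_mismatches


def read_id(header: str) -> str:
    """Pair identifier: first token of the header, minus any /1 or /2 suffix."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_fastq(path: str) -> Iterator[Tuple[str, ...]]:
    """Yield (header, sequence, '+', quality) from a plain four-line FASTQ file."""
    with open(path) as handle:
        while True:
            record = tuple(handle.readline().rstrip("\r\n") for _ in range(4))
            if not record[0]:  # empty header line means end of file
                return
            yield record


def filter_pairs(r1_path: str, r2_path: str, out1_path: str, out2_path: str,
                 max_mismatches: int = 0) -> Tuple[int, int]:
    """Write only complementary pairs to out1/out2; return (kept, read1 total)."""
    # Index read2 by pair ID so mates are found even if the files are out of sync.
    mates = {read_id(rec[0]): rec for rec in read_fastq(r2_path)}
    kept = total = 0
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1 in read_fastq(r1_path):
            total += 1
            rec2 = mates.get(read_id(rec1[0]))
            if rec2 is not None and pair_matches(rec1[1], rec2[1], max_mismatches):
                # Write both mates in the same order so the outputs stay paired.
                out1.write("\n".join(rec1) + "\n")
                out2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept, total
```

For example, `filter_pairs("read1.fastq", "read2.fastq", "read1.matched.fastq", "read2.matched.fastq")` writes the fully complementary pairs to two new files and returns the number of pairs kept and the number of read1 records examined. With the default `max_mismatches=0`, only pairs where both reads agree at every base are kept. Raising it tolerates some disagreement between mates. Holding all of read2 in memory is the price of matching by ID. For very large files, streaming both files in order is an alternative, but only if their records are known to still be in sync.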
Thanks in advance.
Thanks. That's exactly what I was looking for.