I've got sequence data back from Illumina. It's paired-end 300 bp reads with ~50 bp overlap. I originally merged the read pairs as my second step (the first being FastQC), and it worked a treat. But we decided to filter the fastqs first.
I've made another post here where I ask how to filter the fastqs based on sequence similarity to another fastq, and explain why I did it, in case you're interested.
This is what I've done:
FastQC to check for adapters.
Uclust search to find reads with 100% sequence similarity across R1s and across R2s (separately; we sequenced the same individual twice from different extractions) for each individual.
Filter the fastqs based on the Uclust hits, using seqtk. You lose around 75% of reads for the ones I've checked.
Pair-end (I've tried both PEAR and EA-Utils). But I get errors saying the number of sequences is different, with PEAR saying there are no reads in any of the R2 files.
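Since both tools complain about the number of sequences, one quick way to test that claim is to count the records in each file and compare the read IDs in order. The following is only a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record and Illumina-style headers in which the ID before the first space is identical in R1 and R2. The function names `count_reads`, `read_ids` and `check_pairs` are made up for illustration and are not part of any of the tools above.

```shell
# Count records in an uncompressed FASTQ file (assumes 4 lines per record).
# A non-integer result suggests a truncated or malformed file.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print one read ID per line: the header up to the first whitespace, minus "@".
read_ids() {
    awk 'FNR % 4 == 1 { id = $1; sub(/^@/, "", id); print id }' "$1"
}

# Return 0 only if both files hold the same read IDs in the same order.
check_pairs() {
    cp_n1=$(count_reads "$1")
    cp_n2=$(count_reads "$2")
    echo "R1 reads: $cp_n1, R2 reads: $cp_n2"
    if [ "$cp_n1" != "$cp_n2" ]; then
        echo "read counts differ"
        return 1
    fi
    cp_tmp=$(mktemp -d) || return 1
    read_ids "$1" > "$cp_tmp/r1.ids"
    read_ids "$2" > "$cp_tmp/r2.ids"
    if cmp -s "$cp_tmp/r1.ids" "$cp_tmp/r2.ids"; then
        echo "read IDs match"
        cp_status=0
    else
        echo "read IDs differ or are in a different order"
        cp_status=1
    fi
    rm -rf "$cp_tmp"
    return "$cp_status"
}

# Hypothetical usage, with the R1 and R2 files of one sample:
# check_pairs sample_R1_filtered.fastq sample_R2_filtered.fastq
```

If the counts or IDs differ, the filtered files are no longer paired record for record, which is something paired-end mergers generally expect.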
I've re-run the uclust and filtering steps in case it was truncating files. Nothing. I've used EA-Utils fastq-stats to get a summary of the R2 files, and there are reads there. I've even tried pairing the filtered and the unfiltered reads, and the same error comes up.
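If the separately filtered R1 and R2 files do turn out to hold different sets of reads, one way to bring them back in step is to keep only the reads whose IDs are present in both files. This is a rough sketch under assumptions, not a tested pipeline: it assumes uncompressed four-line FASTQ, Illumina-style headers where the ID before the first space is shared between mates, and that both files kept their original read order through filtering. The helper names `read_ids`, `filter_by_ids` and `sync_pairs` are hypothetical.

```shell
# Print one read ID per line: the header up to the first whitespace, minus "@".
read_ids() {
    awk 'FNR % 4 == 1 { id = $1; sub(/^@/, "", id); print id }' "$1"
}

# Print the records of FASTQ file $1 whose read ID is listed in file $2.
# The ID list is read first; each FASTQ record is then kept or dropped whole.
filter_by_ids() {
    awk 'FILENAME == ARGV[1] { keep[$1] = 1; next }
         FNR % 4 == 1 { id = $1; sub(/^@/, "", id); ok = (id in keep) }
         ok' "$2" "$1"
}

# sync_pairs R1_in R2_in R1_out R2_out
# Write only the reads present in both inputs, preserving each file's order.
sync_pairs() {
    sp_tmp=$(mktemp -d) || return 1
    read_ids "$1" | sort > "$sp_tmp/r1.ids"
    read_ids "$2" | sort > "$sp_tmp/r2.ids"
    # comm -12 prints only the lines common to both sorted lists.
    comm -12 "$sp_tmp/r1.ids" "$sp_tmp/r2.ids" > "$sp_tmp/common.ids"
    filter_by_ids "$1" "$sp_tmp/common.ids" > "$3"
    filter_by_ids "$2" "$sp_tmp/common.ids" > "$4"
    rm -rf "$sp_tmp"
}

# Hypothetical usage:
# sync_pairs R1_filtered.fastq R2_filtered.fastq R1_synced.fastq R2_synced.fastq
```

The outputs only line up record for record if the two inputs were in the same relative order to begin with, which is the assumption named above.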
I may be missing something obvious, but I'm genuinely at a loss. Any help would be greatly appreciated. If anything is unclear, please let me know.
EDIT: The line executed was:
$TOOL/pear -f $DATA_DIR/$forward_read -r $reverse_read -o $OUT_DIR/$output_name
after I removed all the filtering options when it didn't work the first time. It equates to:
pear -f /data//01_Fastq_Filtering/BVG080--BVC393-2_S97_L001_R1_001_filtered.fastq \
  -r /data//01_Fastq_Filtering/BVG080--BVC393-2_S97_L001_R2_001_filtered.fastq \
  > /data/02_Paired/BVG080-2_paired.fastq
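Because the command is built from shell variables, it can help to confirm that each expanded path points at a file that exists and is not empty before the merger is called. Below is a small guard, offered as a sketch: `require_fastq` is a hypothetical helper name, and the commented usage simply reuses the variables from the command line quoted above.

```shell
# Return 0 only if every argument is an existing, non-empty file.
# The first offending path is reported on stderr.
require_fastq() {
    for rf_path in "$@"; do
        if [ ! -s "$rf_path" ]; then
            echo "missing or empty: $rf_path" >&2
            return 1
        fi
    done
    return 0
}

# Hypothetical usage, with the variables exactly as in the command above:
# require_fastq "$DATA_DIR/$forward_read" "$reverse_read" &&
#     "$TOOL/pear" -f "$DATA_DIR/$forward_read" -r "$reverse_read" \
#         -o "$OUT_DIR/$output_name"
```

Printing the offending path shows exactly what each variable expanded to, which is often the quickest way to spot a path that is not what was intended.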