Hi, I have received fastq files containing the reads from Illumina MiSeq. Since they are paired-end, there is an R1 and an R2 file for each sample. So I expected to find reads beginning with our forward primer in the R1 files, and reads beginning with our reverse primer in the R2 (or vice versa). However, I find both in both; i.e. about half of the reads in the R1 files begin with the forward primer, and half with the reverse primer; and same with the R2s. I tried merging them, but this results in about half of the reads being reverse complemented, and this makes things more complicated downstream, so I would like them to all go in the same direction. I thought to grep for each of the primers, but because of ambiguities and some still having short tags on the beginning, I don't think it's going to work--plus I thought they weren't supposed to be mixed anyway...??? Maybe I don't understand this as well as I thought. Any ideas? Thanks.
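Actually, this is normal for many library preparations: the reads are mixed just the way you describe. R1 may start with the forward or the reverse primer, and so may R2. The only guarantee is that the two reads of a pair come from opposite ends, and opposite strands, of the same fragment, so if R1 starts with the forward primer, its R2 mate starts with the reverse primer, and vice versa. Depending on your requirements, you may indeed need to check which is which further down the pipeline. Standard alignment tools handle both orientations automatically.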
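If you do need to sort the pairs yourself, plain grep struggles with ambiguity codes and leading tags, but a short script can cope with both. The following is only a rough sketch, not a tested pipeline: the primer sequences are placeholders for your own, and the allowed tag length (`max_offset`) and mismatch count are assumptions you would need to tune. It checks each mate for a primer near its start and swaps the mates when R1 carries the reverse primer.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Placeholder primers (assumption): replace with the primers actually used.
FWD_PRIMER = "GTGYCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACNVGGGTWTCTAAT"


def primer_offset(read, primer, max_offset=8, max_mismatches=1):
    """Return the offset (0..max_offset) where the primer matches, else None.

    The offset allows for a short tag in front of the primer.
    """
    read = read.upper()
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)]
        if len(window) < len(primer):
            break  # read too short to hold the primer at this offset
        mismatches = sum(
            base not in IUPAC[code] for base, code in zip(window, primer)
        )
        if mismatches <= max_mismatches:
            return offset
    return None


def orient_pair(r1, r2, fwd_primer=FWD_PRIMER, rev_primer=REV_PRIMER):
    """Return (forward_read, reverse_read), or None if primers are not found."""
    if (primer_offset(r1, fwd_primer) is not None
            and primer_offset(r2, rev_primer) is not None):
        return r1, r2
    if (primer_offset(r1, rev_primer) is not None
            and primer_offset(r2, fwd_primer) is not None):
        return r2, r1  # mates were in the opposite orientation: swap them
    return None
```

Applied to every pair (with the quality strings swapped along with the sequences), this would give you one file in which every read starts with the forward primer and one in which every read starts with the reverse primer. Pairs that return `None` are worth inspecting by hand rather than discarding blindly.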
I use FLASH to merge the reads; it is really easy to use.
Other options are compared on this page: https://www.researchgate.net/publication/303288211_Evaluating_Paired-End_Read_Mergers
It depends on the sequence length, but you can merge the reads first and trim the primers afterwards.
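For the merge-then-trim route, here is a minimal sketch of how merged reads could be put in one direction and stripped of primers at the same time. It is an illustration under assumptions, not a replacement for a dedicated trimming tool: the primers are placeholders, matching is exact apart from IUPAC ambiguity codes, and `max_tag` (the longest tag allowed outside a primer) is a guess. A merged read that does not start with the forward primer is reverse complemented and tried again.

```python
import re

# IUPAC codes written as regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGTN]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Placeholder primers (assumption): replace with the primers actually used.
FWD_PRIMER = "GTGYCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACNVGGGTWTCTAAT"


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a primer with ambiguity codes into a regex fragment."""
    return "".join(IUPAC_REGEX[code] for code in primer.upper())


def orient_and_trim(merged, fwd_primer=FWD_PRIMER, rev_primer=REV_PRIMER,
                    max_tag=8):
    """Return the insert in forward orientation without primers, or None."""
    tag = "[ACGTN]{0,%d}?" % max_tag
    fwd_start = re.compile("^" + tag + primer_regex(fwd_primer))
    rev_end = re.compile(primer_regex(revcomp(rev_primer)) + tag + "$")
    # Try the read as given, then its reverse complement.
    for candidate in (merged.upper(), revcomp(merged)):
        start = fwd_start.search(candidate)
        if start is None:
            continue
        end = rev_end.search(candidate, start.end())
        # If the reverse primer is missing, only the forward primer is removed.
        stop = end.start() if end else len(candidate)
        return candidate[start.end():stop]
    return None
```

Reads for which this returns `None` contain no recognisable forward primer in either direction. Note that quality strings would need to be reversed and sliced in step with the sequence, which this sketch leaves out.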