I downloaded a file of Illumina paired-end reads from SRA. When split into _1 and _2 using the SRA Toolkit's fastq-dump --split-files, the FASTQ record IDs look like this (I'm showing just the identifier lines of the first record in each file):
_1.fastq  @SRR4734558.1 HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100
_2.fastq  @SRR4734558.1 HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100
i.e., they're exactly the same in files _1 and _2. BWA-MEM (v0.7.15) gives me error messages with these files, saying it can't find any FR pairs (followed soon after by a core dump). This seems to be because there isn't any indication of '1' and '2' in the IDs. I tried adding '1:' and '2:' before the 'HWI' (using sed):
_1.fastq  @SRR4734558.1 1:HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100
_2.fastq  @SRR4734558.1 2:HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100
but BWA still didn't find FR pairs.
I also tried reducing the ID to a string with no space, and adding /1 and /2 to the end of the ID lines
_1.fastq  @HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979/1
_2.fastq  @HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979/2
But BWA still does not see them as an FR pair. (It only sees reads as FF and RR in all of these cases.)
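For concreteness, here is a minimal sketch of the two header rewrites described above, written in Python rather than the sed that was actually used. The function names are hypothetical, and the sketch assumes every header has the layout `@<SRA id> <instrument id> length=<n>` seen in the examples; it makes no claim about which form BWA expects.

```python
def tag_mate_prefix(header: str, mate: int) -> str:
    """Insert '<mate>:' in front of the instrument ID (the second field)."""
    fields = header.rstrip("\n").split(" ")
    fields[1] = f"{mate}:{fields[1]}"
    return " ".join(fields)


def tag_mate_suffix(header: str, mate: int) -> str:
    """Keep only the instrument ID and append '/<mate>', leaving no spaces."""
    fields = header.rstrip("\n").split(" ")
    return f"@{fields[1]}/{mate}"


# Header taken from the first record shown above.
header = "@SRR4734558.1 HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100"

print(tag_mate_prefix(header, 1))
# @SRR4734558.1 1:HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100
print(tag_mate_suffix(header, 2))
# @HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979/2
```

Only the identifier line (every fourth line, starting with the first) should be passed through these functions; sequence, '+' and quality lines must be left untouched.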
So, what is the proper way to indicate '1' and '2' so that BWA sees them as FR? Or is there something wrong with this version of BWA? (I am asking our HPC IT to install the latest.) NB: I confirmed that there are no duplicate IDs within either file.
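A duplicate-ID check of the kind mentioned in the NB can be sketched roughly as follows. This is an illustration, not necessarily how the check was done here: the function names are hypothetical, and the sketch assumes strict four-line FASTQ records with the ID being the first whitespace-delimited token of the header.

```python
from collections import Counter
from itertools import islice


def read_ids(lines):
    """Yield the ID of every record, assuming strict 4-line FASTQ records."""
    for header in islice(lines, 0, None, 4):  # lines 0, 4, 8, ... are headers
        yield header[1:].split()[0]  # drop the leading '@', keep the first token


def duplicate_ids(lines):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(read_ids(lines))
    return sorted(read_id for read_id, n in counts.items() if n > 1)


# Tiny in-memory example; with real data, pass an open file handle instead.
records = [
    "@SRR4734558.1 HWI-ST1117:138:C1HR1ACXX:5:1101:1451:1979 length=100\n",
    "ACGT\n", "+\n", "IIII\n",
    "@SRR4734558.2 HWI-ST1117:138:C1HR1ACXX:5:1101:1500:1990 length=100\n",
    "TTGA\n", "+\n", "IIII\n",
]
print(duplicate_ids(records))  # an empty list means no ID is repeated
```

For a real file this would be called as `duplicate_ids(open("reads_1.fastq"))`, where the filename is a placeholder. All IDs are held in memory, so very large files may need a different approach.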