I'm afraid it's a silly question but please help.
I've got an extremely large paired-end bisulfite-sequencing file (more than 1 TB). I used Bismark to map it to mm10. Unfortunately, Bismark finished with a broken SAM file (the last line is incomplete).
Running the mapper again would be painful because it took about 3 weeks. I've checked, and about 20% of the reads were not written to the SAM file. So now I have to find which reads are missing and map only that small part again. The difficult part is that the reads in the SAM file are not in the same order as those in the FASTQ file.
What is the fastest way to compare the read IDs in the broken SAM file with IDs in the original FASTQ file? Or is there an alternative way to achieve it?
I'd really appreciate it if anyone can help.
I assume the FASTQ file has sorted IDs. Convert your SAM to BAM and sort it by query name (e.g. samtools sort -n). That way you can roughly make out where in the FASTQ file the aligner went wrong.
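If the FASTQ IDs turn out not to be sorted, another option is to compare the IDs directly. Below is a minimal Python sketch of that idea, not a tested pipeline: it collects the read names from the complete SAM lines into a set, then streams the FASTQ and keeps every record whose name is not in the set. The function names (normalize_id, mapped_ids, missing_records) are made up for illustration. The name normalization (first whitespace-delimited token, optional leading "@", optional trailing "/1" or "/2") is an assumption, so check it against a few real names from your SAM and FASTQ files, since the aligner may have rewritten them. It also assumes plain 4-line FASTQ records, and that the set of mapped IDs fits in memory, which may not hold for a file this size.

```python
def normalize_id(raw):
    """Reduce a read name to a comparable key (assumed naming convention)."""
    parts = raw.split(None, 1)
    if not parts:
        return ""
    key = parts[0]
    if key.startswith("@"):               # FASTQ header marker
        key = key[1:]
    if key.endswith("/1") or key.endswith("/2"):  # mate suffix, if present
        key = key[:-2]
    return key


def mapped_ids(sam_lines):
    """Collect read names from complete SAM records only."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # truncated record, e.g. the broken last line
            continue
        ids.add(normalize_id(fields[0]))
    return ids


def missing_records(fastq_lines, seen):
    """Yield 4-line FASTQ records whose read name is not in `seen`."""
    it = iter(fastq_lines)
    for header in it:
        record = [header, next(it, ""), next(it, ""), next(it, "")]
        if normalize_id(header) not in seen:
            yield "".join(record)


# Hypothetical usage (file names are placeholders):
# with open("broken.sam") as sam:
#     seen = mapped_ids(sam)
# with open("reads_1.fastq") as fq, open("missing_1.fastq", "w") as out:
#     out.writelines(missing_records(fq, seen))
```

Run the FASTQ step once per mate file with the same set, so both mates of a missing pair end up in the re-mapping input. Because the SAM was cut off mid-write, the last complete record may belong to a pair whose mate was never written, so it is probably safest to re-map that final pair as well.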