I thought I was finally in the home stretch, but lo and behold, a new issue has come up. Here is the whole pipeline I used:
bwa index refgenome.fa
bwa mem -B 2 -M refgenome.fa cert1.R1.fastq cert1.R2.fastq > cert1_aln.sam
samtools view -Sb cert1_aln.sam > cert1_aln.bam

# I do a sort here but maybe I don't need to? I have just seen many pipelines
# that sort and index at this point so I did it too.
samtools sort cert1_aln.bam > cert1_aln_sorted.bam | rm -f cert1_aln.bam
samtools index cert1_aln_sorted.bam

# Gather some stats
samtools flagstat cert1_aln_sorted.bam > cert1_stats.txt

# Extract unmapped read whose mate is mapped
samtools view -b -f 4 -F 264 cert1_aln_sorted.bam > cert1_tmp1_unmapped.bam

# Extract mapped read whose mate is unmapped
samtools view -b -f 8 -F 260 cert1_aln_sorted.bam > cert1_tmp2_unmapped.bam

# Extract unmapped read with unmapped mate
samtools view -b -f 12 -F 256 cert1_aln_sorted.bam > cert1_tmp3_unmapped.bam

# Merge the three tmp files into 1
samtools merge cert1_unmapped.bam cert1_tmp*_unmapped.bam

# Extract mapped reads from BAM file
samtools view -b -F 12 cert1_aln_sorted.bam > cert1_mapped.bam

# Sort the BAM files by name
samtools sort -n cert1_unmapped.bam > cert1_unmapped_sorted.bam
samtools sort -n cert1_mapped.bam > cert1_mapped_sorted.bam

# Finally, convert to Fastq
bamToFastq -i cert1_unmapped_sorted.bam -fq cert1_unmapped.R1.fastq -fq2 cert1_unmapped.R2.fastq
bamToFastq -i cert1_mapped_sorted.bam -fq cert1_mapped.R1.fastq -fq2 cert1_mapped.R2.fastq
The error I get towards the end is:
seqname is marked as paired, but its mate does not occur next to it in your BAM file. Skipping
It spams my terminal endlessly. My fastq files are tiny and clearly missing sequences. Most Biostars posts on this say the issue is sorting without the -n flag, but I tried sorting both with and without it and I get the same error either way. I just want one fastq file with paired mapped reads, and another fastq file with all unmapped reads/mates.
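To sanity-check the -f/-F masks used in the pipeline, here is a minimal shell sketch of the bit test that `samtools view` applies: a record is kept when every -f bit is set and no -F bit is set. The `keeps` helper is hypothetical (it is not part of samtools), and the three FLAG values are made-up examples, not taken from my data.

```shell
# Hypothetical helper (not part of samtools): succeeds if a SAM FLAG would be
# kept by `samtools view -f REQ -F EXCL`.
keeps() {
  # $1 = FLAG, $2 = -f bits (all must be set), $3 = -F bits (none may be set)
  [ $(( $1 & $2 )) -eq "$2" ] && [ $(( $1 & $3 )) -eq 0 ]
}

# Example FLAG values (illustrative only):
#   99  = paired(1) + proper pair(2) + mate reverse(32) + first in pair(64)
#   355 = 99 + secondary alignment(256)
#   73  = paired(1) + mate unmapped(8) + first in pair(64)
for flag in 99 355 73; do
  if keeps "$flag" 0 12; then
    echo "$flag kept by -F 12"
  else
    echo "$flag dropped by -F 12"
  fi
done
```

With these values, 99 and 355 are kept and 73 is dropped, so `-F 12` on its own does not exclude secondary alignments (bit 256). Since `bwa mem -M` marks shorter split hits as secondary, such records could end up in the mapped BAM without a mate next to them. Whether that is what triggers the bamToFastq warning here is an assumption I have not verified. An exclude mask of 268 (256 + 12) would drop the 355 example.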