I am trying to map reads onto a 270 Mb reference genome using bwa mem. I took Illumina paired-end reads (2x150 bp, ~65 Gb of .fastq files), trimmed them with Trimmomatic 0.38, and then mapped the trimmed reads (~59 Gb of .fastq files) onto the reference genome:
nohup bwa mem -M -t 60 -R "@RG\tID:3058874d-77f5-45a9-b511-69cf528d859e\tLB:lib1\tPL:ILLUMINA\tSM:KMM2\tPU:unit1\tPI:239" ~/Reference_genome.fasta KMM2_output_forward_paired_trimmed.fastq KMM2_output_reverse_paired_trimmed.fastq > KMM2_alignment.sam
This produced a 136 Gb .sam alignment file. I then marked duplicates with samblaster 0.1.24, which also produced a 136 Gb .sam file:
cat KMM2_alignment.sam | samblaster > KMM2_alignment_dupsmarked.sam
samblaster: Loaded 894 header sequence entries.
samblaster: Marked 3370494 of 174296648 (1.93%) read ids as duplicates using 381692k memory in 7M42S(462.225S) CPU seconds and 6H8M31S(22111S) wall time.
Next, I converted the .sam file to .bam, which gave a 30 Gb .bam file:
nohup samtools view -bS -h -@ 50 KMM2_alignment_dupsmarked.sam > KMM2.bam &
[samopen] SAM header is present: 894 sequences.
But when I tried to sort the .bam file:
samtools sort -@ 32 KMM2_09252018.bam KMM2_sorted.bam
I got this error:
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Segmentation fault (core dumped)
Please help. I have read the forums and tried adding the header back in, and I have seen suggestions that my alignment .bam file might be truncated. What did I do wrong?
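One quick sanity check that might narrow this down, sketched under a few assumptions: a BAM file is BGZF (gzip-compatible) compressed, so its first two bytes should be 1f 8b. The helper name looks_like_bgzf below is hypothetical, not part of samtools. Since the conversion step above writes KMM2.bam while the sort step reads KMM2_09252018.bam, the loop checks both names. Passing this check does not prove a file is complete or untruncated, only that it starts like a compressed BAM.

```shell
#!/bin/sh
# Hypothetical helper (not a samtools command): succeeds if the file
# begins with the gzip/BGZF magic bytes 1f 8b, as a BAM file should.
looks_like_bgzf() {
    # Read the first two bytes and print them as hex with no spacing.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Check both file names that appear in the commands above.
for f in KMM2.bam KMM2_09252018.bam; do
    if [ ! -e "$f" ]; then
        echo "$f: file not found"
    elif looks_like_bgzf "$f"; then
        echo "$f: starts with BGZF/gzip magic bytes"
    else
        echo "$f: does not start like a BAM file"
    fi
done
```

If a file is reported as not starting like a BAM file, it may be plain-text SAM or an unrelated file, which would be consistent with the "invalid BAM binary header" message.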