Parsing error in sam file when converting to bam
3
0
Entering edit mode
12 days ago

I am working with single-cell sequencing files and am getting stuck when converting sam to bam. Any help is sincerely appreciated:

This is my initial cutadapt command:

cutadapt -a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT --interleaved --cores=16 -o project/outs/fastq_path/project/PTO035/PTO035_1.fastq.gz -p project/outs/fastq_path/project/PTO035/PTO035_2.fastq.gz project/outs/fastq_path/project/PTO035/PTO035_S1_L001_R1_001.fastq.gz project/outs/fastq_path/project/PTO035/PTO035_S1_L001_R2_001.fastq.gz

This is my command for converting to sam:

bwa mem -C -M -t 16 genome.fa <(zcat project/outs/fastq_path/project/PTO035/PTO035_1.fastq.gz) <(zcat project/outs/fastq_path/project/PTO035/PTO035_2.fastq.gz) > aln_pe3.sam 

When using this command to convert to bam, I get an error:

samtools view -Sb aln_pe3.sam > aln_pe3.bam

Error:

[W::sam_read1_sam] Parse error at line 196
samtools view: error reading file "aln_pe3.sam"

If it helps, this is what line 196 looks like: 
SRR23292069.1   99  chr10   68742898    60  101M    =   68742899    102 GTGAGACACTGCGCCTGGCCAGTGTAGCTATGATTTTCTTTCCTTTTTTATTTGAGACAGAGCCTCACTCGGTTGCCAAGGCTGGAGTGCAGTGGCGTGAT   ?????????????????????????????????????????????????????????????????????????????????????????????????????   NM:i:0  MD:Z:101    MC:Z:101M   AS:i:101    XS:i:37 A00521:358:HWTVNDSX3:1:1101:1669:1000/1
samtools bwa fastq bam sam • 675 views
ADD COMMENT
0
Entering edit mode

There is no need for this process substitution zcat thing, bwa can read gzipped files.

ADD REPLY
1
Entering edit mode
11 days ago

Have you already tried running the command like this ?

bwa mem -C -M -t 16 genome.fa project/outs/fastq_path/project/PTO035/PTO035_1.fastq.gz project/outs/fastq_path/project/PTO035/PTO035_2.fastq.gz > aln_pe3.sam

Is there a reason to use zcat when bwa can work on compressed fq file ?

ADD COMMENT
0
Entering edit mode

When I tried that command I got the same error. What ended up working was removing the -C parameter from the bwa command altogether, because I read the CIGAR info was generated by default even without it.

ADD REPLY
1
Entering edit mode
5 days ago
jkbonfield ★ 1.3k

Your SAM output is indeed incorrect:

SRR23292069.1   99  chr10  [...cut...]   XS:i:37 A00521:358:HWTVNDSX3:1:1101:1669:1000/1

You can see it mysteriously ends on a read name, so this is invalid aux tag format. The bwa mem man page has this to say about the -C option:

 -C     Append append FASTA/Q comment to SAM output. This option can be used to transfer read meta information (e.g. barcode) to the SAM output. Note that the FASTA/Q comment (the string after a space in the header line) must conform the SAM spec (e.g. BC:Z:CGTAC). Malformated comments lead to incorrect SAM output.

The last sentence "Malformated comments lead to incorrect SAM output" is indeed triggering samtools to fail. So then it goes one step further up the chain - what caused the FASTQ comments to be invalid?

This is either something wrong with cutadapt, or possibly with the original input fastq files if cutadapt isn't adding comments. Maybe this data simply isn't in the correct format for use with bwa mem -C.

ADD COMMENT
0
Entering edit mode

A00521:358:HWTVNDSX3:1:1101:1669:1000/1

That fastq header format was used by Illumina for data (at least a decade or so ago, prior to release of CASAVA v.1.8, when it changed to current format we are familiar with). /1 indicated data from R1 read.

Oddly SRR23292069 being referred to is a relatively recent (from 2023) submission, so that additional /1 at the end came was likely because of how this data was extracted from SRA.

ADD REPLY
0
Entering edit mode
11 days ago

samtools sort can take the output straight from bwa, and output a bam. So pipe from bwa straight into samtools sort.

ADD COMMENT
0
Entering edit mode

What ended up working was removing the -C parameter from the bwa command altogether, because I read the CIGAR info was generated by default even without it.

ADD REPLY

Login before adding your answer.

Traffic: 3027 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6