Question: Truncated sam file - Parse error
2.1 years ago, fiona.newberry wrote:

I am trying to convert sam to bam using samtools view -bS IN.sam > OUT.bam

I get the following error:

[W::sam_read1] parse error at line 36
[main_samview] truncated file.

Line 36 is this:

=====> Processing read 'simulated.2618103'/1 <=====

There are no errors when BWA is running. I have read the SAM format specification, and it does not mention lines starting with =====>.

Here is from the beginning of the file to line 40:

@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:gi|9627186|ref|NC_001539.1|  LN:5323
@SQ SN:gi|9629818|ref|NC_001847.1|  LN:135301
@SQ SN:gi|315192962|ref|NC_002306.3|    LN:29355
@SQ SN:gi|21728357|ref|NC_004067.1| LN:6450
@SQ SN:gi|38018060|ref|NC_005148.1| LN:1768
@SQ SN:gi|352950882|ref|NC_011507.2|    LN:3302
@SQ SN:gi|303291528|ref|NC_014406.1|    LN:4926
@SQ SN:gi|311977355|ref|NC_014649.1|    LN:1181549
@SQ SN:gi|448259945|ref|NC_019925.1|    LN:152427
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001539_1.fq.gz seqtk_1/subsample_1/sub_NC_001539_2.fq.gz
=====> Processing read 'simulated.2618103'/1 <=====
* fraction of repetitive seeds: 0.000
* Found CHAIN(0): n=3; weight=81    20;20;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    29;29;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    52;52;30,3095694413(gi|9627186|ref|NC_001539.1|:+431)
* Found CHAIN(1): n=1; weight=19    19;19;76,1952705833(chr12:+1774857)
* Found CHAIN(2): n=1; weight=19    19;19;40,770774837(chr4:+80285843)

Lines 1-35 seem to fit the SAM format specification.

Any help would be appreciated.
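To locate every stray line rather than only the first one samtools reports, the file can be scanned for lines that are neither headers nor plausible alignment records. The following is a minimal sketch, not a validator: the function name `find_non_sam_lines` is made up for illustration, and the test (at least 11 tab-separated fields with a numeric FLAG column) is a heuristic based on the mandatory fields in the SAM specification.

```bash
#!/usr/bin/env bash
# Print "line_number<TAB>line" for lines that look like neither a SAM header
# nor an alignment record. Stops after MAX hits (default 5).
# Heuristic only: a record is accepted if it has at least 11 tab-separated
# fields and a purely numeric second field (FLAG).
find_non_sam_lines() {
    local sam_file=$1
    local max=${2:-5}
    awk -F'\t' -v max="$max" '
        /^@/                         { next }   # header line
        NF >= 11 && $2 ~ /^[0-9]+$/  { next }   # plausible alignment record
        { print NR "\t" $0; if (++n >= max) exit }
    ' "$sam_file"
}
```

Running `find_non_sam_lines IN.sam` on the file above should list the `=====>` line and the `*` lines that follow it.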

bwa sam bam • 3.3k views
modified 2.1 years ago by ATpoint • written 2.1 years ago by fiona.newberry

How did you run BWA? It seems stderr is making its way into your SAM file.

written 2.1 years ago by h.mon

I agree, it looks as though you're redirecting both stdout and stderr to the same file.

written 2.1 years ago by James Ashmore
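For reference, here is a small sketch of how the two streams behave under shell redirection. `fake_aligner` is a hypothetical stand-in written only for this illustration, not part of BWA. A plain `>` captures stdout alone, so log text can only land in the output file if stderr was merged in (for example with `2>&1`) or if the program wrote that text to stdout itself.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for an aligner, for illustration only:
# one SAM header line goes to stdout, one progress message to stderr.
fake_aligner() {
    printf '@HD\tVN:1.6\n'
    printf 'progress: processed 2 reads\n' >&2
}

workdir=$(mktemp -d)

# ">" captures stdout only; "2>" sends stderr to a separate log.
fake_aligner > "$workdir/clean.sam" 2> "$workdir/aligner.log"

# "2>&1" after ">" merges stderr into the same file as stdout.
fake_aligner > "$workdir/mixed.sam" 2>&1

# Count lines in each file: the merged file holds the extra log line.
grep -c '' "$workdir/clean.sam" "$workdir/mixed.sam"
```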

That's what I was thinking, but I can't seem to spot how.

This is the command I used for BWA:

  bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001847_1.fq.gz seqtk_1/subsample_1/sub_NC_001847_2.fq.gz  > ./BWA/seqtk_1/subsample_1/sub_NC_001847_BWA.sam

I may have made a really stupid mistake and am just not seeing it.

written 2.1 years ago by fiona.newberry

I don't see anything wrong with your command line, but I don't think -v4 is needed. Are you running BWA directly, or inside some script or pipeline? Is stderr being redirected somewhere? Is BWA running as a background process?

How did you create the index? What is the size of your reference?

written 2.1 years ago by h.mon
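On the -v4 point: the bwa manual describes -v values of 4 or higher as debugging levels and notes that at value 4 the output is not SAM. The `=====> Processing read` lines would be consistent with that, although nothing in this thread confirms it. Since the @PG header line records the command line, existing SAM files can be checked for the flag. The sketch below assumes the header layout shown in the question; the function name `has_debug_verbosity` is made up for illustration, and the pattern match is a heuristic.

```bash
#!/usr/bin/env bash
# Succeed if the @PG header of a SAM file records a bwa command line
# that sets -v to 4 or higher (written as "-v4" or "-v 4").
# Only the leading header block is read; scanning stops at the first
# line that does not start with "@".
has_debug_verbosity() {
    local sam_file=$1
    awk '!/^@/ { exit } /^@PG/' "$sam_file" \
        | grep -Eq 'CL:bwa .*-v ?([4-9]|[1-9][0-9]+)( |$)'
}
```

Usage: `has_debug_verbosity IN.sam && echo "aligned with debug verbosity; re-run bwa mem without -v4"`.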

I guess you are running this command via nohup and thought this was not relevant?

written 2.1 years ago by Michael Dondrup

You could try aligning each read file separately (sub_NC_001539_1.fq.gz or sub_NC_001539_2.fq.gz, not both) and then merging the two BAM files with Picard.

written 2.1 years ago by genebow

BWA handles paired-end data natively. Splitting is not necessary and would lose the insert size information, which would then have to be restored in a second step after the merge. With big files, that will take ages.

modified 2.1 years ago • written 2.1 years ago by ATpoint
2.1 years ago, ATpoint wrote:

Do yourself a favor and avoid writing SAM files. There is no advantage to saving them; it only wastes disk space. Pipe the aligner directly into samtools view to get the binary file. It seems that you have to re-align anyway, so do:

bwa mem -M ref 1.fq 2.fq | samtools view -bhS -o out.bam -

In case you need sorted files, which is almost always the case, you can also pipe into sort right away:

bwa mem -M ref 1.fq 2.fq | samtools sort -o out_sorted.bam
modified 2.1 years ago • written 2.1 years ago by ATpoint
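One caveat when piping: by default the shell reports only the exit status of the last command, so a bwa failure can go unnoticed and leave a truncated BAM. A possible way to guard against that is sketched below, assuming bash and samtools 1.3 or later (for `sort -o` and `quickcheck`). The wrapper name `align_sort_check` and the log file name are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Align, sort, and sanity-check the BAM in one step.
# Arguments: reference index prefix, two FASTQ files, output BAM path.
align_sort_check() {
    local ref=$1 fq1=$2 fq2=$3 out_bam=$4
    local tool
    for tool in bwa samtools; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found on PATH" >&2
            return 127
        fi
    done
    (
        # pipefail: the pipeline fails if bwa fails, not only if samtools does.
        set -o pipefail
        # bwa messages go to a log file so they cannot mix with the alignments.
        bwa mem -M "$ref" "$fq1" "$fq2" 2> "$out_bam.bwa.log" \
            | samtools sort -o "$out_bam" - \
            && samtools quickcheck "$out_bam"
    )
}
```

Usage: `align_sort_check ref 1.fq 2.fq out_sorted.bam || echo "alignment failed, see out_sorted.bam.bwa.log"`.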