Question: Truncated sam file - Parse error
21 months ago, fiona.newberry wrote:

I am trying to convert sam to bam using samtools view -bS IN.sam > OUT.bam

I get the following error:

[W::sam_read1] parse error at line 36
[main_samview] truncated file.

Line 36 is this:

=====> Processing read 'simulated.2618103'/1 <=====

BWA reports no errors while running. I have read the SAM format specification, and it does not mention lines starting with =====>.

Here is the file from the beginning to line 40:

@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:gi|9627186|ref|NC_001539.1|  LN:5323
@SQ SN:gi|9629818|ref|NC_001847.1|  LN:135301
@SQ SN:gi|315192962|ref|NC_002306.3|    LN:29355
@SQ SN:gi|21728357|ref|NC_004067.1| LN:6450
@SQ SN:gi|38018060|ref|NC_005148.1| LN:1768
@SQ SN:gi|352950882|ref|NC_011507.2|    LN:3302
@SQ SN:gi|303291528|ref|NC_014406.1|    LN:4926
@SQ SN:gi|311977355|ref|NC_014649.1|    LN:1181549
@SQ SN:gi|448259945|ref|NC_019925.1|    LN:152427
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001539_1.fq.gz seqtk_1/subsample_1/sub_NC_001539_2.fq.gz
=====> Processing read 'simulated.2618103'/1 <=====
* fraction of repetitive seeds: 0.000
* Found CHAIN(0): n=3; weight=81    20;20;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    29;29;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    52;52;30,3095694413(gi|9627186|ref|NC_001539.1|:+431)
* Found CHAIN(1): n=1; weight=19    19;19;76,1952705833(chr12:+1774857)
* Found CHAIN(2): n=1; weight=19    19;19;40,770774837(chr4:+80285843)

Lines 1-35 seem to fit the SAM format specification.

Any help would be appreciated.
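
The manual check described above (finding the first line that does not look like SAM) can be scripted. The following is a minimal sketch, not part of the original post: `first_bad_sam_line` is a hypothetical helper name, the input is a tiny synthetic file rather than the poster's data, and the test is only a rough heuristic (a line passes if it starts with `@` or has at least 11 tab-separated fields, the number of mandatory SAM columns).

```shell
# Hypothetical helper: print the first line that is neither a header (@...)
# nor a record with at least 11 tab-separated fields.
first_bad_sam_line() {
    awk -F'\t' '
        /^@/     { next }                # header line
        NF >= 11 { next }                # looks like an alignment record
        { print NR ": " $0; exit }       # first offending line, then stop
    ' "$1"
}

# Tiny synthetic example (illustrative only, not the poster's data).
tmp_sam=$(mktemp)
printf '@SQ\tSN:chrM\tLN:16571\n' > "$tmp_sam"
printf '@PG\tID:bwa\tPN:bwa\n' >> "$tmp_sam"
printf "=====> Processing read 'simulated.1'/1 <=====\n" >> "$tmp_sam"
printf 'r1\t0\tchrM\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$tmp_sam"

bad_line=$(first_bad_sam_line "$tmp_sam")
echo "$bad_line"
rm -f "$tmp_sam"
```

On a real file, the reported line number should match the one in the samtools parse error, which makes it easy to see what kind of text has ended up in the output.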


How did you run BWA? It seems stderr is making its way into your SAM file.

written 21 months ago by h.mon

I agree, it looks as though you're redirecting both stdout and stderr to the same file.

written 21 months ago by James Ashmore

That's what I was thinking, but I can't spot how.

This is the command I used for BWA:

  bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001847_1.fq.gz seqtk_1/subsample_1/sub_NC_001847_2.fq.gz  > ./BWA/seqtk_1/subsample_1/sub_NC_001847_BWA.sam

I may have made a really stupid mistake and am just not seeing it.

written 21 months ago by fiona.newberry

I don't see anything wrong with your command line, but I don't think -v4 is needed. Are you running BWA directly, or inside some script or pipeline? Is stderr being redirected somewhere? Is BWA running as a background process?

How did you create the index? What is the size of your reference?

written 21 months ago by h.mon
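
The comments above turn on which output stream ends up in the SAM file. Here is a minimal sketch of the difference, using a hypothetical stand-in function `fake_aligner` (not bwa itself) and made-up file names. With `>` only stdout is captured, so messages written to stderr stay out of the file; anything the program itself prints to stdout lands in the file no matter how the redirect is written. The latter may be relevant here: the bwa manual page notes for `-v` that values of 4 or higher are for debugging and that at value 4 the output is not SAM, which would fit the lines shown in the question even with a correct redirect.

```shell
# fake_aligner is a hypothetical stand-in for an aligner, not bwa itself.
fake_aligner() {
    printf '@HD\tVN:1.6\n'                        # SAM header  -> stdout
    echo '[M::process] read 2 sequences' >&2      # progress log -> stderr
    if [ "${1:-}" = "debug" ]; then
        echo '=====> Processing read <====='      # debug text  -> stdout
    fi
    printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'   # record -> stdout
}

workdir=$(mktemp -d)

# Separate files for the two streams: the log never reaches the SAM.
fake_aligner > "$workdir/clean.sam" 2> "$workdir/run.log"

# Debug text written to stdout ends up in the SAM despite the same redirect.
fake_aligner debug > "$workdir/debug.sam" 2> "$workdir/run_debug.log"

clean_lines=$(wc -l < "$workdir/clean.sam" | tr -d ' ')
debug_lines=$(wc -l < "$workdir/debug.sam" | tr -d ' ')
log_lines=$(wc -l < "$workdir/run.log" | tr -d ' ')
echo "clean=$clean_lines debug=$debug_lines log=$log_lines"
rm -rf "$workdir"
```

If the extra lines appear in the file even when stderr is sent elsewhere, the program is writing them to stdout, and dropping the debug verbosity is the thing to try before re-aligning.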

I guess you are running this command via nohup and thought this was not relevant?

written 21 months ago by Michael Dondrup

You could try aligning each read file separately (sub_NC_001539_1.fq.gz or sub_NC_001539_2.fq.gz, not both) and then merging the two BAM files using Picard.

written 21 months ago by genebow

BWA is perfectly able to handle paired-end information. Splitting is not necessary and would lose the insert size information, which would then have to be added back in a second step after the merge. If you handle big files, that will take ages.

written 21 months ago by ATpoint

21 months ago, ATpoint wrote:

Do yourself a favor and avoid outputting SAM files. There is no advantage to saving the SAMs; it only wastes disk space. Pipe the aligner directly into samtools view to get the binary file. It seems that you have to re-align anyway, so do:

bwa mem -M ref 1.fq 2.fq | samtools view -b -o out.bam -

In case you need sorted files, which is almost always the case, you can also pipe into sort right away:

bwa mem -M ref 1.fq 2.fq | samtools sort -o out_sorted.bam