Question: Error when using samtools to convert a SAM file to a BAM file
Bioradical (United States) wrote 2.7 years ago:

I am trying to convert a SAM file that I obtained as output from Bowtie into a BAM file so that I can use it in MACS2. (This is an input file.)

Here is the command line I use when attempting to do this:

samtools view -bS /Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_HCT116.sam > Input_HCT116.bam

 

However, I keep getting this error. (I re-aligned the FASTQ files twice to make sure this wasn't a Bowtie error, and the downstream SAM-to-BAM conversion error was the same.)

[W::sam_read1] parse error at line 28

[main_samview] truncated file.

 

Here is what the data looks like:

Carlos$ samtools view -H /Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_HCT116.sam 

@HD    VN:1.0    SO:unsorted
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@SQ    SN:chrM    LN:16571
@PG    ID:Bowtie    VN:1.1.2    CL:"bowtie --wrapper basic-0 -t /Users/Carlos/Downloads/hg19.ebwt/hg19 -S -m 1 -p 4 -q /Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_S1_L001_R1_001.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_S1_L002_R1_001.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_S1_L003_R1_001.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_S1_L004_R1_001.fastq,"

I've attempted to solve this, and looked online but could find nothing that helps me understand what the error is and how to troubleshoot it. Can anyone help?

harold.smith.tarheel (United States) wrote 2.7 years ago:

Which version of SAMtools are you using? The syntax for the latest version is:

samtools view -b -o OUTPUT.bam INPUT.sam
Answer written 2.7 years ago by harold.smith.tarheel

I get the same error when running that command. This is my version:

 

carloss-imac:scripts Carlos$ samtools --version

samtools 1.2

Using htslib 1.2.1

Copyright (C) 2015 Genome Research Ltd.
Reply written 2.7 years ago by Bioradical

How many lines are in your SAM file? (command: wc -l INPUT.sam)

And what is the problematic line 28? (command: sed -n 28p INPUT.sam) 

 

Reply modified 2.7 years ago • written 2.7 years ago by harold.smith.tarheel

Sorry, I didn't have access to the data during the weekend. Here are the results:

 

19286734 /Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/Input_HCT116.sam

NS500579:38:HC3MLBGXX:1:11101:2698:1069    4    *    0    0    *    *    0    AATGCNGTCATCACAGGAAACATTCTGAGAATGCTTCTGTCTAGGTTTG    AA6AA#EEEEEEEEEEEEEEEEEEEEEAEE/EEEEEEEEEEEEEAEEEE    XM:i:1
Reply written 2.7 years ago by Bioradical

Is it possible that the workstation I am doing this on simply can't handle the Bowtie alignment, so that Bowtie is failing? I'm running an iMac with 8 GB of RAM.

 

I ask because when I do head -50, I end up with something like this:

 

@HD    VN:1.0    SO:unsorted
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747
@SQ    SN:chr11    LN:135006516
@SQ    SN:chr12    LN:133851895
@SQ    SN:chr13    LN:115169878
@SQ    SN:chr14    LN:107349540
@SQ    SN:chr15    LN:102531392
@SQ    SN:chr16    LN:90354753
@SQ    SN:chr17    LN:81195210
@SQ    SN:chr18    LN:78077248
@SQ    SN:chr19    LN:59128983
@SQ    SN:chr20    LN:63025520
@SQ    SN:chr21    LN:48129895
@SQ    SN:chr22    LN:51304566
@SQ    SN:chrX    LN:155270560
@SQ    SN:chrY    LN:59373566
@SQ    SN:chrM    LN:16571
@PG    ID:Bowtie    VN:1.1.2    CL:"bowtie --wrapper basic-0 -t /Users/Carlos/Downloads/hg19.ebwt/hg19 -S -m 1 -p 4 -q /Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/SRR2229188 _1__1.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/SRR2229189_1.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/SRR2229190_1.fastq,/Users/Carlos/Desktop/Enhancer.Project/HCT116_Enhancers/SRR2229191_1.fastq,"

SRR2229188    4    *    0    0    *    *    0    0    AATGCNGTCATCACAGGAAACATTCTGAGAATGCTTCTGTCTAGGTTTGN    AA6AA#EEEEEEEEEEEEEEEEEEEEEAEE/EEEEEEEEEEEEEAEEEE#    XM:i:1

SRR2229188    4    *    0    0    *    *    0    0    TTAGANCTACATTCTCTAATATTTATAAATGATATCACATTGTGCCTGNN    AA6AA#EEEEEEEEEEEAEEEEEEEEEEEEEEEEEEE<EEEEEEE6EE##    XM:i:1

SRR2229188    4    *    0    0    *    *    0    0    GTGCTNTGCTTTTAGATATGCATACACATAAACATCTCAATGCTTTACAN    AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE#    XM:i:1

SRR2229188    4    *    0    0    *    *    0    0    CCGTCNCTACTAAAAAAAAATACAAACAATTAGCCAGGCATGGTGGCAGG    AAAAA#AAEAEEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEEE//EA/AE    XM:i:1

X?<?XT<?gg??+?SRR2229188    4    *    0    0    *    *    0    0    CCCCANCGCGGCCCTGAGCTTCCCGCGCCCCCACCGCTGCCCTGAGCTTC    AAAAA#EEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE    XM:i:1

X?<?XT<?gg??+?SRR2229188    4    *    0    0    *    *    0    0    GGATANAGAGTCAAGACGTATCAGTGTGCTGTATTCAGGAAACCCATCTC    AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE    XM:i:1

SRR2229188    4    *    0    0    *    *    0    0    AAACANAAAAAAATTAGTACTAAGCTTGAATGTACTTCCCACAGAAGGCN    AAAAA#EEEEEEEEEE6EEEEEEE/EEAEEAEE//EEEAAE/E/E/EE/#    XM:i:0

SRR2229188    4    *    0    0    *    *    0    0    TACTCNTAAAACTAGGCGGCTATGGTATAATACGCCTCACACTCATTCTC    /AAAA#EAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE    XM:i:1

SRR2229188    4    *    0    0    *    *    0    0    CAAGGNCAAGTCTTTTTTTTTTTTTTTTTTTTTAATATGAGGGCAAGCAC    6AAAA#EEEAEEEAEEEEEEEAEE/EAEE/<A//////////////////    XM:i:0

SRR2229188    4    *    0    0    *    *    0    0    TCTATNTGTAGTATCTGGAAGTGGACATTTGGAGGGCTTTGTAGCCTATG    AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE    XM:i:1

?#?­

    ?J?#??n

8??? p??=?'??M*? p??p

??M*??M*<??=?,?=?0+??(?>?/,??n

80?T??0p?v

          =?00?? p??p

/??0???a

=?,w

    =?X?<?X<?vV???0+??(?>?/,??n

80?T??0p?v

          =?00?? p??p

/??0???a

=?,w

    =?SRR2229188    4    *    0    0    *    *    0    0    GTATCNTCTGGCAAGCATAGGGGACTGCAGTCGACAATGCTGCTGANNNN    6AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEE####    XM:i:1

?#?­

    ?J?#??n

8??? p??=?'??M*? p??p

??M*??M*<??=?,?=?X?<?X<?vV????#?­

                                 ?J?#??n

8??? p??=?'??M*? p??p

??M*??M*<??=?,?=?X?<?X<?vV???0+??(?>?/,??n

80?T??0p?v

          =?00?? p??p

/??0???a

=?,w

 

And I somehow feel that all those ?'s should not be there.
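
A SAM file is plain text, so bytes outside tab, newline, carriage return, and printable ASCII are a strong hint of corruption. One way to count them is sketched below, assuming a POSIX shell with tr and wc; the function name count_unexpected_bytes is made up for illustration. A healthy file would normally report 0, although a header comment containing non-ASCII text would also be counted.

```shell
# Hypothetical helper: count bytes that are not tab, LF, CR, or
# printable ASCII (space through tilde). tr deletes the expected
# bytes, and wc counts whatever is left over.
count_unexpected_bytes() {
    LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" | wc -c | tr -d ' '
}
```

Usage: count_unexpected_bytes Input_HCT116.sam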

Reply modified 2.7 years ago • written 2.7 years ago by Bioradical

This SAM is different from the one shown above. It's difficult to help you troubleshoot if the information you provide is not consistent.

 

This file is full of errors. It appears that Bowtie failed. The specs suggest that memory should be sufficient, but you may want to repeat the run using only one core. Also, the syntax does not match the version of Bowtie that you're using (1.1.2). Check the manual. Finally, you may also need to rebuild the reference index if it was generated with an earlier version (I'm not sure on this last point). There are also instructions for building the index on low-memory machines.

Reply modified 2.7 years ago • written 2.7 years ago by harold.smith.tarheel

The problematic line 28 in the earlier file does not match the SAM format specification (see here). It's missing a required field: either field 8 (PNEXT) or field 9 (TLEN).
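
A plain field count would not catch that line: the optional XM:i:1 tag brings it to 11 tab-separated columns, the same as the mandatory minimum. A minimal sketch of a stricter check follows, assuming the file is tab-delimited as SAM requires and that awk is available; the function name check_sam_records is made up for illustration. It covers only the field count, the integer fields, and the SEQ/QUAL lengths, not the full specification.

```shell
# Hypothetical helper: print the line number and a reason for each
# alignment line that looks malformed. Header lines (starting with @)
# are skipped.
check_sam_records() {
    awk -F'\t' '
        /^@/ { next }
        NF < 11 {
            printf "%d\ttoo few fields (%d)\n", NR, NF
            next
        }
        # FLAG, POS, MAPQ, PNEXT must be unsigned integers; TLEN may be negative.
        $2 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ ||
        $8 !~ /^[0-9]+$/ || $9 !~ /^-?[0-9]+$/ {
            printf "%d\tnon-numeric value in a numeric field\n", NR
            next
        }
        # SEQ and QUAL must have equal length unless one of them is "*".
        $10 != "*" && $11 != "*" && length($10) != length($11) {
            printf "%d\tSEQ and QUAL lengths differ\n", NR
        }
    ' "$1"
}
```

Usage: check_sam_records Input_HCT116.sam | head

On the line 28 shown earlier, the sequence sits in column 9 where TLEN belongs, so it would be reported as a non-numeric value in a numeric field.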

Reply modified 2.7 years ago • written 2.7 years ago by harold.smith.tarheel

Sorry, I am aware that the files are different: I re-ran Bowtie and then the SAM-to-BAM conversion. Both files had these same errors, so both must be failing at the Bowtie step. Let me try your suggestions. (The syntax may well be the problem, since I'm simply following a protocol from our prior bioinformatician; I'm basically just a wet-lab research tech, and it's been an uphill battle, to say the least.) The index should be pre-built; I simply downloaded it. I will post another comment when it's done. Thank you for the help so far!

Reply written 2.7 years ago by Bioradical

 

From the Bowtie2 website:

"Are Bowtie 2 and Bowtie 1 genome indexes compatible?

No. Bowtie 2 indexes are formatted differently. Bowtie 1 indexes do not work with Bowtie 2 and Bowtie 2 indexes do not work with Bowtie 1."

Unless you are positive that the index was generated with the same version that you're using, you should rebuild it. That also allows you to take advantage of the low memory option.

Reply written 2.7 years ago by harold.smith.tarheel

I'm positive that the index was built for Bowtie, since I used Bowtie's prebuilt options. I've run Bowtie before and have never run into these problems, so I have no idea what's wrong. I went ahead and redid everything using the different syntax, a freshly downloaded prebuilt index, and one processor, but I get the same error on the same line.

Reply written 2.7 years ago by Bioradical

1) Please post the command that you used in your most recent attempt to run Bowtie.

2) Please post the exact error you received.

3) If you've successfully run Bowtie before on the identical computer using the identical index and identical command line (except for a different input), then the input files are the problem. If your previous Bowtie experience was not with the identical computer/index/command line, then any of these variables might be the culprit.
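
If the input files are the suspect, a quick structural check of each FASTQ can rule out truncated or damaged reads before re-running Bowtie. The sketch below assumes plain four-line FASTQ records (no wrapped sequences) and an available awk; the function name check_fastq is made up for illustration. It does not validate base or quality characters.

```shell
# Hypothetical helper: report structural problems in a four-line
# FASTQ file and return non-zero if any are found.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1
        }
        END {
            if (NR % 4 != 0) {
                printf "file ends mid-record after line %d\n", NR; bad = 1
            }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Usage: check_fastq Input_S1_L001_R1_001.fastq && echo "structure looks OK"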
 

Reply written 2.7 years ago by harold.smith.tarheel
Ian (University of Manchester, UK) wrote 2.7 years ago:

MACS accepts more than just SAM files: it will also accept "BAM", or "BAMPE" if the data are paired-end.

If "samtools index" runs to completion using the BAM file then the file should be OK.  You could also run "samtools flagstat" to make sure the number of total reads is what you expect.
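
A complete BAM file ends with a fixed 28-byte BGZF end-of-file block, so a quick truncation check is possible even before running "samtools index". This is a minimal sketch, assuming a POSIX shell with tail and od; the function name has_bgzf_eof is made up for illustration, and passing this check says only that the file was closed properly, not that its contents are sensible.

```shell
# Hypothetical helper: succeed if the file ends with the 28-byte BGZF
# end-of-file marker that a complete BAM file carries.
has_bgzf_eof() {
    eof_hex=1f8b08040000000000ff0600424302001b0003000000000000000000
    # Dump the last 28 bytes as hex and strip od's spacing and newlines.
    last_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$last_hex" = "$eof_hex" ]
}
```

Usage: has_bgzf_eof Input_HCT116.bam && echo "EOF marker present"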

Answer written 2.7 years ago by Ian

I am aware that MACS will accept SAM or BAM files. The problem is that all my other samples already have BAM files, and only my input data refuses to convert from SAM to BAM. The resulting BAM file is only 687 bytes, so I doubt that it's okay to use when the SAM file is over 5 GB.

Reply written 2.7 years ago by Bioradical