Question: Problem With Sam To Bam Converison After Alignment Of Scaffolds To Reference
0
gravatar for Rohit
5.8 years ago by
Rohit1.4k
California
Rohit1.4k wrote:

Hello.

I'm trying to align the denovo assembly scaffolds I have generated to the available reference genome. For this I have tried using bwasw and the alignment runs fine. But I have a problem when I use samtools to convert the sam files into bam inorder to check the statistics.

I always get the following error with the commands

***[samopen] SAM header is present: 26 sequences.***
***Parse error at line 27: sequence and quality are inconsistent***
***Aborted (core dumped)***

1) samtools import Pan_troglodytes.fa.fai ref_aln_scaf.sam ref_aln_scaf.bam

2) samtools view -bS ref_aln_scaf.sam ref_aln_scaf.bam

My sam file looks like this -

scaffold1    4    *    0    0    *    *    0    0    55278387   0          -   100     #DOWN 68667525:5:56 68667525   127        +   556     #DOWN 63995574:5:107 60040051:2:45     #UP 64591229:1:67 55278387:5:56 63995574   761        -   197     #DOWN 56702323:1:-16     #UP 68667525:5:107 60040051:2:-1     *
scaffold2    4    *    0    0    *    *    0    0    55278447   0          -   100     #DOWN 67543234:3:73     #UP 61919621:2:27 67543234   144        +   405     #DOWN 50747758:4:94 48984289:1:7     #UP 55278447:3:73 50747758   614        -   61     #UP 67543234:4:94 49120931:1:131     *
scaffold3    4    *    0    0    *    *    0    0    55278537   0          +   100     #DOWN 58208681:4:63 52225285:1:127     #UP 51228731:1:62 58208681   134        -   104     #UP 63945980:1:-21 55278537:4:63     *
scaffold4    4    *    0    0    *    *    0    0    55278605   0          -   100     #DOWN 65985219:3:2     #UP 64872087:1:3 65985219   73         +   279     #DOWN 61387005:4:56 49180224:1:145     #UP 64872087:1:98 61790643:1:-22 55278605:3:2 61387005   379        +   147     #DOWN 53307077:2:155 50718052:1:26 49180224:1:61 48106164:1:112     #UP 65985219:4:56     *
scaffold5    4    *    0    0    *    *    0    0    55278615   0          +   100     #DOWN 60423174:3:74     #UP 47914359:1:129 60423174   145        +   131     #DOWN 66863269:1:15     #UP 55278615:3:74 47472933:1:118     *

I have tried using the -t option to include index too while converting but the same error is displayed. I have tried the following commands too -

samtools view -bt Pan_troglodytes.fa.fai -S ref_aln_scaf.sam -o ref_aln_scaf.bam
samtools view -bt Pan_troglodytes.fa.fai ref_aln_scaf.sam > ref_aln_scaf.bam

What is the problem here?

alignment samtools sam bam • 3.1k views
ADD COMMENTlink modified 5.8 years ago by Devon Ryan92k • written 5.8 years ago by Rohit1.4k

It says that the sequence and quality are inconsistent. So check the file for line 27, the quality string is and sequence seems to not match

ADD REPLYlink written 5.8 years ago by Istvan Albert ♦♦ 81k

The problem is I don't have any quality... I aligned fasta sequences only against the reference

ADD REPLYlink written 5.8 years ago by Rohit1.4k

you would need to have a star in the quality column, do you have that?

ADD REPLYlink written 5.8 years ago by Istvan Albert ♦♦ 81k

Really sorry for the late reply Istvan (crashed again)... The 5th and the 11th column for the scaffold alignment has a 0, do I replace the whole 11th column with a star???

I replaced the 11th column with a star and now I have another error

[sam_read1] reference 'scaffold3 4 * 0 0 * * 0 0 55278537 * + 100 #DOWN 58208681:4:63 52225285:1:127 #UP 51228731:1:62 58208681 134 - 104 #UP 63945980:1:-21 55278537:4:63 *' is recognized as '*'.
Parse error at line 27: invalid CIGAR character
ADD REPLYlink modified 5.8 years ago by Istvan Albert ♦♦ 81k • written 5.8 years ago by Rohit1.4k

You might want to post the commands you used to generate the "SAM" files, since they aren't in SAM format...

ADD REPLYlink written 5.8 years ago by Devon Ryan92k

I must say, now I feel really stupid

bwa index -p Pan_tro_bwaidx -a bwtsw Pan_troglodytes.fa

and - 1) bwa bwasw Pan_tro_bwaidx clint_79_k29.scaf > ref_aln_scaf.sam

[2nd try] 2) bwa bwasw clint_79_k29.scaf Pan_troglodytes.fa -f aln_ref

ADD REPLYlink modified 5.8 years ago by Devon Ryan92k • written 5.8 years ago by Rohit1.4k

Try:

bwa bwasw Pan_troglodytes.fa clint_79_k29.scaf > ref_aln_scaf.sam
ADD REPLYlink written 5.8 years ago by Devon Ryan92k

I tried the command above...When I ran sam tools samtools view -bS ref_aln_scaf.sam > ref_aln_scaf.bam

I end up with the inconsistent quality error again. When I replace the 11th column of "0"s to "*"s, then tried to convert to bam, I end up with the invalid CIGAR character error again...

This seems to be a vicious cycle :(

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Rohit1.4k

Please update your original post with an excerpt of the SAM file.

ADD REPLYlink written 5.8 years ago by Devon Ryan92k

It is not a cycle - the CIGAR string is also invalid, perhaps your entire SAM file is.

Post a few lines of the SAM file (after the header)

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Istvan Albert ♦♦ 81k

Your SAM file is invalid, what is with all those #DOWN #UP words?

ADD REPLYlink written 5.8 years ago by Istvan Albert ♦♦ 81k

I updated the Sam file with the lines following the headers. I thought #UP and #DOWN words were from the sam output.

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Rohit1.4k
1

This does not seem to be a valid SAM file (see the SAM documentation)

QNAME   FLAG  RNAME  POS    MAPQ    CIGAR  RNEXT  TNEXT  SEQ      QUAL     ?   ?    ?
scaffold1 4    *      0      0       *      *       0     0     55278387   0   -   100
ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Istvan Albert ♦♦ 81k

Nope, those have no meaning in a SAM file. What you updated with isn't SAM, I really haven't a clue how you got them.

ADD REPLYlink written 5.8 years ago by Devon Ryan92k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1706 users visited in the last hour