Problem With Sam To Bam Converison After Alignment Of Scaffolds To Reference
0
0
Entering edit mode
10.3 years ago
Rohit ★ 1.5k

Hello.

I'm trying to align the denovo assembly scaffolds I have generated to the available reference genome. For this I have tried using bwasw and the alignment runs fine. But I have a problem when I use samtools to convert the sam files into bam inorder to check the statistics.

I always get the following error with the commands

***[samopen] SAM header is present: 26 sequences.***
***Parse error at line 27: sequence and quality are inconsistent***
***Aborted (core dumped)***

1) samtools import Pan_troglodytes.fa.fai ref_aln_scaf.sam ref_aln_scaf.bam

2) samtools view -bS ref_aln_scaf.sam ref_aln_scaf.bam

My sam file looks like this -

scaffold1    4    *    0    0    *    *    0    0    55278387   0          -   100     #DOWN 68667525:5:56 68667525   127        +   556     #DOWN 63995574:5:107 60040051:2:45     #UP 64591229:1:67 55278387:5:56 63995574   761        -   197     #DOWN 56702323:1:-16     #UP 68667525:5:107 60040051:2:-1     *
scaffold2    4    *    0    0    *    *    0    0    55278447   0          -   100     #DOWN 67543234:3:73     #UP 61919621:2:27 67543234   144        +   405     #DOWN 50747758:4:94 48984289:1:7     #UP 55278447:3:73 50747758   614        -   61     #UP 67543234:4:94 49120931:1:131     *
scaffold3    4    *    0    0    *    *    0    0    55278537   0          +   100     #DOWN 58208681:4:63 52225285:1:127     #UP 51228731:1:62 58208681   134        -   104     #UP 63945980:1:-21 55278537:4:63     *
scaffold4    4    *    0    0    *    *    0    0    55278605   0          -   100     #DOWN 65985219:3:2     #UP 64872087:1:3 65985219   73         +   279     #DOWN 61387005:4:56 49180224:1:145     #UP 64872087:1:98 61790643:1:-22 55278605:3:2 61387005   379        +   147     #DOWN 53307077:2:155 50718052:1:26 49180224:1:61 48106164:1:112     #UP 65985219:4:56     *
scaffold5    4    *    0    0    *    *    0    0    55278615   0          +   100     #DOWN 60423174:3:74     #UP 47914359:1:129 60423174   145        +   131     #DOWN 66863269:1:15     #UP 55278615:3:74 47472933:1:118     *

I have tried using the -t option to include index too while converting but the same error is displayed. I have tried the following commands too -

samtools view -bt Pan_troglodytes.fa.fai -S ref_aln_scaf.sam -o ref_aln_scaf.bam
samtools view -bt Pan_troglodytes.fa.fai ref_aln_scaf.sam > ref_aln_scaf.bam

What is the problem here?

sam bam alignment samtools • 4.4k views
ADD COMMENT
0
Entering edit mode

It says that the sequence and quality are inconsistent. So check the file for line 27, the quality string is and sequence seems to not match

ADD REPLY
0
Entering edit mode

The problem is I don't have any quality... I aligned fasta sequences only against the reference

ADD REPLY
0
Entering edit mode

you would need to have a star in the quality column, do you have that?

ADD REPLY
0
Entering edit mode

Really sorry for the late reply Istvan (crashed again)... The 5th and the 11th column for the scaffold alignment has a 0, do I replace the whole 11th column with a star???

I replaced the 11th column with a star and now I have another error

[sam_read1] reference 'scaffold3 4 * 0 0 * * 0 0 55278537 * + 100 #DOWN 58208681:4:63 52225285:1:127 #UP 51228731:1:62 58208681 134 - 104 #UP 63945980:1:-21 55278537:4:63 *' is recognized as '*'.
Parse error at line 27: invalid CIGAR character
ADD REPLY
0
Entering edit mode

You might want to post the commands you used to generate the "SAM" files, since they aren't in SAM format...

ADD REPLY
0
Entering edit mode

I must say, now I feel really stupid

bwa index -p Pan_tro_bwaidx -a bwtsw Pan_troglodytes.fa

and - 1) bwa bwasw Pan_tro_bwaidx clint_79_k29.scaf > ref_aln_scaf.sam

[2nd try] 2) bwa bwasw clint_79_k29.scaf Pan_troglodytes.fa -f aln_ref

ADD REPLY
0
Entering edit mode

Try:

bwa bwasw Pan_troglodytes.fa clint_79_k29.scaf > ref_aln_scaf.sam
ADD REPLY
0
Entering edit mode

I tried the command above...When I ran sam tools samtools view -bS ref_aln_scaf.sam > ref_aln_scaf.bam

I end up with the inconsistent quality error again. When I replace the 11th column of "0"s to "*"s, then tried to convert to bam, I end up with the invalid CIGAR character error again...

This seems to be a vicious cycle :(

ADD REPLY
0
Entering edit mode

Please update your original post with an excerpt of the SAM file.

ADD REPLY
0
Entering edit mode

It is not a cycle - the CIGAR string is also invalid, perhaps your entire SAM file is.

Post a few lines of the SAM file (after the header)

ADD REPLY
0
Entering edit mode

Your SAM file is invalid, what is with all those #DOWN #UP words?

ADD REPLY
0
Entering edit mode

I updated the Sam file with the lines following the headers. I thought #UP and #DOWN words were from the sam output.

ADD REPLY
1
Entering edit mode

This does not seem to be a valid SAM file (see the SAM documentation)

QNAME   FLAG  RNAME  POS    MAPQ    CIGAR  RNEXT  TNEXT  SEQ      QUAL     ?   ?    ?
scaffold1 4    *      0      0       *      *       0     0     55278387   0   -   100
ADD REPLY
0
Entering edit mode

Nope, those have no meaning in a SAM file. What you updated with isn't SAM, I really haven't a clue how you got them.

ADD REPLY

Login before adding your answer.

Traffic: 2084 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6