Hello.
I'm trying to align the denovo assembly scaffolds I have generated to the available reference genome. For this I have tried using bwasw and the alignment runs fine. But I have a problem when I use samtools to convert the sam files into bam inorder to check the statistics.
I always get the following error with the commands
***[samopen] SAM header is present: 26 sequences.***
***Parse error at line 27: sequence and quality are inconsistent***
***Aborted (core dumped)***
1) samtools import Pan_troglodytes.fa.fai ref_aln_scaf.sam ref_aln_scaf.bam
2) samtools view -bS ref_aln_scaf.sam ref_aln_scaf.bam
My sam file looks like this -
scaffold1 4 * 0 0 * * 0 0 55278387 0 - 100 #DOWN 68667525:5:56 68667525 127 + 556 #DOWN 63995574:5:107 60040051:2:45 #UP 64591229:1:67 55278387:5:56 63995574 761 - 197 #DOWN 56702323:1:-16 #UP 68667525:5:107 60040051:2:-1 *
scaffold2 4 * 0 0 * * 0 0 55278447 0 - 100 #DOWN 67543234:3:73 #UP 61919621:2:27 67543234 144 + 405 #DOWN 50747758:4:94 48984289:1:7 #UP 55278447:3:73 50747758 614 - 61 #UP 67543234:4:94 49120931:1:131 *
scaffold3 4 * 0 0 * * 0 0 55278537 0 + 100 #DOWN 58208681:4:63 52225285:1:127 #UP 51228731:1:62 58208681 134 - 104 #UP 63945980:1:-21 55278537:4:63 *
scaffold4 4 * 0 0 * * 0 0 55278605 0 - 100 #DOWN 65985219:3:2 #UP 64872087:1:3 65985219 73 + 279 #DOWN 61387005:4:56 49180224:1:145 #UP 64872087:1:98 61790643:1:-22 55278605:3:2 61387005 379 + 147 #DOWN 53307077:2:155 50718052:1:26 49180224:1:61 48106164:1:112 #UP 65985219:4:56 *
scaffold5 4 * 0 0 * * 0 0 55278615 0 + 100 #DOWN 60423174:3:74 #UP 47914359:1:129 60423174 145 + 131 #DOWN 66863269:1:15 #UP 55278615:3:74 47472933:1:118 *
I have tried using the -t option to include index too while converting but the same error is displayed. I have tried the following commands too -
samtools view -bt Pan_troglodytes.fa.fai -S ref_aln_scaf.sam -o ref_aln_scaf.bam
samtools view -bt Pan_troglodytes.fa.fai ref_aln_scaf.sam > ref_aln_scaf.bam
What is the problem here?
It says that the sequence and quality are inconsistent. So check the file for line 27, the quality string is and sequence seems to not match
The problem is I don't have any quality... I aligned fasta sequences only against the reference
you would need to have a star in the quality column, do you have that?
Really sorry for the late reply Istvan (crashed again)... The 5th and the 11th column for the scaffold alignment has a 0, do I replace the whole 11th column with a star???
I replaced the 11th column with a star and now I have another error
You might want to post the commands you used to generate the "SAM" files, since they aren't in SAM format...
I must say, now I feel really stupid
and - 1)
bwa bwasw Pan_tro_bwaidx clint_79_k29.scaf > ref_aln_scaf.sam
[2nd try] 2)
bwa bwasw clint_79_k29.scaf Pan_troglodytes.fa -f aln_ref
Try:
I tried the command above...When I ran sam tools samtools view -bS ref_aln_scaf.sam > ref_aln_scaf.bam
I end up with the inconsistent quality error again. When I replace the 11th column of "0"s to "*"s, then tried to convert to bam, I end up with the invalid CIGAR character error again...
This seems to be a vicious cycle :(
Please update your original post with an excerpt of the SAM file.
It is not a cycle - the CIGAR string is also invalid, perhaps your entire SAM file is.
Post a few lines of the SAM file (after the header)
Your SAM file is invalid, what is with all those
#DOWN
#UP
words?I updated the Sam file with the lines following the headers. I thought #UP and #DOWN words were from the sam output.
This does not seem to be a valid SAM file (see the SAM documentation)
Nope, those have no meaning in a SAM file. What you updated with isn't SAM, I really haven't a clue how you got them.