Hi,
I am using samtools to convert the sam file outputs from SOAP aligner to bam file.
However, the samfiles do not have any sort of headers. Whenever I try to convert sam -> bam, an error message appears because the file is missing the headers
[W::sam_read1] parse error at line 1
[main_samview] truncated file.
This is the SOAP aligner script:
soap -D combine_reference.fa.index -a Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -b Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_2.fq.gz -o Mock_Output/NC_001539_mapped.sam -2 Mock_Output/NC_001539_unpair.sam -p 3
This is the samtools script:
samtools view -bS NC_001539_mapped.sam > NC_001539_mapped.bam
This is what the head of the sam file looks like:
simulated.A.A.1.1217859/1   
ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98
It is completely missing the headers that are normally found in samfiles.
Is this a problem with the aligner? How can I get the samfile with the correct headers?
Any help would be appreciated.
If you don't use
-hin yoursamtoolscommand, it hardly will keep the header.Or is this problem beyond this?
Soap is outputting the file in ASCII text format so using -h with samtools wouldn't bring a header because there isn't a header to begin with (Found all this out after posting the question).
The problem is with the soap output file
They suggest to use the
soap2sam.plscript that is downloadable at their own homepage: http://soap.genomics.org.cn/down/soap2sam.tar.gzDid you use it?
I have attempted to use this. For some reason my output file is different from what they say it should be. This is the format they say it should be:
http://soap.genomics.org.cn/soap1/#Formatofoutput
This is the format of my file:
If I use soap2sam.pl, the program produces a file that is ASCII text (not a sam file).
I have trawled through their website and multiple forums looking for the answer and can't find it anywhere.