Question: SOAP aligner removes samfile headers
0
gravatar for fiona.newberry
20 months ago by
fiona.newberry80 wrote:

Hi,

I am using samtools to convert the sam file outputs from SOAP aligner to bam file.

However, the samfiles do not have any sort of headers. Whenever I try to convert sam -> bam, an error message appears because the file is missing the headers

[W::sam_read1] parse error at line 1
[main_samview] truncated file.

This is the SOAP aligner script:

soap -D combine_reference.fa.index -a Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -b Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_2.fq.gz -o Mock_Output/NC_001539_mapped.sam -2 Mock_Output/NC_001539_unpair.sam -p 3

This is the samtools script:

samtools view -bS NC_001539_mapped.sam > NC_001539_mapped.bam

This is what the head of the sam file looks like:

simulated.A.A.1.1217859/1   
ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98

It is completely missing the headers that are normally found in samfiles.

Is this a problem with the aligner? How can I get the samfile with the correct headers?

Any help would be appreciated.

samfile soap • 691 views
ADD COMMENTlink modified 20 months ago by Macspider2.9k • written 20 months ago by fiona.newberry80
samtools view -bS

If you don't use -h in your samtools command, it hardly will keep the header.

  -h       include header in SAM output
  -H       print SAM header only (no alignments)

Or is this problem beyond this?

ADD REPLYlink written 20 months ago by Macspider2.9k

Soap is outputting the file in ASCII text format so using -h with samtools wouldn't bring a header because there isn't a header to begin with (Found all this out after posting the question).

The problem is with the soap output file

ADD REPLYlink written 20 months ago by fiona.newberry80

They suggest to use the soap2sam.pl script that is downloadable at their own homepage: http://soap.genomics.org.cn/down/soap2sam.tar.gz

Did you use it?

ADD REPLYlink modified 20 months ago • written 20 months ago by Macspider2.9k

I have attempted to use this. For some reason my output file is different from what they say it should be. This is the format they say it should be:

http://soap.genomics.org.cn/soap1/#Formatofoutput

This is the format of my file:

simulated.A.A.1.1217859/1   ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98
    simulated.A.A.1.3374391/1   GATAACCAATTTAACTATGTACCAAGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACT    IHIIIHHIHHHHHGHGIIGHHIIIHHFDIEHEFFI@IHEHIIICIBIE<AHIGFIDFGGCIIIID@<IGIIBD:FIIIIIIHIH=IIA7IIICIB6IEII    1   a   100 +   gi|9627186|ref|NC_001539.1| 4451    0   100M    100
    simulated.A.A.1.3374391/2   ACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    HI7:?CIHHIH=II;I<@BDII7@<IFCG?DIIFBIIHFGIEGIFDIIIIICI@FCIEFGIIFDFDEHIFGGHGHIEIIDHIHHGIIHHIIHIGIIHHII    1   b   100 -   gi|9627186|ref|NC_001539.1| 4785    1   T->48G9 100M    48T51
    simulated.A.A.1.4550806/1   CCTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    (I1IIIIHIIIEIIIGGI@<IIICACCBICICDHIFCIIIFIIFEGGCII@IIIIFIIGEFCFBHIGEIFIIIGHIIIHHFIIHIGIHIIIHIHHIHHII    1   a   100 -   gi|9627186|ref|NC_001539.1| 4785    2   A->0C-24    T->48G9 100M    0A47T51
    simulated.A.A.1.4550806/2   AGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACTATGGTTTTTATGTTTATTACATAT    IHHIIGHHIHIHIHIIHHFHIFGHDIFIIIIBAIFEGBHIFFIDDEGIICDBIHII@IHIDIICIIIIIIII;GIGDFIIDII@IIIIII@7IAII8?EI    1   b   100 +   gi|9627186|ref|NC_001539.1| 4475    0   100M    100

If I use soap2sam.pl, the program produces a file that is ASCII text (not a sam file).

I have trawled through their website and multiple forums looking for the answer and can't find it anywhere.

ADD REPLYlink modified 20 months ago • written 20 months ago by fiona.newberry80
2
gravatar for Macspider
20 months ago by
Macspider2.9k
Vienna - BOKU
Macspider2.9k wrote:

Found this: https://github.com/BGI-shenzhen/soap2bam

If you use -Dict and specify a reference, it will produce the header.

ADD COMMENTlink written 20 months ago by Macspider2.9k

Thank you but this is using soap2bam, which I don't have.

I don't have admin rights to add this onto my terminal

ADD REPLYlink modified 20 months ago • written 20 months ago by fiona.newberry80

Download it on a folder that you own, run chmod 755 on that file, and you will have those permissions.

ADD REPLYlink written 20 months ago by Macspider2.9k
1

Awesome, thank you for all the help!!

ADD REPLYlink written 20 months ago by fiona.newberry80

Mark it as answer, and close the thread, then! ;) Matteo

ADD REPLYlink written 20 months ago by Macspider2.9k

@fiona can't but I have moved your comment to an answer. It can now be accepted (green-check mark).

ADD REPLYlink written 20 months ago by genomax71k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 715 users visited in the last hour