Question: SOAP aligner removes samfile headers
0
gravatar for fiona.newberry
2.6 years ago by
fiona.newberry80 wrote:

Hi,

I am using samtools to convert the sam file outputs from SOAP aligner to bam file.

However, the samfiles do not have any sort of headers. Whenever I try to convert sam -> bam, an error message appears because the file is missing the headers

[W::sam_read1] parse error at line 1
[main_samview] truncated file.

This is the SOAP aligner script:

soap -D combine_reference.fa.index -a Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -b Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_2.fq.gz -o Mock_Output/NC_001539_mapped.sam -2 Mock_Output/NC_001539_unpair.sam -p 3

This is the samtools script:

samtools view -bS NC_001539_mapped.sam > NC_001539_mapped.bam

This is what the head of the sam file looks like:

simulated.A.A.1.1217859/1   
ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98

It is completely missing the headers that are normally found in samfiles.

Is this a problem with the aligner? How can I get the samfile with the correct headers?

Any help would be appreciated.

samfile soap • 922 views
ADD COMMENTlink modified 2.6 years ago by Macspider3.1k • written 2.6 years ago by fiona.newberry80
samtools view -bS

If you don't use -h in your samtools command, it hardly will keep the header.

  -h       include header in SAM output
  -H       print SAM header only (no alignments)

Or is this problem beyond this?

ADD REPLYlink written 2.6 years ago by Macspider3.1k

Soap is outputting the file in ASCII text format so using -h with samtools wouldn't bring a header because there isn't a header to begin with (Found all this out after posting the question).

The problem is with the soap output file

ADD REPLYlink written 2.6 years ago by fiona.newberry80

They suggest to use the soap2sam.pl script that is downloadable at their own homepage: http://soap.genomics.org.cn/down/soap2sam.tar.gz

Did you use it?

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Macspider3.1k

I have attempted to use this. For some reason my output file is different from what they say it should be. This is the format they say it should be:

http://soap.genomics.org.cn/soap1/#Formatofoutput

This is the format of my file:

simulated.A.A.1.1217859/1   ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98
    simulated.A.A.1.3374391/1   GATAACCAATTTAACTATGTACCAAGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACT    IHIIIHHIHHHHHGHGIIGHHIIIHHFDIEHEFFI@IHEHIIICIBIE<AHIGFIDFGGCIIIID@<IGIIBD:FIIIIIIHIH=IIA7IIICIB6IEII    1   a   100 +   gi|9627186|ref|NC_001539.1| 4451    0   100M    100
    simulated.A.A.1.3374391/2   ACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    HI7:?CIHHIH=II;I<@BDII7@<IFCG?DIIFBIIHFGIEGIFDIIIIICI@FCIEFGIIFDFDEHIFGGHGHIEIIDHIHHGIIHHIIHIGIIHHII    1   b   100 -   gi|9627186|ref|NC_001539.1| 4785    1   T->48G9 100M    48T51
    simulated.A.A.1.4550806/1   CCTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    (I1IIIIHIIIEIIIGGI@<IIICACCBICICDHIFCIIIFIIFEGGCII@IIIIFIIGEFCFBHIGEIFIIIGHIIIHHFIIHIGIHIIIHIHHIHHII    1   a   100 -   gi|9627186|ref|NC_001539.1| 4785    2   A->0C-24    T->48G9 100M    0A47T51
    simulated.A.A.1.4550806/2   AGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACTATGGTTTTTATGTTTATTACATAT    IHHIIGHHIHIHIHIIHHFHIFGHDIFIIIIBAIFEGBHIFFIDDEGIICDBIHII@IHIDIICIIIIIIII;GIGDFIIDII@IIIIII@7IAII8?EI    1   b   100 +   gi|9627186|ref|NC_001539.1| 4475    0   100M    100

If I use soap2sam.pl, the program produces a file that is ASCII text (not a sam file).

I have trawled through their website and multiple forums looking for the answer and can't find it anywhere.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by fiona.newberry80
2
gravatar for Macspider
2.6 years ago by
Macspider3.1k
Vienna - BOKU
Macspider3.1k wrote:

Found this: https://github.com/BGI-shenzhen/soap2bam

If you use -Dict and specify a reference, it will produce the header.

ADD COMMENTlink written 2.6 years ago by Macspider3.1k

Thank you but this is using soap2bam, which I don't have.

I don't have admin rights to add this onto my terminal

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by fiona.newberry80

Download it on a folder that you own, run chmod 755 on that file, and you will have those permissions.

ADD REPLYlink written 2.6 years ago by Macspider3.1k
1

Awesome, thank you for all the help!!

ADD REPLYlink written 2.6 years ago by fiona.newberry80

Mark it as answer, and close the thread, then! ;) Matteo

ADD REPLYlink written 2.6 years ago by Macspider3.1k

@fiona can't but I have moved your comment to an answer. It can now be accepted (green-check mark).

ADD REPLYlink written 2.6 years ago by genomax87k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1775 users visited in the last hour