Blasr PacBio aligner: Unexpected sam file output
1
0
Entering edit mode
4.0 years ago
akm126 • 0

I am aligning some PacBio single read sequences using Blasr. I was able to successfully align and create a Sam file for some of my sequences, but some report a sam file that begins like the following:

@HD     VN:1.5  SO:UNKNOWN      pb:3.0.1
@SQ     SN:gi/0/0_1667892       LN:1667892      M5:f8a1580a28e068939d886d1c70879a72
@RG     ID:95680781     PL:PACBIO       DS:READTYPE=SUBREAD;DeletionQV=dq;DeletionTag=dt;InsertionQV=iq;MergeQV=mq;SubstitutionQV=sq    PU:/home/akm126/HpGP_Data/HON001_edit.fasta    PM:SEQUEL
@PG     ID:1    PN:BLASR        VN:5.3.3        CL:blasr /home/akm126/HpGP_Data/HON001_edit.fasta /home/akm126/HpGP_Data/HP_bacteria/HP_bacteria.fasta --sam --out alignment1_test.sam
Rabkin_HON001/0/0_166079401     16      gi/0/0_1667892  1751    254     47825S25=1D1=2D1=1D2=3D1=1D1=1D45=2D1=5D1=1D8=1X1=1X39=2D1=3D1=2D46=1X4=6D10=1X44=1D2=2
D1=1D2=1D4=1X29=1X2=1X2=1D1=1I2=1X2=2D2=1D1=1D41=1X1=1X11=3D39=1X11=1X6=2D2=1X55=1D18=1X6=1X43=1I1=1D9=1X1=1X13=1X21=1X24=1X18=2X13=1D21=1X1=1X24= 1X8=1X2D55=2D1=1D54=1I2=3D1=2D5=1X1D5=1X2=1X4=1D1=2I26=1X1=1X1=2D1=1D1=2D49=1X1D2

I can't find any differences in the header between files. Any ideas why the "1D1=1D2" is happening? Thanks.

alignment • 727 views
ADD COMMENT
1
Entering edit mode

A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
0
Entering edit mode
4.0 years ago
GenoMax 142k

I can't find any differences in the header between files

What do you mean? If you are aligning to the same set of references the headers are likely going stay (@HD and @SQ lines) identical.

Any ideas why the "1D1=1D2" is happening?

That should be the CIGAR string. 47825S25= a large number of bases at beginning of read match perfectly.

ADD COMMENT
0
Entering edit mode

Hi, thanks for your help. To clarify, here is what the other sam files look like.

@HD     VN:1.5  SO:UNKNOWN      pb:3.0.1
@SQ     SN:unitig_5/0/0_14675   LN:14675        M5:7173f2f30d5793c2c0206670cf23f34f
@SQ     SN:unitig_0|quiver|quiver/0/0_1727168ATGGTGCGAACAAATCTCTACAAAGTTTTGATTTTCACTCAAAAACCCATACCCAGGGAAAATCGCATCGGCTTCAAACAATTCCGCCGCGCTGATGATCGCAGGGATATTCAAGTAACTCTCGCTGGATT
TGGCCCCCCCTATACACACTTTTGCGTTAGCCGTATTGAGGTAGTGGGCGTCCTT

This sam file converts into a VCF no problem, but the above sam file does not. By difference in header, I am referring to the Blasr guidelines here (https://github.com/PacificBiosciences/blasr/wiki/BLASR-Frequently-Asked-Questions )

ADD REPLY
1
Entering edit mode

If I was to hazard a guess it looks like the fasta header for unitig_0|quiver|quiver/0/0_1727168 appears to have sequence in the header itself. Can you check the original fasta file for that sequence? You will need to fix that and re-run the alignment.

I reformatted your example above using code button as shown by @lieven above. Edit as necessary if not correct.

ADD REPLY
0
Entering edit mode

Ah that is right, I see it now. Thank you for your help!

ADD REPLY

Login before adding your answer.

Traffic: 1639 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6