GSNAP aligner generating truncated SAM file?
5.9 years ago
pinn ▴ 210

Hi, I tried to align more than 10 human genome datasets against hg38. GSNAP completes the alignment, but samtools reports the resulting SAM file as truncated. Can anyone suggest how to sort this out?

Command:

./gsnap -d hg38 -D /data/Likith/gmap-2018-05-30/share/hg38 -t 10 /data/shayantan/cancer_samples/sample1/fixed1_normal.fq  /data/shayantan/cancer_samples/sample1/fixed2_normal.fq  > S1.gsnap.sam

Input file sizes:

R1 & R2 - 97 GB, 97 GB
hg38 - 3.1 GB

output file:

S1.gsnap.sam - 410 GB

Viewing the sam file:

$ samtools view S1.testgsnap.sam | head
[W::sam_read1] Parse error at line 1
[main_samview] truncated file
Assembly genome next-gen alignment • 2.1k views

Try just head S1.testgsnap.sam.

Also, are you sure you want all of the datasets merged together like this? Usually you want them as separate files.


Hi, I'm able to view the SAM file with head; that is not a problem. But when converting from SAM to a sorted BAM, samtools reports a truncated SAM file. Is there any other way I can convert it to a sorted BAM?

$ head S1.gsnap.sam 
>ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAAGATCGGAAG  2 unpaired  CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIJJJJJJJJJJJJJJJJIEHHHFFFEEDCEEDDDDDCDDDDDCDDDDCDCCCCCDBDB<   HSQ-700848:338:D168LACXX:1:2307:7299:80233
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAAccctaaccc   1..92   +chr5:11491..11582  start:0..end:9,matches:92,sub:0 segs:1,align_score:9,mapq:3 method:ext
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAAccctaaccc   1..92   -chr22:50807994..50807903   start:0..end:9,matches:92,sub:0 segs:1,align_score:9,mapq:3method:ext
<TTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTAGATCGGAAG  1 unpaired  BC@FFFDDFHHHFGGGIJFHGHIJEHHIJJFHHIJJDFDGHIDHHIJJBFHHIJFHIIJHHHEHFFDFCEEE>B=BDD:AABDDABABDDCCBCDDDD?B>   HSQ-700848:338:D168LACXX:1:2307:7299:80233
TTAGGGTTAGGGTTAGGGTTAGGGTTAa-------------------------------------------------------------------------   1..27   -chr18:10187..10161 start:0..del:1,matches:27,sub:0 segs:3,align_score:8,mapq:40    method:ext-gmap
,---------------------------GGGTTAGGGTTAa-------------------------------------------------------------  28..39  -chr18:10159..10148 del:1..del:1,matches:12,sub:0
,---------------------------------------GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTAGggttaggg  40..93  -chr18:10146..10093 del:1..end:8,matches:54,sub:0

>CCCTAACCCTGACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAGATCGGAAGAG  1 unpaired  @@@BDEFFFH?DFADGGGGGEGHIGCEDDGHHGGIGIIGIFEGGGIIIIIGHIIIHIGI@ECEHHC@BEC@EEA=A=ABAAACCB<AACAC>@>8<?<9A<   HSQ-700848:338:D168LACXX:1:2111:5061:50207

Then with samtools:

samtools view -u S1.gsnap.sam | samtools sort -@ 20 -T /tmp/S1.gsnap.sam.sort -o  S1.gsnap.sam.sort.bam
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

Ah, it's not actually writing a SAM file! Maybe there's an option to change that?
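
A quick way to confirm this before running samtools is to look at the first line of the file. The sketch below is only a rough heuristic, not a validator: the function name looks_like_sam is made up for illustration, and it assumes a SAM file starts either with an @ header line (@HD, @SQ, @RG, @PG, @CO) or with an alignment record of at least 11 tab-separated fields.

```shell
# Hypothetical helper (not part of samtools or GSNAP).
# Returns 0 if the first line of the file looks like SAM, 1 otherwise.
looks_like_sam() {
    head -n 1 "$1" | awk -F '\t' '
        /^@(HD|SQ|RG|PG|CO)\t/ { ok = 1 }   # SAM header line
        NF >= 11               { ok = 1 }   # alignment record: 11 mandatory fields
        END                    { exit (ok ? 0 : 1) }'
}

# Example usage (file name taken from the question):
# looks_like_sam S1.gsnap.sam || echo "S1.gsnap.sam does not look like SAM"
```

The head output shown above starts with a > line that has only a few tab-separated fields, so a check like this would flag it as not being SAM.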

5.9 years ago
h.mon 35k

You have to specify sam as output format with -A sam or --format=sam.

  -A, --format=STRING            Another format type, other than default.
                                 Currently implemented: sam, m8 (BLAST tabular format)
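
Applied to the command from the question, that could look roughly like the sketch below. The paths, database name and thread count are copied from the question; the GSNAP variable and the run_gsnap_sam wrapper are hypothetical names added here only to keep the example tidy, and the samtools lines are left as comments because they depend on your local setup.

```shell
# Location of the gsnap binary; ./gsnap is what the question used.
GSNAP=${GSNAP:-./gsnap}
# Genome index directory copied from the question; adjust for your system.
GSNAP_DB_DIR=/data/Likith/gmap-2018-05-30/share/hg38

# Hypothetical wrapper: same options as the original command, plus -A sam
# so that GSNAP writes SAM instead of its default native format.
run_gsnap_sam() {
    "$GSNAP" -d hg38 -D "$GSNAP_DB_DIR" -t 10 -A sam "$1" "$2"
}

# Example usage (input paths from the question), followed by the sort step:
# run_gsnap_sam /data/shayantan/cancer_samples/sample1/fixed1_normal.fq \
#               /data/shayantan/cancer_samples/sample1/fixed2_normal.fq > S1.gsnap.sam
# samtools view -u S1.gsnap.sam | samtools sort -@ 20 -T /tmp/S1.gsnap.sam.sort -o S1.gsnap.sam.sort.bam -
```

With SAM output, the first lines of the file should be @ header lines rather than the > and < read blocks of the native format.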

I'm able to view the output. Thanks
