Sam To Bam Conversion
1
2
Entering edit mode
10.1 years ago
Sajan Franco ▴ 20

BWA Version : 0.7.0-r313

Picard Version 1.108

samtools Version : 0.1.19-44428cd

1.I have created the bam index files for human_hg19_full.fasta using bwa -a bwtsw human_hg19_full.fasta comand

This creates the following files

a. human_hg19_full.fasta           Size (3199905534)
b. human_hg19_full.fasta.amb    Size (8606) 
c. human_hg19_full.fasta.ann     Size (3762) 
d. human_hg19_full.fasta.bwt     Size (3137161344) 
e. human_hg19_full.fasta.pac     Size (784290317) 
f. human_hg19_full.fasta.sa        Size (1568580680)

2.Created the fai using samtools faidx command ./samtools faidx /kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta which creates the following file

a. human_hg19_full.fasta.fai     Size (3261)

3.Created the dict file using the picard tool using the following command java -Xmx48g -jar /kfs/aligncall/picard-tools-1.98/CreateSequenceDictionary.jar REFERENCE=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta OUTPUT=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.dict which creates the dict file

a. human_hg19_full.dict        Size (12318)

4.run align and sampe using the following command:

./bwa mem -t 32 -R "@RG\tID:NA12878_30x_pairend\tSM:NA12878_30x\tLB:NA12878_30x\tPL:Illumina" /kfs/babraham/benchmarks/Align/human_hg19_full.fasta \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe1.fastq.gz \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe2.fastq.gz \
/kfs/babraham/benchmarks/Align/NA12878_30x_2-pe.sam

which creates the NA12878_30x_2-pe.sam created successfully.

5.cat NA12878_30x_2-pe.sam

@SQ     SN:1    LN:249250621
@SQ     SN:2    LN:243199373
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
...
@SQ     SN:Un_gl000249  LN:38502
@RG     ID:NA12878_30x_pairend  SM:NA12878_30x  LB:NA12878_30x  PL:Illumina

6.Execute samtools view command mentioned below to convert above sam file to bam format.

samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam

on executing the above command I am getting the following exception mentioned below

[root@kdev01 samtools-0.1.19]# ./samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina
' is recognized as '*'.
[main_samview] truncated file.
bam sam • 4.9k views
ADD COMMENT
3
Entering edit mode

I've taken the liberty of reformatting things a bit. I'm pretty sure I kept all of the context, but let me know or go ahead and change things if not.

Also, I saved the original header as foo.sam and samtools view -Sbo foo.bam foo.sam works fine. The error message indicates that samtools thinks the @RG line is a read, though I don't know how it got that idea. You might post the first 1000 lines somewhere (not here) so that we have a reproducible example to look at.

ADD REPLY
2
Entering edit mode

wait... your file is just the sam header? The last line is @RG ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina?

ADD REPLY
0
Entering edit mode

Looks like the headers are only present in the SAM file

When I run bwa aln and bwa sampe as separate commands on a different machine, the SAM file that is getting does have information further to this.

@SQ     SN:Un_gl000248  LN:39786
@SQ     SN:Un_gl000249  LN:38502
@RG     ID:wo237-sg5114_lane02_pairend1 SM:sg5114       LB:sg5114       PL:Illumina
HWI-D00309:50:C30AEACXX:2:1101:1377:2125        99      1       111778320       60      100M    =       111778408       188     GACTGCGGGCGTATCTGCAGGGAGGCAAATGATTGATAACAGCTATCAAGTTGAGAAACTGGCAAAGTGAGTACATCAGAGCAACTTTCCATCCCTCTGC    @?@FFDFFHAF?FGBHHGCEHF7?<B@>8BF8@@7).>EDHEEHCCEEEA@@@D?2;@CCC9=CCACCCC@C>>@BCC(:A9>9??@ACCDCCBBCCCCA    RG:Z:wo237-sg5114_lane02_pairend1       XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
HWI-D00309:50:C30AEACXX:2:1101:1377:2125        147     1       111778408       60      100M    =       111778320       -188    CCATCCCTCTGCTTCCAATTTGGTCCATGCAAATGATGCATCAGCATGTTTGACGGATGAGGGGAGGAGGAAGAGAGGGAAGGTCATTGTCCTAACCCTG    <<58BBA<93=>5>;5.(?;=.3<ECC=?=;;=777))G@4@)7)=B3=@DD??99?)?0::3?99GC19:C3GB@;GADEEGBE>HDFC8?ADADD?<@    RG:Z:wo237-sg5114_lane02_pairend1       XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1
ADD REPLY
1
Entering edit mode

ok.... let's take a step back.

1) Why does your mapping spit out only the header ? 2) Why do you care about compressing a mere header ?

ADD REPLY
0
Entering edit mode

Looks like this might be a resource problem with the machine.

The machine which runs bwa mem has only 8 GB of RAM. I have increased this to 16 GB and I am re-executing bwa mem command

ADD REPLY
0
Entering edit mode
10.1 years ago
Sajan Franco ▴ 20

This issue got resolved after the RAM was increased to 16 GB for the grid node from 8GB, the possible reason being bwa mem is not very efficient with multiple threads due to frequent heap memory allocations.

The other thing that I did was to pipe the bwa to sam so the intermediate sam file does not get written explicitly

./bwa mem mem -t 32 -R "@RG\tID:NA12878_30x_pairend\tSM:NA12878_30x\tLB:NA12878_30x\tPL:Illumina" /kfs/babraham/benchmarks/Align/human_hg19_full.fasta \
    /kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe1.fastq.gz \
    /kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe2.fastq.gz \
    |  samtools view -bhS - > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2331 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6