Question: Sam To Bam Conversion
2
gravatar for Sajan Franco
4.0 years ago by
Sajan Franco20
COK
Sajan Franco20 wrote:

BWA Version : 0.7.0-r313

Picard Version 1.108

samtools Version : 0.1.19-44428cd

1.I have created the bam index files for human_hg19_full.fasta using bwa -a bwtsw human_hg19_full.fasta comand

This creates the following files

a. human_hg19_full.fasta           Size (3199905534)
b. human_hg19_full.fasta.amb    Size (8606) 
c. human_hg19_full.fasta.ann     Size (3762) 
d. human_hg19_full.fasta.bwt     Size (3137161344) 
e. human_hg19_full.fasta.pac     Size (784290317) 
f. human_hg19_full.fasta.sa        Size (1568580680)

2.Created the fai using samtools faidx command ./samtools faidx /kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta which creates the following file

a. human_hg19_full.fasta.fai     Size (3261)

3.Created the dict file using the picard tool using the following command java -Xmx48g -jar /kfs/aligncall/picard-tools-1.98/CreateSequenceDictionary.jar REFERENCE=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta OUTPUT=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.dict which creates the dict file

a. human_hg19_full.dict        Size (12318)

4.run align and sampe using the following command:

./bwa mem -t 32 -R "@RG\tID:NA12878_30x_pairend\tSM:NA12878_30x\tLB:NA12878_30x\tPL:Illumina" /kfs/babraham/benchmarks/Align/human_hg19_full.fasta \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe1.fastq.gz \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe2.fastq.gz \
/kfs/babraham/benchmarks/Align/NA12878_30x_2-pe.sam

which creates the NA12878_30x_2-pe.sam created successfully.

5.cat NA12878_30x_2-pe.sam

@SQ     SN:1    LN:249250621
@SQ     SN:2    LN:243199373
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
...
@SQ     SN:Un_gl000249  LN:38502
@RG     ID:NA12878_30x_pairend  SM:NA12878_30x  LB:NA12878_30x  PL:Illumina

6.Execute samtools view command mentioned below to convert above sam file to bam format.

samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam

on executing the above command I am getting the following exception mentioned below

[root@kdev01 samtools-0.1.19]# ./samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina
' is recognized as '*'.
[main_samview] truncated file.
bam sam • 2.3k views
ADD COMMENTlink modified 3.9 years ago • written 4.0 years ago by Sajan Franco20
3

I've taken the liberty of reformatting things a bit. I'm pretty sure I kept all of the context, but let me know or go ahead and change things if not.

Also, I saved the original header as foo.sam and samtools view -Sbo foo.bam foo.sam works fine. The error message indicates that samtools thinks the @RG line is a read, though I don't know how it got that idea. You might post the first 1000 lines somewhere (not here) so that we have a reproducible example to look at.

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by Devon Ryan76k
2

wait... your file is just the sam header ? The last line is "@RG ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina" ?

ADD REPLYlink written 4.0 years ago by Gabriel R.2.2k

Looks like the headers are only present in the SAM file When i run bwa aln and bwa sampe as separate commands on a different machine, the SAM file that is getting does have information further to this.

@SQ     SN:Un_gl000248  LN:39786
@SQ     SN:Un_gl000249  LN:38502
@RG     ID:wo237-sg5114_lane02_pairend1 SM:sg5114       LB:sg5114       PL:Illumina
HWI-D00309:50:C30AEACXX:2:1101:1377:2125        99      1       111778320       60      100M    =       111778408       188     GACTGCGGGCGTATCTGCAGGGAGGCAAATGATTGATAACAGCTATCAAGTTGAGAAACTGGCAAAGTGAGTACATCAGAGCAACTTTCCATCCCTCTGC    @?@FFDFFHAF?FGBHHGCEHF7?<B@>8BF8@@7).>EDHEEHCCEEEA@@@D?2;@CCC9=CCACCCC@C>>@BCC(:A9>9??@ACCDCCBBCCCCA    RG:Z:wo237-sg5114_lane02_pairend1       XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:100
HWI-D00309:50:C30AEACXX:2:1101:1377:2125        147     1       111778408       60      100M    =       111778320       -188    CCATCCCTCTGCTTCCAATTTGGTCCATGCAAATGATGCATCAGCATGTTTGACGGATGAGGGGAGGAGGAAGAGAGGGAAGGTCATTGTCCTAACCCTG    <<58BBA<93=>5>;5.(?;=.3<ECC=?=;;=777))G@4@)7)=B3=@DD??99?)?0::3?99GC19:C3GB@;GADEEGBE>HDFC8?ADADD?<@    RG:Z:wo237-sg5114_lane02_pairend1       XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1
ADD REPLYlink modified 4.0 years ago by Devon Ryan76k • written 4.0 years ago by Sajan Franco20
1

ok.... let's take a step back.

1) Why does your mapping spit out only the header ? 2) Why do you care about compressing a mere header ?

ADD REPLYlink written 4.0 years ago by Gabriel R.2.2k

Looks like this might be a resource problem with the machine.

The machine which runs bwa mem has only 8 GB of RAM. I have increased this to 16 GB and I am re-executing bwa mem command

ADD REPLYlink written 4.0 years ago by Sajan Franco20
0
gravatar for Sajan Franco
3.9 years ago by
Sajan Franco20
COK
Sajan Franco20 wrote:

This issue got resolved after the RAM was increased to 16 GB for the grid node from 8GB, the possible reason being bwa men is not very efficient with mutiple threads due to frequent heap memory allocations.

The other thing that I did was to pipe the bwa to sam so the intermediate sam file does not get written explicitly

./bwa mem mem -t 32 -R "@RG\tID:NA12878_30x_pairend\tSM:NA12878_30x\tLB:NA12878_30x\tPL:Illumina" /kfs/babraham/benchmarks/Align/human_hg19_full.fasta \ /kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe1.fastq.gz \ /kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe2.fastq.gz \ | samtools view -bhS - > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam

ADD COMMENTlink written 3.9 years ago by Sajan Franco20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 721 users visited in the last hour