sam to bam header not being retained
6.5 years ago

I have a series of SAM files that I would like to convert to BAM files and merge into a single BAM file. The downstream analysis requires the files to have header lines.

example:

head -n 30 input1/input1.sam
@HD VN:1.0  SO:unsorted
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:bowtie2  PN:bowtie2  VN:2.2.6    CL:"/usr/local/bowtie2/bowtie2-align-s --wrapper basic-0 -x hg19 --very-fast -p 8 -S ChIPH1_dot_2_11-11-14_TTAGGC_L.sam -1 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R1_complete_filtered.fastq -2 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R2_complete_filtered.fastq"
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   73  chr1    85460210    42  101M    =   85460210    0   NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA   #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A   AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0T100  YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   133 chr1    85460210    0   *   =   85460210    0   TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG   @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD   YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1140:2073   99  chr20   43940927    40  101M    =   43941039    215 NTGCGGCTGGTGCTGCGCGGGGGCCGGGAGCTGGGTACCTTCCACAGCCGCCTTATCAAGGTCATCTCGAAGCCCTCGCAGAAGAAGCAGTCGCTGAAAAN   #1:BD7A@<?A;DFF<ECEEF>>?B6='65:<38?338A8ABBBBB<@?0<<55A(:(>+++3>:A>@2><89?BBBB@B5@?>?>33994>5>B<9AAA#   AS:i:-2 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0C99A0 YS:i:-29    YT:Z:CP

However, when I convert it to a BAM file with the '-h' option, the header does not appear to be retained:

samtools view -h -S -b input1/input1.sam > test.bam


samtools view test.bam | head
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   73  chr1    85460210    42  101M    =   85460210    0   NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA   #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A   AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0T100  YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   133 chr1    85460210    0*  =   85460210    0   TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG   @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD   YT:Z:UP    
.....

Is there something that I'm missing???...

NOTE: after converting each file to BAM, I will merge them all into one:

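for i in input*
do
    samtools view -h -S -b ${i}/*.sam > ${i}/${i}.bam
    # sort each BAM so that the merge below finds the *sorted.bam files it expects
    samtools sort -@ 6 -o ${i}/${i}_sorted.bam ${i}/${i}.bam
done
samtools merge -@ 6 input_formatted.bam input*/*sorted.bam
samtools sort -o input_formatted_sorted.bam -@ 6 input_formatted.bam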
6.5 years ago

Is there something that I'm missing???...

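By default, samtools view does not print the header. Your BAM file still contains it; it is just not displayed because you ran samtools view test.bam | head without -h. Try

samtools view -h input1/input1.sam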

Or use -H to print the header only.
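
To confirm that the header survived the conversion, one option is to count the lines that start with "@" in the output of each command. The following is only a minimal sketch, assuming a POSIX shell; count_header_lines is a hypothetical helper name, not a samtools command.

```shell
# count_header_lines: hypothetical helper, not part of samtools.
# Reads SAM text on stdin and prints how many lines start with "@".
# In SAM, only header lines begin with "@" (read names may not).
count_header_lines() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # helper usable in scripts that run under "set -e".
    grep -c '^@' || true
}

# Intended usage (requires samtools and the test.bam from the question):
#   samtools view    test.bam | count_header_lines   # no -h: header not printed
#   samtools view -h test.bam | count_header_lines   # header plus alignments
#   samtools view -H test.bam | count_header_lines   # header only
```

If the first command reports 0 and the other two report the same non-zero number, the header is present in the BAM and was only hidden by the default output of samtools view.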
