Question: SAM to BAM header not being retained
Asked 3.2 years ago by chrisclarkson100 (European Union):

I have a series of sam files that I would like to convert to bam files and merge into one bam file. The downstream analysis that is to be performed on the files requires them to have header lines.

example:

head -n 30 input1/input1.sam
@HD VN:1.0  SO:unsorted
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:bowtie2  PN:bowtie2  VN:2.2.6    CL:"/usr/local/bowtie2/bowtie2-align-s --wrapper basic-0 -x hg19 --very-fast -p 8 -S ChIPH1_dot_2_11-11-14_TTAGGC_L.sam -1 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R1_complete_filtered.fastq -2 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R2_complete_filtered.fastq"
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   73  chr1    85460210    42  101M    =   85460210    0   NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA   #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A   AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0T100  YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   133 chr1    85460210    0   *   =   85460210    0   TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG   @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD   YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1140:2073   99  chr20   43940927    40  101M    =   43941039    215 NTGCGGCTGGTGCTGCGCGGGGGCCGGGAGCTGGGTACCTTCCACAGCCGCCTTATCAAGGTCATCTCGAAGCCCTCGCAGAAGAAGCAGTCGCTGAAAAN   #1:BD7A@<?A;DFF<ECEEF>>?B6='65:<38?338A8ABBBBB<@?0<<55A(:(>+++3>:A>@2><89?BBBB@B5@?>?>33994>5>B<9AAA#   AS:i:-2 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0C99A0 YS:i:-29    YT:Z:CP

However, when I convert it to a BAM file using the '-h' argument, the header is not retained:

samtools view -h -S -b input1/input1.sam > test.bam


samtools view test.bam | head
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   73  chr1    85460210    42  101M    =   85460210    0   NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA   #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A   AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0T100  YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072   133 chr1    85460210    0   *   =   85460210    0   TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG   @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD   YT:Z:UP
.....

Is there something that I'm missing???...

NOTE: after converting each to a BAM file, I will merge them all into one:

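for i in input*
do
    # Convert each SAM file to BAM
    samtools view -h -S -b ${i}/*.sam > ${i}/${i}.bam
    # samtools merge expects coordinate-sorted inputs, so sort each BAM first
    samtools sort -@ 6 -o ${i}/${i}_sorted.bam ${i}/${i}.bam
done
# Merging sorted inputs yields a sorted output, so no final sort is needed
samtools merge -@ 6 input_formatted_sorted.bam input*/*_sorted.bam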
Answered 3.2 years ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

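Quoting the question: "Is there something that I'm missing???..."

By default, samtools view does not print the SAM header, so the header is stored in test.bam but is hidden from the output of your check. Try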

samtools view -h  input1/input1.sam
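
To check this on the converted file itself, the following is a minimal sketch. It assumes samtools is on the PATH and that test.bam from the question is in the current directory. count_header_lines is a hypothetical helper written for this illustration, not part of samtools; it only counts the leading '@' lines of whatever SAM text it receives.

```shell
# Hypothetical helper (not part of samtools): count the leading lines on
# stdin that start with '@', i.e. the SAM header lines.
count_header_lines() {
    awk '/^@/ { n++; next } { exit } END { print n + 0 }'
}

# Assumes test.bam exists and samtools is installed; skipped otherwise.
if command -v samtools >/dev/null 2>&1 && [ -f test.bam ]; then
    # Without -h only alignment records are printed, so this prints 0
    samtools view test.bam | count_header_lines
    # With -h the header stored in the BAM is printed before the alignments
    samtools view -h test.bam | count_header_lines
    # With -H only the header is printed
    samtools view -H test.bam | head
fi
```

If the second count is greater than zero, the header was retained in the BAM file and was only hidden by the default output of samtools view.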

ATpoint replied 3.2 years ago:

Or use -H to print the header only.