Question: Reformat sam header issue
19 months ago by
pg_canada0 wrote:

Hi everyone,

I'm new to this community and new to this type of analysis, so I apologize if this question seems simple.

I'm formatting some BAM files in order to run them through EXCAVATOR for CNV analysis. For a couple of these BAM files I get the following error:

[E::sam_parse1] missing SAM header
[W::sam_read1] parse error at line 7
[main_samview] truncated file.

When I check the BAM (samtools view -h mybam.bam), the file does seem to have a header, as shown below:

@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:chr1 LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128     UR:/mnt/
@SQ     SN:chr2 LN:243199373    M5:a0d9851da00400dec1098a9255ac712e     UR:/mnt/
@SQ     SN:chr3 LN:198022430    M5:641e4338fa8d52a5b781bd2a2c08d3c3     UR:/mnt/
@SQ     SN:chr4 LN:191154276    M5:23dccd106897542ad87d2765d28a19a1     UR:/mnt/
@SQ     SN:chr5 LN:180915260    M5:0740173db9ffd264d728f32784845cd7     UR:/mnt/
@SQ     SN:chr6 LN:171115067    M5:1d3a93a248d92a729ee764823acbbc6b     UR:/mnt/
@SQ     SN:chr7 LN:159138663    M5:618366e953d6aaad97dbe4777c29375e     UR:/mnt/
@SQ     SN:chrX LN:155270560    M5:7e0e2e580297b7764e31dbc80c2540dd     UR:/mnt/
@SQ     SN:chr8 LN:146364022    M5:96f514a9929e410c6651697bded59aec     UR:/mnt/
@SQ     SN:chr9 LN:141213431    M5:3e273117f15e0a400f01055d9f393768     UR:/mnt/
@SQ     SN:chr10        LN:135534747    M5:988c28e000e84c26d552359af1ea2e1d     
@SQ     SN:chr11        LN:135006516    M5:98c59049a2df285c76ffb1c6db8f8b96     
@SQ     SN:chr12        LN:133851895    M5:51851ac0e1a115847ad36449b0015864     
@SQ     SN:chr13        LN:115169878    M5:283f8d7892baa81b510a015719ca7b0b     
@SQ     SN:chr14        LN:107349540    M5:98f3cae32b2a2e9524bc19813927542e

etc ..

Has anyone encountered this before? Any pointers as to how I can fix this?

This is the command I'm using to reformat:

samtools view -h mybam.bam | awk 'BEGIN{FS=OFS="\t"} (/^@/ && !/@SQ/){print $0} $2~/^SN:[1-9]|^SN:X|^SN:Y|^SN:MT/{print $0} $3~/^[1-9]|X|Y|MT/{$3="chr"$3; print $0}' | sed 's/SN:/SN:chr/g' | sed 's/chrMT/chrM/g' | samtools view -bS - > mybam_merge_reformat.bam

Thank you

What is the output of your pipeline BEFORE the last samtools view?
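
One way to answer that is to summarize the text stream just before it is converted back to BAM. The following is a minimal sketch, not part of the original thread: the helper name `summarize_sam_stream` is hypothetical, and the sample input is toy data. It counts header lines and @SQ lines and reports the line number of the first alignment record. If the @SQ count comes back as 0 while alignments still name a reference, that would be consistent with the `missing SAM header` error shown in the question.

```shell
# Hypothetical helper: summarize a SAM text stream read from stdin.
summarize_sam_stream() {
    awk -F'\t' '
        /^@/ { header++; if ($1 == "@SQ") sq++; next }   # header lines
        first == 0 { first = NR }                         # first alignment record
        END {
            printf "header_lines=%d sq_lines=%d first_alignment_line=%d\n", header, sq, first
        }'
}

# Toy SAM stream (made up for illustration): two header lines, no @SQ, one read.
printf '@HD\tVN:1.4\tSO:coordinate\n@RG\tID:toy\tSM:toy\nread1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' \
    | summarize_sam_stream

# On the real data, the helper would replace the final samtools call, e.g.:
#   samtools view -h mybam.bam | awk '...' | sed '...' | summarize_sam_stream
```

Piping the same stream through `head` instead would show the raw lines, which is often enough to spot a header that lost its @SQ records.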

PS: this awk might not work. You're going to add some 'chr' to the unmapped reads, and you're ignoring the mate reference and the 'SA' tag for supplementary alignments.

Pierre Lindenbaum 129k
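
The concerns raised in the PS can be illustrated with a sketch that renames contigs in every place a SAM record carries them. This is an assumption-laden example, not the method used by either poster: the helper name `add_chr_prefix` is hypothetical, it assumes unprefixed names such as 1, 2, X, Y and MT, and unlike the command in the question it passes other contigs through instead of dropping them. Names that already start with "chr", as well as "*" and "=", are left unchanged.

```shell
# Hypothetical helper: add a "chr" prefix to contig names in a SAM text stream.
# Assumes unprefixed names (one- or two-digit numbers, X, Y, MT); anything else,
# including "*", "=" and names that already start with "chr", is left as-is.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
        function fix(name) {
            if (name == "MT") return "chrM"
            if (name ~ /^([1-9][0-9]?|X|Y)$/) return "chr" name
            return name
        }
        /^@SQ/ {                       # rename the SN: field of each @SQ line
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) $i = "SN:" fix(substr($i, 4))
            print; next
        }
        /^@/ { print; next }           # other header lines are untouched
        {
            $3 = fix($3)               # RNAME
            $7 = fix($7)               # RNEXT (mate reference)
            for (i = 12; i <= NF; i++)
                if ($i ~ /^SA:Z:/) {   # supplementary alignments: rname,pos,...;
                    n = split(substr($i, 6), aln, ";")
                    out = ""
                    for (j = 1; j <= n; j++) {
                        if (aln[j] == "") continue
                        comma = index(aln[j], ",")
                        out = out fix(substr(aln[j], 1, comma - 1)) substr(aln[j], comma) ";"
                    }
                    $i = "SA:Z:" out
                }
            print
        }'
}

# Toy input (made up for illustration): one mapped read with an SA tag, one unmapped read.
printf '@HD\tVN:1.4\n@SQ\tSN:1\tLN:100\n@SQ\tSN:MT\tLN:50\nr1\t0\t1\t10\t60\t4M\t=\t20\t0\tACGT\tIIII\tSA:Z:MT,5,+,4M,60,0;\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' \
    | add_chr_prefix

# On real data the helper would sit between two samtools calls, e.g.:
#   samtools view -h mybam.bam | add_chr_prefix | samtools view -b -o mybam_reformat.bam -
```

Matching whole names with an anchored pattern is a deliberate choice here: an unanchored test such as `X|Y|MT` also matches names like "chrX", which would then be prefixed a second time.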
