Question: Duplicate SAMSequenceDictionary in SAM file - how to modify SAM header?
1
gravatar for kwalter
16 months ago by
kwalter10
kwalter10 wrote:

Hi,

I have used bwa mem to map paired end fastq files to a ref genome and the resulting SAM file has duplicated @SQ lines in the SequenceDictionary:

bwa mem -t -M 4 ${REF_DIR}${ref} ${DATA_DIR}${p1} ${DATA_DIR}${p2} > ${sam}

The resulting SAM file generates errors using Picard Tools:

Exception in thread "main" java.lang.IllegalArgumentExceptioannot add sequence that already exists in SAMSequenceDictionary: chr1n: C

As well as converting the SAM to BAM.

[W::sam_hdr_parse] Duplicated sequence 'JTFH01001905.1'
[W::sam_hdr_parse] Duplicated sequence 'JTFH01001906.1'
[W::sam_read1] Parse error at line 29991
[main_samview] truncated file.

How can I replace the SAM header to remove duplicated @SQ lines?

I see there is an old post here: Duplicate @Sq Lines In Sam File Header that suggests the issue can be resolved usign the bwa mem -M argument. I am wondering if there is way to modify the SAM without remapping, as this is a large SAM file.

Thank you!

ADD COMMENTlink modified 16 months ago by finswimmer12k • written 16 months ago by kwalter10

samtools quickcheck will check for a valid header in your SAM. Also maybe samtools reheader could be useful? Here's the docs

ADD REPLYlink written 16 months ago by goodez460

Thanks, yes, the SAM passes samtools quickcheck and unfortunately, samtools reheader only takes input BAM files.

ADD REPLYlink written 16 months ago by kwalter10
0
gravatar for swbarnes2
16 months ago by
swbarnes26.9k
United States
swbarnes26.9k wrote:

The clunky way to do it:

samtools view -H input.bam > header.sam

fix the header; it's plain text, then

samtools view input.bam | cat header.sam - | samtools view -Sb - > new.bam
ADD COMMENTlink written 16 months ago by swbarnes26.9k

Nice I like it. Apparently the -S option is only needed for older versions of samtools though.

ADD REPLYlink modified 16 months ago • written 16 months ago by goodez460

Thank you both! I was also running into issues with samtools view -b $sam > test.bam [W::sam_read1] Parse error at line 29991 [main_samview] truncated file.

I looked at the parse error lines and the @SQ header is duplicated within my SAM file. Is this a known error with bwa mem? Can I remove all these @SQ lines with awk and then use the SAM file?

ADD REPLYlink written 16 months ago by kwalter10

Sounds like it's not just an issue with the header. When you run samtools view -b it ignores the header. The truncated file error you showed means there's actually something wrong with a mapped read entry. I think you should just remap the sequences, maybe try bowtie2 or STAR as alternatives to bwa.

Also it's possible your reference genome file is what caused the

Exception in thread "main" java.lang.IllegalArgumentExceptioannot add sequence that already exists in SAMSequenceDictionary: chr1n: C

ADD REPLYlink written 16 months ago by goodez460

Thank you - yes, remapping, I get the same issue with a duplicated SAM header. This time, I added the -M argument to bwa mem: bwa mem -t 16 -M ${REF_DIR}${ref} ${DATA_DIR}${p1} ${DATA_DIR}${p2} > ${sam}

Is this a known issue and is there something I can do to mitigate? Thank you!

@SQ SN:chr1 LN:248956422 @SQ SN:chr10 LN:133797422 @SQ SN:chr11 LN:135086622 @SQ SN:chr12 LN:133275309 @SQ SN:chr13 LN:114364328 @SQ SN:chr14 LN:107043718 @SQ SN:chr15 LN:101991189 @SQ SN:chr16 LN:90338345 @SQ SN:chr17 LN:83257441 @SQ SN:chr18 LN:80373285 @SQ SN:chr19 LN:58617616 @SQ SN:chr1_KI270706v1_random LN:175055 @SQ SN:chr1_KI270707v1_random LN:32032 @SQ SN:chr1_KI270708v1_random LN:127682 @SQ SN:chr1_KI270709v1_random LN:66860 @SQ SN:chr1_KI270710v1_random LN:40176 @SQ SN:chr1_KI270711v1_random LN:42210 @SQ SN:chr1_KI270712v1_random LN:176043 @SQ SN:chr1_KI270713v1_random LN:40745 @SQ SN:chr1_KI270714v1_random LN:41717 @SQ SN:chr11_KI270721v1_random LN:100316 @SQ SN:chr14_GL000009v2_random LN:201709 @SQ SN:chr14_GL000225v1_random LN:211173 @SQ SN:chr14_KI270722v1_random LN:194050 @SQ SN:chr14_GL000194v1_random LN:191469 @SQ SN:chr14_KI270723v1_random LN:38115 @SQ SN:chr14_KI270724v1_random LN:39555 @SQ SN:chr14_KI270725v1_random LN:172810 @SQ SN:chr14_KI270726v1_random LN:43739 @SQ SN:chr15_KI270727v1_random LN:448248 @SQ SN:chr16_KI270728v1_random LN:1872759 @SQ SN:chr17_GL000205v2_random LN:185591 @SQ SN:chr17_KI270729v1_random LN:280839 @SQ SN:chr17_KI270730v1_random LN:112551 @SQ SN:chr1 LN:248956422 @SQ SN:chr10 LN:133797422 @SQ SN:chr11 LN:135086622 @SQ SN:chr12 LN:133275309 @SQ SN:chr13 LN:114364328 @SQ SN:chr14 LN:107043718 @SQ SN:chr15 LN:101991189 @SQ SN:chr16 LN:90338345 @SQ SN:chr17 LN:83257441 @SQ SN:chr18 LN:80373285 @SQ SN:chr19 LN:58617616

ADD REPLYlink modified 16 months ago • written 16 months ago by kwalter10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1269 users visited in the last hour