Question: Reheadering a subsetted bam gives truncated file
0
gravatar for Luciferine
15 months ago by
Luciferine0
Luciferine0 wrote:

Hi all, I have a problem with reheadering a bam file which I subsetted before. I mapped against a merged genome of human and mouse (since it is a patient derived xenograft sample) and want to remove residual mouse reads. I did this with samtools (version 1.8) by simply subsetting for the human chromosomes.

samtools view -b mybam.bam chrM chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > no-mouse.bam

I want to remove the mouse chromosomes from the header by removing the respective lines from the header with sed (there are also some ERCCs etc. in the reference, which is why I have to remove so many lines) and reheadering the bam afterwards.

samtools view -H  no-mouse.bam | sed '27,253d' | samtools reheader - no-mouse.bam > no-mouse.rehead.bam

I end up with a truncated bam file, if I want to look at e.g. the end of the file with

samtools view no-mouse.rehead.bam | tail

I get [main_samview] truncated file.Following steps (e.g. AddOrReplaceReadGroups) give errors as well.

I've seen some posts on similar topics but din't find any helpful solution unfortunately.

Can somone help me with that? Thanks a lot!

bam samtools header reheader • 514 views
ADD COMMENTlink modified 12 months ago by Biostar ♦♦ 20 • written 15 months ago by Luciferine0

I mapped against a merged genome of human and mouse

Do you have a reference that this is good practice and reliable? I could imagine that because of homologies between the species, you'll get plenty of multimappers.

ADD REPLYlink written 15 months ago by ATpoint24k
3

You can probably find a reference in the allele-specific literature, though we also use this strategy in the WGBS world.

In general, if you know your sample is a composite of a couple organisms then concatenating them minimizes the bias.

ADD REPLYlink written 15 months ago by Devon Ryan92k

Ok, never heard of this strategy. Thanks!

ADD REPLYlink written 15 months ago by ATpoint24k

Here some other things that didn't do the job either:

  • Converting to sam, changing header, reconverting to bam
  • Sorting the input bam again after subsetting (samtools sort)
  • ReorderSam to the human-only reference, this only works if chromosomes in header are the same as in reference.

I get the following error when I start any Picard tools (e.g. ReorderSam, AddOrReplaceReadGroups), maybe this might help to find the problem:

Exception in thread "main" java.lang.IllegalArgumentException: Reference name for '195' not found in sequence dictionary.

The numbers vary in different files, I also got '185' and others.

Hope somebody can help me! Thanks in advance!

ADD REPLYlink modified 15 months ago • written 15 months ago by Luciferine0

Are you sure that your header order of chromosomes really is the same order as no-mouse.bam?

ADD REPLYlink written 15 months ago by swbarnes26.7k

That's a good hint, thanks! If I look at the bam file the first reads I get are from chr1, whereas the header starts with chrM. But I don't really know how to fix this. I'll try ReorderSam on the subsetted file with the old header, maybe that'll work (reordering the subsetted file with the new header doesn't).

ADD REPLYlink written 15 months ago by Luciferine0
1

See my cat solution in the comment below, it should work without needing to use ReorderSam.

ADD REPLYlink written 15 months ago by Devon Ryan92k
1
gravatar for Devon Ryan
15 months ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

The thing with reheadering files is that you have to be absolutely sure that you're only removing chromosomes from the end of the header and never from the beginning, otherwise you'll end up borking the file. My guess is that your sed command is wrong, so write the header to a file and manually look at it.

ADD COMMENTlink written 15 months ago by Devon Ryan92k

I tried saving the header separately, removing the lines I don't want (which are at the end of the header) and reheadering the file. It still gives me a truncated file.

ADD REPLYlink written 15 months ago by Luciferine0

Is the original file truncated before reheadering?

ADD REPLYlink written 15 months ago by Devon Ryan92k

Nope, the original file is perfectly okay...

ADD REPLYlink written 15 months ago by Luciferine0
1

What happens if you cat fixed_header <(samtools view no-mouse.bam) | samtools view -bo no-mouse.rehead.bam -? That's a less efficient reheadering, but avoids issues due to swapping chromosome order.

ADD REPLYlink written 15 months ago by Devon Ryan92k
0
gravatar for Luciferine
14 months ago by
Luciferine0
Luciferine0 wrote:

Thank you Devon, that indeed worked to find out the actual problem, which seems to be, that I have chimeric reads, where one read of a pair maps to a human chromosome and the other one to a mouse chromosome. These read pairs of course lack one read after filtering out all non-human mapped reads, leaving a human read with a reference to a non-existent mouse read. This causes problems in the following steps. I'm trying to get rid of those now.

ADD COMMENTlink written 14 months ago by Luciferine0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2125 users visited in the last hour