Question: Ucsc/Ensembl Bam Converter?
6
gravatar for 2184687-1231-83-
7.7 years ago by
2184687-1231-83-4.9k wrote:

Is there any tool to convert BAM files that have been produced against Ensembl genome references to UCSC-compatible BAM files? And viceversa? AFAICS, Ensembl sequence names don't have the 'chr' in front of the sequence name, but UCSC does, so when I try to attach my BAM file to UCSC, I get this error:

Error Unrecognized format line 1 of ftp://somewhere.com/pub/myfile.Mus_musculus.NCBIM37.61.bam: ‹ (note: chrom names are case sensitive)
ensembl bam reference ucsc • 5.6k views
ADD COMMENTlink modified 22 months ago by balwierz40 • written 7.7 years ago by 2184687-1231-83-4.9k
13
gravatar for Pablo Marin-Garcia
7.7 years ago by
Spain
Pablo Marin-Garcia1.8k wrote:

In order to do it right, you need to take care of several things. If you have only chromosomal reads then it is easy:

You need first to reformat the header

samtools view -H file.bam | perl -lpe 's/SN:([0-9]+|[XY]|MT)\b/SN:chr$1/' > file.sam

Then you need to fix the alignments to point to the same ref name that the header. You need to change columns 3 and 7 (paired ends), but remember not to change the ones with '*' or '='

samtools view file.bam | \
  perl -lane '$F[2]=q{chr}.$F[2] unless ($F[2] eq q{*}); \
              if($F[6] ne q{*} && $F[6] ne q{=}){ \
                  $F[6]=q{chr}.$F[6] \
              }; \
              print join("\t", @F)' \
  >> file.sam

For very large files probably you would like to go for sed or perl with regular expressions for doing it faster.

If you have non chromosomal reads then you need more low level QC in order to know that you are doing the things right:

Finally for paranoiacs:

  • Do the header SQ have the column M5? if it does I will double check that your reference seq md5 corresponds to the one you want to use in UCSC (just in case)

[edited]

  • and look at the Bio_X2Y answer for problems with Mitocondrial genome.

I would assume that you are doing this for visualising your reads in UCSC browser or ensembl browser, so you are only looking to chromosomal mappings. Isn't it?. So the first thing I would do is to extract all the chromosomal reads into a file and patch them. That way you avoid all the complex logic that would force you to realign your data with the other reference.

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Pablo Marin-Garcia1.8k

I've tried manually adding the '@HD' line at the top, but UCSC doesn't seem to like it. Any ideas? perl -e 'print "@HDtVN:1.0tGO:nonetSO:unknownn"' > $file.ucsc.sam

ADD REPLYlink written 7.7 years ago by 2184687-1231-83-4.9k

seems that UCSC is not recognising the bam created, have you tried to create a sam with no header and then use the ucsc indexed reference to create the bam like samtools -bt ucsc_ref.fai your_sam > your_bam?

ADD REPLYlink written 7.7 years ago by Pablo Marin-Garcia1.8k
6
gravatar for Bio_X2Y
7.7 years ago by
Bio_X2Y3.6k
Ireland
Bio_X2Y3.6k wrote:

This feels like a risky thing to do. As Pablo points out, there are subtle differences between the two references (e.g. names of the unplaced and unlocalised contigs, masked PAR regions, etc.). Depending on your goals, these may or may not be important.

Perhaps the trickiest difference is the mitochondrial sequences - Ensembl's "MT" sequence is based on the revised Cambridge Sequence (genbank NC_012920), while UCSC's "chrM" uses a different sequence (genbank NC_001807). AFAIK, there is no trivial way to convert alignments between the two of these.

If this concerns you, I suggest you create your new BAM by realigning your reads against the UCSC reference rather than retrofitting the Ensembl BAM.

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Bio_X2Y3.6k
4
gravatar for balwierz
22 months ago by
balwierz40
balwierz40 wrote:

Note: I am writing this answer 6 years later. Now it is easy with 'samtools reheader' First generate the header:

samtools view -H file.bam | perl -lpe 's/SN:([0-9]+|[XY]|MT)\b/SN:chr$1/' >header.ucsc.sam

And edit header.ucsc.sam to correct manually chrMT to chrM (sorry for no perl oneliner that does that!)

Then use reheader function

samtools reheader header.ucsc.sam file.bam > file.ucsc.bam
ADD COMMENTlink written 22 months ago by balwierz40
1

it would be great if there was a toolkit for ID conversions like this

ADD REPLYlink written 21 months ago by cmdcolin1.1k
2
gravatar for Pierre Lindenbaum
7.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

Transform your bam to sam and use 'sed'.

samtools view mouse.bam | sed 's/\t/\tchr/2' > mouse.sam
samtools view -b -t mouse.fa -S -o mouse2.bam mouse.sam

I don't have any bam in front of me at the time of this writing. you might also have to change the @Header and to check what happens for the unmapped reads.

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Pierre Lindenbaum118k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 785 users visited in the last hour