Question: Using 2 different references in my pipeline
3.0 years ago by
michal.devir0 wrote:


I am running some analysis on BAM files that I downloaded from ICGC. I don't have the original FASTQ files, just the BAMs. The BAMs were aligned against a different reference than the one I use in my pipeline: I use hg19 from UCSC, and I am not sure which reference was used for the BAMs, but I think it is the Ensembl reference. The result is a different naming convention ('1' vs 'chr1', 'GL000241.1' vs 'chrUn_gl000241') and a different order of contigs. This causes problems, for example when working with GATK. What is the right way to handle such a situation?

Thanks, Michal.

next-gen icgc genome • 818 views
modified 3.0 years ago by igor8.6k • written 3.0 years ago by michal.devir0

Download the files your pipeline needs for that other reference, or extract read information from the BAM files and map them against your reference. I don't think there is an easy way out here.

written 3.0 years ago by Zaag720

Thanks a lot. I had hoped there was a simple way to do this, but I guess I'll have to work hard for it... :-/

written 3.0 years ago by michal.devir0

It is additional work but not necessarily hard :)

Hopefully your bam has both mapped and unmapped reads. Otherwise you are missing a part of the original data.
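One way to check this before re-mapping is to count mapped and unmapped records. The following is only a minimal sketch: it assumes the BAM has already been converted to SAM text (for example by piping `samtools view -h` output into the script), and the function name `count_mapped_unmapped` is made up for illustration. It relies on the SAM flag bits 0x4 (read unmapped), 0x100 (secondary) and 0x800 (supplementary).

```python
def count_mapped_unmapped(sam_lines):
    """Count primary mapped and unmapped records in an iterable of SAM text lines."""
    UNMAPPED = 0x4
    SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 of a SAM record is FLAG
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # extra alignments of a read that is already counted
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Tiny hand-written example records (not real data).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\t2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_unmapped(example))
```

If the unmapped count is zero on a full file, the unmapped reads were probably dropped at some point, and a FASTQ extracted from that BAM will not contain everything that was sequenced. `samtools flagstat` reports similar totals directly on a BAM.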

written 3.0 years ago by genomax73k
3.0 years ago by
United States
igor8.6k wrote:

Try CrossMap to convert between different references.

You may still run into problems with GATK, because it needs not only the same contig names but also the same contig order, so you may also need to sort all the files again. You could also run into issues with alternate contigs if some of the files have them and others do not.
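To see how far apart two sequence dictionaries are, the `@SQ` lines of the BAM header can be compared with those of the reference's `.dict` file. This is a rough sketch, not a GATK-equivalent validation: the function names `parse_sq` and `compare_dictionaries` are hypothetical, and the input is assumed to be header text such as the output of `samtools view -H`.

```python
def parse_sq(header_lines):
    """Return [(name, length), ...] in header order from SAM @SQ lines."""
    contigs = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Every field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def compare_dictionaries(a, b):
    """Report name, length and order differences between two contig lists."""
    len_a, len_b = dict(a), dict(b)
    names_a = [name for name, _ in a]
    names_b = [name for name, _ in b]
    shared_in_a_order = [n for n in names_a if n in len_b]
    shared_in_b_order = [n for n in names_b if n in len_a]
    return {
        "only_in_a": [n for n in names_a if n not in len_b],
        "only_in_b": [n for n in names_b if n not in len_a],
        "length_mismatch": [n for n in shared_in_a_order if len_a[n] != len_b[n]],
        # True when the contigs present in both appear in the same relative order.
        "same_order": shared_in_a_order == shared_in_b_order,
    }


bam_header = ["@SQ\tSN:1\tLN:249250621", "@SQ\tSN:2\tLN:243199373"]
ref_dict = [
    "@SQ\tSN:2\tLN:243199373",
    "@SQ\tSN:1\tLN:249250621",
    "@SQ\tSN:chrM\tLN:16571",
]
report = compare_dictionaries(parse_sq(bam_header), parse_sq(ref_dict))
print(report)
```

Empty `only_in_*` and `length_mismatch` lists together with `same_order` being true would suggest the two files agree on names, lengths and order. Anything else points at the renaming, re-sorting or alternate-contig issues described above.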

modified 3.0 years ago • written 3.0 years ago by igor8.6k
3.0 years ago by
WouterDeCoster41k wrote:

Two solutions: if you really want to use the hg19 reference and not the Ensembl reference, convert the BAM to FASTQ and realign against your preferred reference. Alternatively, switch your annotation to the Ensembl annotation. It is best to use a matching reference and annotation. You can probably try some nasty hacks such as changing the chromosome identifiers, but that will not make you happy in the long run.
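For completeness, the identifier-renaming hack could look roughly like the sketch below. It is an illustration under stated assumptions, not a recommendation: `rename_sq` and `name_map` are hypothetical names, the mapping has to be supplied by hand for every contig (only the two pairs from the question are shown), and the contig lengths in the example header are illustrative.

```python
def rename_sq(header_lines, name_map):
    """Rewrite the SN: field of every @SQ header line using name_map."""
    renamed = []
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    old = field[3:]
                    if old not in name_map:
                        # Fail loudly instead of leaving a mixed naming scheme behind.
                        raise KeyError(f"no new name given for contig {old!r}")
                    fields[i] = "SN:" + name_map[old]
            line = "\t".join(fields)
        renamed.append(line)
    return renamed


# Mapping built from the two examples in the question; a real one needs every contig.
name_map = {"1": "chr1", "GL000241.1": "chrUn_gl000241"}
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621",
    "@SQ\tSN:GL000241.1\tLN:42152",
]
new_header = rename_sq(header, name_map)
print("\n".join(new_header))
```

A header edited this way can be applied to the file with `samtools reheader`. Note what it does not do: it leaves the contig order unchanged, it does not touch contig names stored as text inside optional tags, and it is only valid where the underlying sequences are identical. The mitochondrial contig is a known exception, since hg19 chrM (16571 bp) and GRCh37 MT (16569 bp) are different sequences.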

written 3.0 years ago by WouterDeCoster41k