Using 2 different references in my pipeline
5.0 years ago

Hi,

I am running some analyses on BAM files that I downloaded from ICGC. I don't have the original FASTQ files, just the BAMs. The BAMs were aligned against a different reference than the one I use in my pipeline: I use hg19 from UCSC, and although I am not sure which reference was used for the BAMs, I think it is the Ensembl reference. The result is a different naming convention ('1' vs 'chr1', 'GL000241.1' vs 'chrUn_gl000241') and a different order of contigs. This causes problems, for example when working with GATK. What is the right way to handle such a situation?

Thanks, Michal.

genome next-gen ICGC • 1.1k views

Download the files your pipeline needs for that other reference, or extract read information from the BAM files and map them against your reference. I don't think there is an easy way out here.
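The second option (pull the reads out of the BAM and map them again) could be sketched roughly as below. This is only an illustration under several assumptions: paired-end data, `samtools` and `bwa` on the PATH, a reference already indexed with `bwa index`, and made-up file names. The helper `realign_commands` is hypothetical and only builds the command strings, it does not run anything. Note that read group information is not carried through FASTQ, so it would have to be added back for GATK.

```python
import shlex


def realign_commands(bam, reference_fasta, prefix, threads=4):
    """Build two shell pipelines: BAM -> paired FASTQ, then FASTQ -> sorted BAM.

    Assumes paired-end reads; all file names here are illustrative.
    """
    bam_q = shlex.quote(bam)
    ref_q = shlex.quote(reference_fasta)
    pre_q = shlex.quote(prefix)

    # Group mates together (collate), then write R1/R2 and discard
    # singletons and reads that are neither first nor second in pair.
    to_fastq = (
        f"samtools collate -u -O {bam_q} | "
        f"samtools fastq -1 {pre_q}_R1.fastq -2 {pre_q}_R2.fastq "
        f"-0 /dev/null -s /dev/null -n -"
    )

    # Align against the preferred reference and coordinate-sort the result.
    align = (
        f"bwa mem -t {threads} {ref_q} {pre_q}_R1.fastq {pre_q}_R2.fastq | "
        f"samtools sort -@ {threads} -o {pre_q}.realigned.bam -"
    )
    return [to_fastq, align]


if __name__ == "__main__":
    for command in realign_commands("icgc_sample.bam", "hg19.fa", "sample1"):
        print(command)
```

Printing the commands first makes it easy to check them by eye before running them on a large BAM.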


Thanks a lot. I had hoped there was a simple way to do this, but I guess I'll have to work hard for it... :-/


It is additional work but not necessarily hard :)

Hopefully your BAM contains both mapped and unmapped reads. Otherwise you are missing part of the original data.
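One way to check this might be to look at the output of `samtools idxstats`, which prints one tab-separated line per contig (name, length, mapped, unmapped) plus a final `*` line for unmapped reads without coordinates. A minimal sketch for summing those columns, where `count_reads` is a hypothetical helper and the numbers in the example are made up:

```python
def count_reads(idxstats_text):
    """Sum the mapped and unmapped columns of `samtools idxstats` output."""
    mapped = 0
    unmapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        # Columns: contig name, contig length, mapped count, unmapped count
        _name, _length, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    return mapped, unmapped


if __name__ == "__main__":
    # Made-up example text in idxstats layout
    example = "1\t249250621\t1000\t3\n2\t243199373\t900\t1\n*\t0\t0\t50\n"
    mapped, unmapped = count_reads(example)
    print(f"mapped={mapped} unmapped={unmapped}")
```

If the unmapped count comes out as zero, the unmapped reads were probably filtered out before the BAM was distributed.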

5.0 years ago
igor 12k

Try CrossMap to convert between different references: http://crossmap.sourceforge.net/

You may still run into problems with GATK, because it needs not only the same contig names but also the same contig order, so you may have to sort all the files again. You could also run into issues with alternate contigs if some of the files have them and others do not.
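To see which of these problems apply, it can help to compare the `@SQ` lines of the BAM header (for example from `samtools view -H`) with the `.fai` index of the reference. The following is a rough sketch, not a replacement for GATK's own validation; the function names are hypothetical and the input text in the example is made up.

```python
def contigs_from_sam_header(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header."""
    contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Each field looks like TAG:value; split only on the first colon
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def contigs_from_fai(fai_text):
    """Return [(name, length), ...] from the text of a FASTA .fai index."""
    contigs = []
    for line in fai_text.splitlines():
        if line.strip():
            columns = line.split("\t")
            contigs.append((columns[0], int(columns[1])))
    return contigs


def compare_contigs(bam_contigs, ref_contigs):
    """Report name, order and length differences between two contig lists."""
    bam_names = [name for name, _ in bam_contigs]
    ref_names = [name for name, _ in ref_contigs]
    ref_lengths = dict(ref_contigs)
    shared = set(bam_names) & set(ref_names)
    return {
        "only_in_bam": [n for n in bam_names if n not in shared],
        "only_in_ref": [n for n in ref_names if n not in shared],
        # Order is compared on the shared contigs only
        "same_order": [n for n in bam_names if n in shared]
        == [n for n in ref_names if n in shared],
        "length_mismatch": [
            n for n, length in bam_contigs
            if n in shared and ref_lengths[n] != length
        ],
    }


if __name__ == "__main__":
    header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:GL000241.1\tLN:42152\n"
    fai = "chr1\t249250621\t6\t50\t51\nchrUn_gl000241\t42152\t254235650\t50\t51\n"
    print(compare_contigs(contigs_from_sam_header(header), contigs_from_fai(fai)))
```

With the two naming conventions from the question, everything ends up in `only_in_bam` and `only_in_ref`, which is a quick sign that the names have to be reconciled before contig order is even worth checking.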

5.0 years ago

Two solutions: if you really want to use the hg19 reference and not the Ensembl reference, convert the BAM to FASTQ and align against your preferred reference. Alternatively, switch your annotation to the Ensembl annotation. It is best to use a matching reference and annotation. You could try some nasty hacks, such as changing the chromosome identifiers, but that will not make you happy in the long run.
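To illustrate why the identifier hack gets nasty, here is a small sketch of an Ensembl-to-UCSC name mapping. Only the primary chromosomes follow a simple rule ('1' becomes 'chr1'); unplaced contigs such as 'GL000241.1' need an explicit lookup table, and the mitochondrial sequence in UCSC hg19 (chrM) is not the same sequence as MT in GRCh37, so a rename alone would be wrong there. The function name and the `extra` table are hypothetical, and the caller has to supply and verify the table.

```python
# Primary chromosomes that differ only by the "chr" prefix
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name, extra=None):
    """Map an Ensembl-style contig name to a UCSC-style name.

    `extra` is a caller-supplied dict for contigs with no simple rule.
    Raises KeyError for anything that cannot be mapped safely, including
    MT, because hg19 chrM and GRCh37 MT are different sequences.
    """
    extra = extra or {}
    if name in extra:
        return extra[name]
    if name in PRIMARY:
        return "chr" + name
    raise KeyError(f"no safe mapping for contig {name!r}")


if __name__ == "__main__":
    # The single table entry below is the example pair from the question
    table = {"GL000241.1": "chrUn_gl000241"}
    for contig in ["1", "X", "GL000241.1"]:
        print(contig, "->", ensembl_to_ucsc(contig, table))
```

Refusing unknown names, rather than guessing, keeps a silent mismatch from slipping into downstream results.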
