I have recently been using UCSC's liftOver command-line tool to map genomic coordinates between human and mouse. Suppose this is my BED file, containing one transcript composed of two exons (tab-delimited):
chr1 16857 17751 HSALNT0000005 100 - 16857 17751 0,0,0 2 198,519, 0,375, geneID=HSALNG0000003
I converted it with the command: liftOver -bedPlus=12 -tab -minMatch=0.8 -minBlocks=0.5 -fudgeThick -multiple input.txt hg38ToMm39.over.chain out.txt unmapped.txt
The out.txt is chr6 121498510 121498660 HSALNT0000005 100 + 121497851 121498660 0 1 150, 0, geneID=HSALNG0000003
with only one of the two exon ranges lifted. My question is: how can I know which exon was lifted, given that the unlifted one is simply dropped from all of the output files? To work around this, I tried splitting my input into two lines, one per exon:
chr1 16857 17055 HSALNT0000005_1 100 - 16857 17055 0,0,0 1 198, 0, geneID=HSALNG0000003
chr1 17232 17751 HSALNT0000005_2 100 - 17232 17751 0,0,0 1 519, 0, geneID=HSALNG0000003
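For anyone who wants to reproduce this split, here is a minimal Python sketch of how a BED12+ record could be broken into one line per exon. The function name `split_bed12_into_exons` is hypothetical, and it assumes tab-delimited input, a trailing comma in blockSizes/blockStarts, and that thickStart/thickEnd should simply follow each exon's boundaries (as in my lines above).

```python
def split_bed12_into_exons(line):
    """Split one tab-delimited BED12+ record into single-block records.

    Each exon becomes its own line named <name>_<n>, numbered in the
    order the blocks appear in the record (not by strand).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    name, score, strand, item_rgb = fields[3], fields[4], fields[5], fields[8]
    # blockSizes and blockStarts usually end with a comma, so drop empty items
    sizes = [int(x) for x in fields[10].split(",") if x]
    rel_starts = [int(x) for x in fields[11].split(",") if x]
    extra = fields[12:]  # any "plus" columns, e.g. geneID=...

    rows = []
    for n, (size, rel) in enumerate(zip(sizes, rel_starts), start=1):
        start = chrom_start + rel  # blockStarts are relative to chromStart
        end = start + size
        row = [chrom, str(start), str(end), f"{name}_{n}", score, strand,
               str(start), str(end),  # assumption: thick region = exon itself
               item_rgb, "1", f"{size},", "0,"] + extra
        rows.append("\t".join(row))
    return rows


record = "\t".join([
    "chr1", "16857", "17751", "HSALNT0000005", "100", "-",
    "16857", "17751", "0,0,0", "2", "198,519,", "0,375,",
    "geneID=HSALNG0000003",
])
for row in split_bed12_into_exons(record):
    print(row)
```

Applied to the transcript above, this yields the two single-exon lines shown: the second block starts at 16857 + 375 = 17232 and ends at 17232 + 519 = 17751.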
Surprisingly, neither range was lifted this time. The unmapped file (unmapped.txt):
#Partially deleted in new
chr1 16857 17055 HSALNT0000005_1 100 - 16857 17055 0 1 198, 0, geneID=HSALNG0000003
#Boundary problem: need 1, got 0, diff 1, mapped 0.0
chr1 17232 17751 HSALNT0000005_2 100 - 17232 17751 0 1 519, 0, geneID=HSALNG0000003
If one of the exons can be lifted as part of the original record, why can't it be lifted on its own? Or is there another way to find out which exons of a BED12+ record were lifted by liftOver? Thanks.