Lifted coordinates map to _alt chromosomes in hg38
6.1 years ago
bioguy24 ▴ 220

I am using liftOver to convert ~100,000 hg19 coordinates to hg38. I know that there are duplicates in the hg19 BED file, but I am not sure what's going on or what's best to do. The hg38 coordinates are very different. Maybe the Table Browser is a better option? Thank you :).

hg19

chr19  54801916    54802239    chr19:54801916-54802239 .   LILRA3;LILRA6
chr19  54801917    54802239    chr19:54801917-54802239 .   LILRA3
chr19  54802472    54802789    chr19:54802472-54802789 .   LILRA3;LILRA6
chr19  54802473    54802789    chr19:54802473-54802789 .   LILRA3
chr19  54803901    54804020    chr19:54803901-54804020 .   LILRA3


hg38

chr19_KI270938v1_alt:273030-273353
chr19_KI270938v1_alt:273031-273353
chr19_KI270938v1_alt:273586-273903
chr19_KI270938v1_alt:273587-273903
chr19_KI270938v1_alt:275015-275134

6.1 years ago
apa@stowers ▴ 580

Strangely, LILRA3 is not annotated on the reference chr19 in hg38, only on that alternate contig. Its immediate neighbors, LILRB2 and LILRA5, are on the hg38 reference chr19, but there are no annotated genes between them. That ~1 Mb window containing LILRA3 did assemble, but did not get placed contiguously with the others, so it got spun off as sequence KI270938.

So this is a mis-assembly (all genomes have them), but whether LILRA3 really belongs where it was on hg19, or someplace new, is not clear.
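With ~100,000 lifted targets it may help to find every interval that landed on an alt or unplaced contig rather than spotting them by eye. Here is a minimal Python sketch using the five hg38 regions from the question. It assumes UCSC-style names (chr1-chr22, chrX, chrY, chrM) for the primary chromosomes; the helper names `parse_region` and `split_by_contig` are made up for illustration.

```python
import re

# Assumption: primary hg38 chromosomes use UCSC names chr1-chr22, chrX, chrY, chrM.
# Anything else (e.g. *_alt, *_random, chrUn_*) is treated as non-primary.
PRIMARY = re.compile(r"^chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def split_by_contig(regions):
    """Return (primary, other): regions on primary chromosomes vs. everything else."""
    primary, other = [], []
    for region in regions:
        chrom, _, _ = parse_region(region)
        (primary if PRIMARY.match(chrom) else other).append(region)
    return primary, other


# The hg38 regions posted in the question.
lifted = [
    "chr19_KI270938v1_alt:273030-273353",
    "chr19_KI270938v1_alt:273031-273353",
    "chr19_KI270938v1_alt:273586-273903",
    "chr19_KI270938v1_alt:273587-273903",
    "chr19_KI270938v1_alt:275015-275134",
]

primary, other = split_by_contig(lifted)
print(len(primary), "on primary chromosomes;", len(other), "on alt/unplaced contigs")
```

Regions in the `other` list are the ones worth checking by hand before they go into a target file.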

This is a good post just for perspective: hg19 vs hg38 in two pictures

Thank you for the information. May I ask how you were able to determine that LILRA3 is not annotated in hg38 but LILRB2 and LILRA5 are? I guess I am trying to find tools that may help. Thank you very much :).

Go look at LILRA3 in hg19, see what its neighbors' names are. Then look those up in hg38, you will find they are still together but LILRA3 has disappeared.

Interesting, so what is the best or correct thing to do in a case like this? I guess trying to figure out why it mis-mapped may help. Thank you :).

Well what are you doing with the remapped coordinates? Why do you need them?

The reference on our sequencer is hg38, so I am lifting over the hg19 targets to hg38 as well. Basically, after the reads are aligned, the target BED file is used for variant calling, coverage, etc. Thank you :).

Most accurate is probably to repeat the alignment, if you have access to the original data.

Absolutely, repeat the alignment. And don't reinvent the wheel with gene coordinates; every genome version has its own associated gene annotations somewhere. See the knownGene files in the UCSC hg38 annotation database, or any Ensembl human GTF from release 76 or later (the first release built on GRCh38).
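As a rough illustration of taking coordinates straight from a native annotation, here is a minimal Python sketch that turns `gene` records from GTF-formatted lines into BED-style tuples for a set of wanted gene names. The function name `gtf_genes_to_bed` and the toy records are made up for illustration (the coordinates are not real). It assumes the attribute column carries a `gene_name "..."` field, as Ensembl GTFs do.

```python
import re

# Assumption: the GTF attribute column contains gene_name "..." (true for Ensembl GTFs).
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gtf_genes_to_bed(lines, wanted):
    """Collect (chrom, start, end, name) BED tuples for 'gene' records in `wanted`.

    GTF coordinates are 1-based and inclusive; BED is 0-based and half-open,
    so only the start is shifted by one.
    """
    bed = []
    for line in lines:
        if line.startswith("#"):  # header / comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_NAME.search(fields[8])
        if match and match.group(1) in wanted:
            bed.append((fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)))
    return bed


# Made-up records for illustration only; these are NOT real gene coordinates.
toy = [
    "#!placeholder header",
    '1\ttoy\tgene\t101\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";',
    '1\ttoy\texon\t101\t150\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";',
    '1\ttoy\tgene\t301\t400\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";',
]

print(gtf_genes_to_bed(toy, {"GENE_A"}))
```

Note that Ensembl GTFs name chromosomes without the `chr` prefix (e.g. `19`), so the names would need converting to match a UCSC-style hg38 reference. A capture target file would more likely be built from `exon` records than whole genes; the same loop applies with a different feature filter.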