liftover using genome browser
5 months ago
priyanka

Hello everyone,

I have a file in the hg38 build, and I want to lift it over to hg19. I thought of using the liftOver tool from the UCSC Genome Browser, but I realise that the input file needs to be in BED format.

My file has only two parts, chromosome and position. This is what my file looks like:

CHROM_POS
chr10_100009635
chr10_100187980
chr10_100229692
chr10_100267650


The more detailed file looks like this:

GENE RSID1 RSID2 VALUE
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886
ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014


Can anyone tell me how to convert this format to BED, or suggest another tool I could use for the liftover?

vcf liftover bed • 1.4k views

Use a proper title, not a list of comma-separated terms. Read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

I tried using the Genome Browser, but I don't know how to convert this file to BED format.

You have the content necessary for the bed file. Split each line of the first CHROM_POS file by _ and repeat the second element twice to get to the basic bed format.
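
A minimal sketch of that split with awk, assuming a whitespace-delimited file with the CHROM_POS header shown in the question and 1-based positions (as in VCF). One caveat: BED intervals are 0-based and half-open, so a single base at 1-based position P is usually written as start P-1, end P; repeating the position twice gives a zero-length interval instead. The function name chrompos_to_bed is just an illustrative choice, not an existing tool.

```shell
# Convert "chrom_pos" IDs (1-based positions) into 3-column BED:
# chrom, start (0-based), end (exclusive). Base P becomes [P-1, P).
# Assumes whitespace-delimited input with an optional CHROM_POS header.
chrompos_to_bed() {
  awk 'BEGIN { OFS = "\t" }
       NR == 1 && $1 == "CHROM_POS" { next }    # skip the header line
       {
         n = split($1, f, "_")                   # f[1] = chrom, f[2] = position
         if (n < 2 || f[2] !~ /^[0-9]+$/) next   # skip blank or malformed lines
         print f[1], f[2] - 1, f[2]
       }'
}

# usage (file names are placeholders):
#   chrompos_to_bed < hg38_positions.txt > hg38_positions.bed
```

For chr10_100009635 this produces the BED line chr10, 100009634, 100009635 (tab-separated).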

Okay, I have one more question. In some cases the start and end positions are the same, e.g. chr1_169894240_G_T_b38 chr1_169894240_G_T_b38. Is that right, to have the same start and end position?

I don't understand your question. Do you mean you have duplicate entries? Did you try extracting the fields you need and actually running them through liftover?

No, I don't have duplicate entries. I know that a BED file should have chrom, start and end. In my file, each line is a gene followed by RSIDs in the form chrom_pos, so for one gene the same RSID can appear twice.

Did you try extracting the fields you need and actually running them through liftover?

Yes, I did. It says the format is incorrect. But I am still confused about what the end position should be.

Please read the comment chain - I've mentioned how to get the end position (when the start and end are the same)

That means it should be chr1:169894240-169894240. Am I right?
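
For what it's worth, chr1:169894240-169894240 is the UCSC "position" style (1-based, inclusive), which, as far as I know, the liftOver web page accepts alongside BED; in BED the same base would be chr1 169894239 169894240. A minimal awk sketch that emits the position style from the chrom_pos IDs, assuming the same CHROM_POS layout as in the question (the function name chrompos_to_position is hypothetical):

```shell
# Emit UCSC position style, chrom:start-end (1-based, inclusive), one base per line.
# Assumes whitespace-delimited chrom_pos IDs with an optional CHROM_POS header.
chrompos_to_position() {
  awk 'NR == 1 && $1 == "CHROM_POS" { next }    # skip the header line
       {
         n = split($1, f, "_")                   # f[1] = chrom, f[2] = position
         if (n >= 2 && f[2] ~ /^[0-9]+$/) print f[1] ":" f[2] "-" f[2]
       }'
}

# usage (file names are placeholders):
#   chrompos_to_position < hg38_positions.txt > hg38_positions.txt.pos
```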

ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886
ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014
ENSG00000000460.16 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 0.2107198702727749
ENSG00000000460.16 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 -0.03676569950387048
ENSG00000000460.16 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 0.3974601519919186
ENSG00000000938.12 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 0.050964267099090806
ENSG00000000971.15 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 0.4262626847615553
ENSG00000001036.13 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 0.4365424090912025
ENSG00000001036.13 chr6_143501715_T_C_b38 chr6_143511989_A_G_b38 0.38588058145595594


This is the file content. I have one question: if I repeat the second element as the end position, then start and end will always be identical.

chr1 169894240 169894240
chr1 169894240 169891332
chr1 169891332 169891332
chr1 169661963 169661963
chr1 169661963 169697456 ## also, in some cases it shows an error because start is bigger than end.


If I follow the steps above, I will mostly get identical start and end positions.
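
It may help to note that RSID1 and RSID2 each name one complete variant (chrom_pos_ref_alt_build), so they are not a start/end pair; the start and end of each variant come from its own ID. A minimal sketch, assuming the whitespace-delimited GENE RSID1 RSID2 VALUE layout shown above: it collects every distinct ID from both columns, writes one BED line per variant, makes the interval span the REF allele so indels are not truncated, and keeps the original ID in the fourth (name) column so lifted-over coordinates can be joined back to the model. The function name variants_to_bed is hypothetical.

```shell
# Turn the GENE RSID1 RSID2 VALUE table into BED, one line per distinct variant.
# Each ID (e.g. chr1_169894240_G_T_b38) is chrom_pos_ref_alt_build with a 1-based pos.
variants_to_bed() {
  awk 'BEGIN { OFS = "\t" }
       NR == 1 && $1 == "GENE" { next }               # skip the header line
       {
         for (c = 2; c <= 3; c++) {                   # RSID1 and RSID2 columns
           id = $c
           if (id == "" || (id in seen)) continue     # write each variant only once
           seen[id] = 1
           n = split(id, f, "_")                      # chrom, pos, ref, alt, build
           if (n < 2 || f[2] !~ /^[0-9]+$/) continue
           len = (n >= 3) ? length(f[3]) : 1          # span the REF allele
           print f[1], f[2] - 1, f[2] - 1 + len, id   # name column keeps the ID
         }
       }'
}

# usage (file names are placeholders):
#   variants_to_bed < model_hg38.txt > model_hg38.bed
```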

Please explain your problem better. What do the four columns mean in your source file, and what are you trying to accomplish using the liftover?

I have two files: one is a VCF and the other is this model. I want to count the SNPs that overlap between the two, but their genome coordinates are different: one is hg19 and the other is hg38. So I am trying to do a liftover and then find the overlapping SNPs.
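
Once both files are on the same build, the overlap can be counted by chromosome and position. A minimal sketch, assuming a lifted-over BED (0-based start in column 2, one line per variant) and a plain, uncompressed VCF. It matches on position only and does not compare REF/ALT alleles, and it strips a leading "chr" because some hg19/GRCh37 VCFs name chromosomes 1, 2 and so on while UCSC uses chr1, chr2. The function name count_shared_positions is hypothetical.

```shell
# Count distinct positions present in both a BED file and a VCF on the same build.
# Matches on chromosome + 1-based position only (alleles are NOT compared).
count_shared_positions() {
  local bed=$1 vcf=$2
  awk 'function key(chrom, pos) { sub(/^chr/, "", chrom); return chrom ":" pos }
       FNR == NR { if (NF >= 3) bedpos[key($1, $2 + 1)] = 1; next }  # BED start is 0-based
       /^#/ || NF < 2 { next }                                       # VCF header lines
       {
         k = key($1, $2)
         if ((k in bedpos) && !(k in hit)) { hit[k] = 1; n++ }       # count each position once
       }
       END { print n + 0 }' "$bed" "$vcf"
}

# usage (file names are placeholders):
#   count_shared_positions model_hg19.bed myfile.vcf
```

Note the FNR == NR idiom assumes the BED file is not empty.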

I was able to convert my VCF files to BED files. But when I submit them to the Genome Browser liftOver tool it says: "Successfully converted 147944 records: Conversion failed on 209 records". So 209 records could not be converted.

I've had that happen. Not all co-ordinates can be successfully lifted over, I think.

Right, this happens if there are "gaps" in the chain file. These gaps can happen for many reasons. For example, an insertion variant that exists in part of the population may be included in one reference genome (in which case the alt allele will be a deletion) and not in the other (in which case the alt allele will be the insertion). The chain file from the first reference to the second will have a gap there, because the bases of this insertion have no mapping.
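
If you want to see why particular records fail, the command-line liftOver (liftOver input.bed hg38ToHg19.over.chain.gz output.bed unMapped) writes the records it could not convert to the last file, each preceded by a comment line giving the reason, such as "#Deleted in new". A minimal sketch that tallies those reasons, assuming that layout; the function name summarize_unmapped is hypothetical.

```shell
# Tally failure reasons in a liftOver "unMapped" file.
# Assumes each failed record follows a "#<reason>" comment line.
summarize_unmapped() {
  awk '/^#/ { reason = substr($0, 2); next }   # remember the latest reason
       NF > 0 { count[reason]++ }              # count the record under it
       END { for (r in count) print count[r] "\t" r }' "$1" | sort -k1,1nr
}

# usage (file name is a placeholder):
#   summarize_unmapped unMapped
```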

Thank you for the explanation.

5 months ago
Divon

Another option is to do it the other way around:

Turn your VCF into a dual-coordinate VCF (i.e. a VCF containing coordinates in both hg19 and hg38 concurrently):

genozip --chain hg19tohg38.chain.genozip myfile.vcf

To view your data in hg19:

genocat myfile.vcf.genozip

To view your data in hg38 (and compare to the other file):

genocat --luft myfile.vcf.genozip

More details: https://genozip.com/dvcf.html

Oh, that's great. I will try this.