Hello everyone
I have a VCF file that I'm trying to convert from hg19 to hg38. For that I'm using the bcftools +liftover
command from here. I previously tried Picard's VCF liftover, but the memory cost was too high for the amount of data.
Whatever I do, I run into one of two errors. If I run with hg19ToHg38.over.chain as the chain file, I get the error "The INFO tag "AF" is not the correct AGR tag".
If I use GRCh37_to_GRCh38.chain instead, I get the error "Could not parse integer 167417 50000 80249 in the chain file: convert19-38/GRCh37_to_GRCh38.chain".
I already tried different references for both hg19 and hg38. For hg19 I used both goldenPath's hg19.fa and human_g1k_v37.fasta. For hg38 I tried hg38.fa, GRCh38_full_analysis_set_plus_decoy_hla.fa, Homo_sapiens.GRCh38.dna.primary_assembly.fa, hg38Patch11.fa and GCA_000001405.15_GRCh38_no_alt_analysis_set.fna, each with both chain files.
Here is the command I'm trying to run:
bcftools +liftover -Oz -o 510k_hg38.vcf output_510k/concat.dose.vcf.gz -- -s ref37/human_g1k_v37.fasta -f ref38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -c convert19-38/hg19ToHg38.over.chain
As mentioned, I already tried different combinations of chain files, ref37 and ref38 references. The VCF file was already imputed against HRC r1.1 with the Michigan Imputation Server, if that makes a difference.
The first error is probably related to VCF versioning. What version is your VCF file?
It says fileformat=VCFv4.1 in the VCF file.
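Besides the fileformat line, the header's INFO declarations may be worth a look, since the first error names the AF tag. One reading, not confirmed anywhere in this thread, is that "AGR" refers to the VCF Number=A/G/R declarations, so the Number declared for AF could matter. A minimal Python sketch for listing them follows; the function name info_declarations is hypothetical, and it only reads header lines, not records.

```python
def info_declarations(header_lines):
    """Map each INFO ID in a VCF header to its (Number, Type) declaration."""
    declarations = {}
    for line in header_lines:
        line = line.rstrip("\r\n")
        if not line.startswith("##INFO=<"):
            continue
        # Drop the free-text Description so commas inside it are not split on.
        body = line[len("##INFO=<"):].split(",Description=", 1)[0]
        fields = {}
        for part in body.rstrip(">").split(","):
            if "=" in part:
                key, value = part.split("=", 1)
                fields[key] = value
        if "ID" in fields:
            declarations[fields["ID"]] = (fields.get("Number"), fields.get("Type"))
    return declarations
```

For a compressed file, the header lines could be read with gzip.open(path, "rt") and passed in until the first line that does not start with "##".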
Weird. That error doesn't seem like a known thing either. Can you try a method from this thread: Lift-over on a VCF
The error looks like the result of a length check (https://github.com/freeseek/score/blob/1fcbc3e0d6dd600d871ebe1a1ffdf34672ad1e1d/liftover.c#L460). Can you figure out which record gives you that error? The full error message should say something more specific about the record.
Sorry, I'm not sure how to find out more about the record. That is the whole error message, and it always happens at the same position, "167417 50000 80249", no matter what references I use. I will try the other methods this weekend and report how it goes. Thanks for the suggestion!
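Since the second error quotes three numbers from the chain file rather than a VCF record, one way to learn more is to locate that line in the chain file itself. A minimal Python sketch, assuming the quoted numbers are one alignment line of the chain file; the function name find_chain_lines is hypothetical. It also reports whether the fields are separated by spaces or tabs, which is only a guess at something worth comparing between the two chain files, not a confirmed cause.

```python
def find_chain_lines(lines, needle_fields):
    """Return (line_number, text, separator) for lines whose fields match needle_fields."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if text.split() != list(needle_fields):
            continue
        # Record which whitespace character separates the fields on this line.
        if "\t" in text and " " in text:
            separator = "mixed"
        elif "\t" in text:
            separator = "tab"
        elif " " in text:
            separator = "space"
        else:
            separator = "none"
        hits.append((number, text, separator))
    return hits
```

It could be called on an open file handle, for example find_chain_lines(open("convert19-38/GRCh37_to_GRCh38.chain"), ["167417", "50000", "80249"]), and the result compared with the same kind of line in hg19ToHg38.over.chain.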
Have you tried going back to first principles: creating a BED file from your VCF using the ID field (or ${chr}_${start}_${stop}_${ref}_${alt}) as names, lifting the BED over using UCSC liftOver, and then re-mapping the positions? It's slightly more involved, but it will not break in the same way.
I haven't. I will try that and post the results. I probably should have asked for help earlier, because I won't be able to finish it today, but thanks for the suggestion!
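The BED-based route suggested above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: the function names are hypothetical, names follow the chr_start_stop_ref_alt pattern from the suggestion, and BED coordinates are taken as 0-based, half-open, spanning the REF allele. It does not handle strand flips or REF/ALT swaps between assemblies, and the remapped records would still need sorting and a check of REF against the hg38 reference.

```python
def bed_name(chrom, pos, ref, alt):
    """Build the chr_start_stop_ref_alt name from a 1-based VCF position."""
    start = pos - 1
    stop = start + len(ref)
    return f"{chrom}_{start}_{stop}_{ref}_{alt}"

def vcf_line_to_bed(line, chrom_prefix=""):
    """Turn one VCF record into a 4-column BED line (0-based, half-open)."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    pos = int(pos)
    start = pos - 1
    stop = start + len(ref)
    # chrom_prefix (assumption): set to "chr" if the chain file uses chr-style names
    # while the VCF uses plain numbers.
    return "\t".join([chrom_prefix + chrom, str(start), str(stop),
                      bed_name(chrom, pos, ref, alt)])

def load_lifted(bed_lines):
    """Read the lifted BED into {name: (new_chrom, new_1_based_pos)}."""
    lifted = {}
    for line in bed_lines:
        chrom, start, _stop, name = line.rstrip("\n").split("\t")[:4]
        lifted[name] = (chrom, int(start) + 1)
    return lifted

def remap_vcf_line(line, lifted):
    """Rewrite CHROM and POS of a VCF record, or return None if it was not lifted."""
    fields = line.rstrip("\n").split("\t")
    name = bed_name(fields[0], int(fields[1]), fields[3], fields[4])
    if name not in lifted:
        return None
    new_chrom, new_pos = lifted[name]
    fields[0] = new_chrom
    fields[1] = str(new_pos)
    return "\t".join(fields)
```

Between the two halves, the BED file would be lifted with the UCSC tool, for example liftOver input.bed hg19ToHg38.over.chain lifted.bed unmapped.bed, where the file names are placeholders. Records that end up in the unmapped file simply come back as None from remap_vcf_line and can be dropped or logged.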