SNPs used to create tree using RELATE
3.5 years ago
njandaro ▴ 10

Hi!

I am trying to estimate genealogical trees with RELATE, which I need later as input for PALM (Stern et al. 2020). I am using the 1000 Genomes dataset, focusing on the EUR populations.

There are two steps of RELATE's algorithm that I do not understand.

  1. At the data preparation stage, about 80% of all SNPs are removed whenever I subset to a particular population. RELATE gives no message explaining why these SNPs are dropped, and I cannot spot any pattern distinguishing the kept SNPs from the removed ones.

  2. What RELATE treats as the ancestral allele of a SNP does not seem to match the ancestral allele given in the INFO column of the original VCF file. I am passing RELATE the ancestral FASTA files that I downloaded from 1000 Genomes, so in principle both sets of ancestral information come from the same source. Yet the allele coded as 0 in the RELATE output file does not correspond one-to-one with the ancestral allele I read from the INFO column.

On the latter point, I am not sure whether I am doing something wrong. I simply downloaded the zip file with the ancestral alleles, unzipped it, and gave RELATE the path to the .fa file for the corresponding chromosome.
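A minimal sketch of how the mismatch in point 2 could be tabulated site by site. It does not reproduce what RELATE does internally. It assumes the INFO field carries the ancestral allele as `AA=<allele>` with optional `|`-separated extras, that lowercase letters in the FASTA are still usable calls, and that any character other than A/C/G/T means "no ancestral call". The function names and the toy records are hypothetical.

```python
def info_ancestral_allele(info):
    """Extract the AA value from a VCF INFO string, or None if it is absent."""
    for field in info.split(";"):
        if field.startswith("AA="):
            # Assumed layout: allele first, then optional '|'-separated extras.
            return field[3:].split("|")[0]
    return None


def classify_site(ref, alt, fasta_base, info):
    """Compare the FASTA ancestral base with REF/ALT and with INFO AA.

    Returns (polarity, agrees_with_info), where polarity is one of
    'ref_ancestral', 'alt_ancestral', 'neither', or 'unresolved'.
    """
    fasta_upper = fasta_base.upper()
    if fasta_upper not in set("ACGT"):
        polarity = "unresolved"      # e.g. 'N', '.', '-' (assumed "no call")
    elif fasta_upper == ref.upper():
        polarity = "ref_ancestral"   # REF would be the allele coded 0
    elif fasta_upper == alt.upper():
        polarity = "alt_ancestral"   # alleles would need flipping
    else:
        polarity = "neither"         # ancestral base matches neither allele

    info_aa = info_ancestral_allele(info)
    agrees = info_aa is not None and info_aa.upper() == fasta_upper
    return polarity, agrees


# Toy records: (REF, ALT, base read from the ancestral FASTA, INFO string).
toy_sites = [
    ("A", "G", "A", "AC=1;AA=A|||"),
    ("A", "G", "g", "AA=g|||;AF=0.5"),
    ("C", "T", ".", "AF=0.1"),
    ("C", "T", "G", "AA=C|||"),
]
results = [classify_site(*site) for site in toy_sites]
for site, result in zip(toy_sites, results):
    print(site[:3], result)
```

Counting how many sites fall into each category may show whether the disagreement is concentrated in sites without an ancestral call or in sites where the ancestral base matches neither allele.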

I also don't understand the FASTA files. For example, chromosome 1 in the original VCF file has a little more than 6 million SNPs, while the FASTA file for chromosome 1 has more than 200 million characters. I have no idea how to map 200 million characters onto 6 million SNPs.
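A minimal sketch of how the two counts could be reconciled. It assumes the ancestral FASTA holds one character per base position of the chromosome (so its length is the chromosome length, not the number of SNPs) and that the VCF POS column is 1-based. The helper names and the toy sequence are hypothetical.

```python
def read_single_fasta(text):
    """Return the concatenated sequence of a single-record FASTA string."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line and not line.startswith(">")  # skip the header and blank lines
    )


def ancestral_base_at(sequence, pos):
    """Return the character at 1-based position `pos` (VCF POS convention)."""
    if pos < 1 or pos > len(sequence):
        raise IndexError(f"position {pos} is outside the sequence")
    return sequence[pos - 1]  # Python strings are 0-based


# Toy FASTA: 11 positions in total, wrapped over two lines as in a real file.
toy_fasta = ">toy_chromosome\nNNNNac\nGT.-A\n"
seq = read_single_fasta(toy_fasta)

# Only a few positions are SNPs; every other character is simply never used.
snp_positions = [5, 7, 9, 11]
lookups = {pos: ancestral_base_at(seq, pos) for pos in snp_positions}
print(len(seq), lookups)
```

Under these assumptions, each of the roughly 6 million SNPs picks out exactly one of the 200 million characters by its position, and the remaining characters belong to sites that are not variable in the VCF.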

If someone could clarify any of the above confusions/questions, I'd be very grateful!

Best, Fatima

RELATE 1000 genomes SNP • 556 views