PopGenome: there are missing regions when calculating Tajima's D per gene
10 days ago
Bing • 0

Hello all,

I am new to PopGenome and would like to ask about something that has greatly confused me.

I am trying to calculate Tajima's D per gene for my whole-genome data. I imported the GFF file and split the data by "gene" (see my code below). If I use the whole GFF file and set tid="1", it reads not only chromosome 1 but also chr11 and chr12. I therefore subset the annotation to chr1.gff. However, when I checked the region names, some genes were missing.

Has anyone encountered this problem before? How did you solve it?

My code:

GENOME.class <- readVCF("indica.vcf.gz", numcols = 70000, tid = "1", from = 1, to = 45000000, gffpath = "chr1.gff")
GENOME.class <- set.populations(GENOME.class, list(c("C019", "C135", "C139", "C151", "ZS97"), c("C148", "W161", "W169", "MH63")), diploid = TRUE)
# Split the data into one region per gene
GENOME.class.slide <- splitting.data(GENOME.class, subsites = "gene")
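
For reference, a chr1.gff like the one used above can be produced with an exact match on the first GFF column. This is only a base-R sketch: the toy annotation lines and gene IDs are made up for illustration, and the output file name in the commented line is an assumption. It shows how a pattern match on "1" also picks up chromosomes 11 and 12, while an exact comparison does not.

```r
# Toy GFF records standing in for the real annotation (hypothetical gene IDs).
gff_text <- paste(
  "1\tsrc\tgene\t100\t900\t.\t+\t.\tID=geneA",
  "11\tsrc\tgene\t200\t800\t.\t-\t.\tID=geneB",
  "12\tsrc\tgene\t300\t700\t.\t+\t.\tID=geneC",
  "1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1",
  sep = "\n"
)

gff_cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")

# Read the chromosome column as character so "1" is compared as text.
gff <- read.delim(text = gff_text, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE, col.names = gff_cols,
                  colClasses = c("character", "character", "character",
                                 "integer", "integer", "character",
                                 "character", "character", "character"))

# A pattern match on "1" also matches "11" and "12".
pattern_hits <- gff[grepl("1", gff$seqid), ]

# An exact comparison keeps chromosome 1 only.
chr1 <- gff[gff$seqid == "1", ]

# With a real annotation, the subset could then be written back out, e.g.:
# write.table(chr1, "chr1.gff", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```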

The number of genes on chr 1 should be 5,271.

However, there were only 2,189 when I checked.
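
To make the comparison reproducible, the expected count can be taken straight from the GFF rather than from a screenshot. The sketch below uses base R on made-up toy records (hypothetical gene IDs). It counts the "gene" features on chromosome 1 and, as one assumption about where entries might collapse, the number of distinct start/end intervals among them. It is a diagnostic, not an explanation of what PopGenome does internally.

```r
# Toy GFF records (hypothetical); two genes share the same coordinates.
gff_text <- paste(
  "1\tsrc\tgene\t100\t900\t.\t+\t.\tID=geneA",
  "1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1",
  "1\tsrc\tgene\t1500\t2400\t.\t-\t.\tID=geneB",
  "1\tsrc\tgene\t1500\t2400\t.\t+\t.\tID=geneC",
  sep = "\n"
)

gff <- read.delim(text = gff_text, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"),
                  colClasses = c("character", "character", "character",
                                 "integer", "integer", "character",
                                 "character", "character", "character"))

# Rows annotated as "gene" on chromosome 1.
genes <- gff[gff$seqid == "1" & gff$type == "gene", ]
n_genes <- nrow(genes)

# Distinct start/end pairs among those genes.
n_intervals <- nrow(unique(genes[, c("start", "end")]))

# Number of regions PopGenome kept after the split, for comparison:
# length(GENOME.class.slide@region.names)
```

If the two GFF-derived numbers already differ from each other, or from the region count, that narrows down whether genes are being lost in the annotation or in the split.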

PopGenome • 174 views
