PopGenome software: concatenating chromosomes and coalescent simulations
3
1
Entering edit mode
9.4 years ago
cintapq ▴ 10

Dear all,

I have two questions about the PopGenome package for R.

  1. When using the readVCF function, it is possible to read several chromosomes at the same time? If not, it is possible to merge several GENOME.class ​objects? (my VCF files are "small", I don't have memory limitations).
  2. Coalescent simulations are computed correctly well when applied to a whole chromosome but fail when applied to a subset of sites: i.e. exons, introns, sliding window analysis... I copy the code for clarity:
  • For a certain chromosome (works well!):

    GENOME.classChr1 <- neutrality.stats(GENOME.classChr1)
    MS.classChr1 <- MS(GENOME.classChr1,thetaID="Tajima",neutrality=T)
    
  • For exons (fails!):

    GENOME.classChr1.splitEx <- neutrality.stats(GENOME.classChr1.splitEx) #this command works well
    MS.classChr1.splitEx <- MS(GENOME.classChr1.splitEx,thetaID="Tajima",neutrality=T) #ERROR: Error in 0:nsamtot : argument of length 0
    

How can I solve these issues?

VCF popGenome population-genomics R • 7.6k views
ADD COMMENT
0
Entering edit mode

Hi Christian,

Thanks for your suggestion about reading different chromosomes at the same time. In fact is a good idea and should work, the problem is that my VCF files are huge and R session is aborted when reading data in this way...

I've solved the issue of the coalescent simulations by running first the ms program to get the simulations (which allows you to modify default parameters) and read the output file in popgenome using the readMS() function. Once you have read the simulations file you can compute different statistics, for instance using neutrality.stats().

Hope it helps!

ADD REPLY
1
Entering edit mode

?

ADD REPLY
1
Entering edit mode
9.2 years ago

Hi @cintapq,

Did you ever find out how to read in several chromosomes at the same time? I struggled with this, but eventually figured out that this works if you have VCFs:

dir/
    gff/
        chr01.gff
        chr02.gff
    vcf/
        chr01.vcf
        chr02.vcf

Then you can open using data <- readData("dir/vcf/", format="VCF", gffpath = "dir/gff/")

Re the coalescent simulations, I'm running into the same problem as you (#ERROR: Error in 0:nsamtot : argument of length 0). Did you ever figure out a way around this? Perhaps doing simulations gene-by-gene would work.

ADD COMMENT
0
Entering edit mode

Hi, I am trying to load my data into PopGenome too, eg: data = readData("", format=VCF), then the software just shut down. I tried the same command with FASTA format, it works. So I compared my .VCF with the .VCF from the DATA, I realized that my .VCF need tabix indexed. I googled the tabix and download the tabix-0.2.6 software. Can anyone show me how to index my .vcf file? I have no clue about the software.

ADD REPLY
0
Entering edit mode

Hi Huan,

If you have tabix already installed you just need to type this command:

tabix -p vcf FILE.vcf.gz

you will obtain a tabix indexed file (.tbi)

ADD REPLY
0
Entering edit mode

Hi,

I tried another way to read in the SNP data.

readData("", format=HapMap)

And it's OK. Then I did diversity.stat follow the introduction. I want to compare the nucleotide diversity between 2 populations at all the SNP positions. Because I have missing genotypes, I wonder how the Popgenome calculate the diversity within the populations? The reference is Nei's book named Molecular Evolutionary Genetics. I couldn't find it.

The results come back only about 600 sites which has no SNP position information. In hapmap format file, the SNP physical position is included. I can see the difference between the two populations, but I couldn't figure out where the difference happening. I totally has about 6000SNPs. The results only give me 600 data. What I am expecting is the first column is SNP name, then is the nucleotide diversity in Pop1, and then nucleotide diversity in Pop2. It's should be the same for the Fst or haplotype diversity. Then it will be easier to find the interested SNPs.

By the way, I feel PopGenome is more suitable for the alignments data instead of SNPs data. Neutrality stats is only suit for alignments, not for SNPs, isn't it?

ADD REPLY
0
Entering edit mode

After loading in the data, I checked the summary of the data by get.sum.data(). Here is the summary. How did the popgenome calculate the sites? If there is no valid sites, does it mean I can't do further analysis, such as diversity and F_ST etc?

              n.sites n.biallelic.sites n.gaps n.unknowns n.valid.sites
82sam.hmp.txt 596001040               636      0       1128           NaN
              n.polyallelic.sites trans.transv.ratio
82sam.hmp.txt                   0           1.373134
ADD REPLY
0
Entering edit mode

Hi,

ok, seems you have SNP data in HapMap format and quite alot of unknown nucleotides like ./.

try

test <- readData("HapMapFolder", format="HapMap", include.unknown=TRUE)

Note, PopGenome will take the position of the last SNP specified in the file to define the slot n.sites

Best,
B

ADD REPLY
0
Entering edit mode

I'm trying to read a HapMap file with readData, butI get an error,

> GENOME.class<-readData("D:/R/a/b", format="HapMap")
|            :            |            :            | 100 %
|
Read 5764 items
Error in CCC$matrix : $ operator is invalid for atomic vectors

These are the first ten lines and 16 columns of the hapmap file

Can you help me? lipc20052006 [at] 163 [dot] com

rs#  alleles  chrom  pos    strand  assembly  center  protLSID  assayLSID  panelLSID  QCcode  L1  L2  L3  L4  L5
S1   A/G      1      65513  +       NA        NA      NA        NA         NA         NA      AA  AA  AA  GG  AA
S2   G/A      1      86025  +       NA        NA      NA        NA         NA         NA      AA  GG  GG  GG  GG
S3   A/G      1      98468  +       NA        NA      NA        NA         NA         NA      AA  AA  AA  AA  AA
S4   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  GG  GG  GG  GG
S5   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  GG  GG  GG  GG
S6   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  GG  GG  GG  GG
S7   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  AA  AA  GG  AA
S8   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  GG  GG  GG  GG
S9   G/A      1      2E+05  +       NA        NA      NA        NA         NA         NA      GG  AA  AA  GG  AA
S10  A/C      1      2E+05  +       NA        NA      NA        NA         NA         NA      AA  AA  AA  AA  AA
ADD REPLY
1
Entering edit mode
4.7 years ago
nataliagru1 ▴ 90

Dear All,

popgenome actually has an option to concatenate genome classes. Therefore you can put each chromosome into a class (genome_chr1.class, genome_chr2.class, genome_chr3.class, etc.) and merge all genome classes via concatenate.classes(genome_chr1.class, genome_chr2.class, genome_chr3.class).

Command would look like this:

> genome_allchroms.class <- concatenate.classes(list(genome_chr1.class, genome_chr2.class, genome_chr3.class))
ADD COMMENT
0
Entering edit mode
9.0 years ago

Hi ALL, see also http://popgenome.weebly.com/

THX, all the best.

Bastian

ADD COMMENT
1
Entering edit mode

Nobody never answers in the forum at http://popgenome.weebly.com/

ADD REPLY

Login before adding your answer.

Traffic: 2164 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6