Question: Input format for SNP data in Arlequin
akuepper wrote (3.4 years ago):

Hi, I have genotyping-by-sequencing (GBS) SNP data for 95 individuals in 8 populations, with 4566 SNPs per individual. I would like to run an AMOVA in Arlequin, but I have trouble reading the data into the program, and I am running out of ideas about where my input file could be wrong. Below are the first lines of my input file (an *.arp file). If anyone has ideas about what I could try, help is very much appreciated! Thank you!

[Profile]
  Title="8 populations of Palmer amaranth"
  NbSamples=8
  GenotypicData=1 # - {0, 1}
  GameticPhase=0 # - {0, 1}
  RecessiveData=0 # - {0, 1}
  DataType=DNA # - {DNA, RFLP, MICROSAT, STANDARD, FREQUENCY}
  LocusSeparator=TAB # - {TAB, WHITESPACE, NONE}
  MissingData='N' # A single character specifying missing data
[Data]
  [[Samples]]
    SampleName="Arizona resistant"
    SampleSize= 12
    SampleData= {
AZR10 G C C T A A G G T T G A A C A T A G G R Y G T T T A T T C W A Y N C C C G T A Y T C T G T C A N G W A C A A G C N C T C G G C R A A T G N G G A A T T T A G C G Y C C G R T T A T C C C T C A T Y T C T T A G C A G C T C G A G A M A C A R C G K A W C C C C T G T G C A Y C A A C A R T G
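With 8 populations and thousands of loci, generating the file from a script avoids hand-editing mistakes. Below is a minimal Python sketch that writes the same [Profile] keys as the file above and wraps pre-formatted rows in sample entries. The helper names (arp_profile, arp_sample_block, arp_file) are hypothetical, the rows are written exactly as given and not validated, and treating SampleSize as the number of individuals is an assumption to check against the Arlequin manual.

```python
def arp_profile(title, nb_samples, genotypic=1, gametic_phase=0,
                recessive=0, data_type="DNA", locus_sep="TAB", missing="N"):
    """Build the [Profile] section using the same keys as the file above."""
    return "\n".join([
        "[Profile]",
        f'  Title="{title}"',
        f"  NbSamples={nb_samples}",
        f"  GenotypicData={genotypic}",
        f"  GameticPhase={gametic_phase}",
        f"  RecessiveData={recessive}",
        f"  DataType={data_type}",
        f"  LocusSeparator={locus_sep}",
        f"  MissingData='{missing}'",
    ])


def arp_sample_block(name, sample_size, rows):
    """Wrap pre-formatted data rows in one sample entry (rows are not validated)."""
    return "\n".join(
        [f'    SampleName="{name}"',
         f"    SampleSize={sample_size}",
         "    SampleData= {"]
        + list(rows)
        + ["    }"]
    )


def arp_file(title, samples, **profile_kwargs):
    """samples: list of (name, sample_size, rows) tuples, one per population."""
    parts = [arp_profile(title, len(samples), **profile_kwargs),
             "[Data]", "  [[Samples]]"]
    for name, size, rows in samples:
        parts.append(arp_sample_block(name, size, rows))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    demo = [("Arizona resistant", 1, ["AZR10\t1\tG\tC\tR"])]
    print(arp_file("Palmer amaranth (demo)", demo))
```

NbSamples is computed from the list of samples, so it cannot drift out of sync with the number of sample entries.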

1) Did you get an error message? If so, can you post it?

2) Did you try using a very small set of SNPs and individuals? This could help you find the problem.

3) I am not sure whether Arlequin accepts IUPAC ambiguity codes; did you check this?

Reply written 3.4 years ago by Fabio Marroni.
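Suggestions 2 and 3 are easy to script. The Python sketch below assumes one whitespace-separated row per individual, as in the file above, where the loci follow the sample ID directly (hence has_freq=False in the demo). subset_row and non_acgt_codes are hypothetical helpers, not part of Arlequin: the first keeps only the first few loci for a small test file, the second lists every code that is not a plain A/C/G/T.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def split_fields(row, has_freq):
    """Split a data row into (prefix, loci); prefix is the ID plus frequency if present."""
    fields = row.split()  # splits on tabs and runs of spaces alike
    n_prefix = 2 if has_freq else 1
    return fields[:n_prefix], fields[n_prefix:]


def subset_row(row, n_loci, has_freq=True, sep="\t"):
    """Keep only the first n_loci loci of a row, for a small test file."""
    prefix, loci = split_fields(row, has_freq)
    return sep.join(prefix + loci[:n_loci])


def non_acgt_codes(row, has_freq=True):
    """Count every locus code that is not A/C/G/T (ambiguity codes, missing data)."""
    _, loci = split_fields(row, has_freq)
    return Counter(code for code in loci if code.upper() not in BASES)


if __name__ == "__main__":
    row = "AZR10 G C C T A A G G T T G A A C A T A G G R Y G T"
    print(subset_row(row, 5, has_freq=False))
    print(non_acgt_codes(row, has_freq=False))
```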
akuepper wrote (3.3 years ago):

Thank you very much for your reply.

1) The error messages are:
#[ERROR # 1] : unable to read genotype frequency
#[ERROR # 2] : unable to read sample data

2) I will try with a smaller SNP set, though I wonder whether size is really the problem, and I would hate to lose information by downsizing.

3) The Arlequin manual says: "The following notation for ambiguous nucleotides are also recognized:
R: A/G (purine)
Y: C/T (pyrimidine)
M: A/C
W: A/T
S: C/G
K: G/T
B: C/G/T
D: A/G/T
H: A/C/T
V: A/C/G
N: A/C/G/T"
That is why I thought I could use my current data format. However, I have not found any example of an input file with similar data, and I am afraid that any other format might make me lose information.
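The manual's list can be turned into a lookup table to check which codes in a file are genuine two-allele heterozygotes. A minimal Python sketch, assuming that a diploid biallelic SNP genotype is either one of A/C/G/T or one of the two-base codes; het_fraction is a hypothetical helper, and excluding three-base codes (B, D, H, V) from the called loci is an assumption.

```python
# Ambiguity codes exactly as listed in the manual excerpt quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "W": "AT", "S": "CG", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def alleles(code):
    """Set of nucleotides an IUPAC code stands for."""
    return set(IUPAC[code.upper()])


def het_fraction(codes, missing="N"):
    """Fraction of called loci that are heterozygous (two-base codes).

    Missing data and three-base codes (B, D, H, V) are not counted as called,
    because they cannot represent a single diploid SNP genotype.
    """
    called = [c.upper() for c in codes
              if c.upper() != missing and len(IUPAC[c.upper()]) <= 2]
    if not called:
        return 0.0
    het = sum(1 for c in called if len(IUPAC[c]) == 2)
    return het / len(called)
```

For example, het_fraction(list("GCRYN")) counts G, C, R and Y as called and R and Y as heterozygous, giving 0.5.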

akuepper wrote (3.3 years ago):

The smaller data set does not work either. One thing I am wondering about: some of the groups I am comparing contain different numbers of individuals. I don't think this should be a problem for the statistical analysis, but maybe it is for Arlequin?

Fabio Marroni (Italy) wrote (3.3 years ago):

No, different sample sizes are not an issue.

I think the error is that you didn't write the frequency of the genotype. Imagine you have 3 SNPs: for each diploid sample you have to enter two lines, each reporting one of the alleles at each of your 3 SNPs, and before the first series of SNPs you have to write how many individuals have this genotype (1 in the example), like this:

sample_a 1 A T C 
           C C T
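If the IUPAC layout keeps failing, the two-line layout can be produced from one-line rows. Because GameticPhase=0 declares the phase unknown, this should not lose any diploid genotype call. A Python sketch: to_two_lines is a hypothetical helper; splitting each heterozygous code alphabetically across the two lines is an arbitrary choice that only makes sense with unknown phase; treating three-base codes as missing is an assumption; and the indentation of the second line mirrors the example above rather than a documented requirement.

```python
BASES = {"A", "C", "G", "T"}
# Two-base IUPAC codes from the manual's list, split into their two alleles.
HET = {"R": ("A", "G"), "Y": ("C", "T"), "M": ("A", "C"),
       "W": ("A", "T"), "S": ("C", "G"), "K": ("G", "T")}


def to_two_lines(sample_id, codes, freq=1, missing="N", sep="\t", indent="    "):
    """Return the two data lines for one diploid individual.

    Homozygous bases are repeated on both lines, two-base codes are split
    across the lines in alphabetical order, and anything else is written as
    missing on both lines.
    """
    first, second = [], []
    for code in (c.upper() for c in codes):
        if code in BASES:
            a = b = code
        elif code in HET:
            a, b = HET[code]
        else:
            # Missing data, and three-base codes (B, D, H, V), which cannot be
            # a biallelic diploid genotype: treated as missing (an assumption).
            a = b = missing
        first.append(a)
        second.append(b)
    line1 = sep.join([sample_id, str(freq)] + first)  # ID, frequency, alleles
    line2 = indent + sep.join(second)                 # second allele of each locus
    return line1, line2


if __name__ == "__main__":
    for line in to_two_lines("AZR10", "G C C T R Y N".split()):
        print(line)
```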

If you work with IUPAC codes (which I never did), you only have one line per sample, but you still have to put the frequency, so in your case:

AZR10 1 G C C T A A

Hope this fixes the problem!
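Fabio's one-line fix can be applied to the whole file at once. A minimal Python sketch, assuming one line per individual (the IUPAC layout), rows that do not already carry a frequency, and "SampleData= {" and the closing "}" each on their own line; add_frequency is a hypothetical helper, and row indentation is not preserved.

```python
def add_frequency(arp_text, freq=1, sep="\t"):
    """Insert a genotype frequency after the sample ID in every SampleData row."""
    out = []
    in_data = False
    for line in arp_text.splitlines():
        stripped = line.strip()
        if stripped.replace(" ", "").startswith("SampleData={"):
            in_data = True            # rows start on the next line
        elif in_data and stripped == "}":
            in_data = False           # end of this sample's rows
        elif in_data and stripped:
            fields = stripped.split(None, 1)  # sample ID, rest of the row
            if len(fields) == 2:
                line = sep.join([fields[0], str(freq), fields[1]])
        out.append(line)
    return "\n".join(out) + "\n"
```

Usage would be along the lines of reading the .arp file into a string, passing it to add_frequency, and writing the result to a new file, so the original stays untouched.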
