A Converter From Sequence Data Files (Nexus, Phylip, Or Any Other Sequence Format) To Haplotype Or 0/1 Infinite-Sites Data?
10.9 years ago

Hi,

I wonder if there are tools for converting sequence data to 0/1 infinite-sites data. I could write a script to do this, and I did write one once, but I later lost track of it. Now I need to write one again, so I wonder if there are tools that people tend to use.

Thank you for your answers.

sequence haplotype conversion • 4.4k views
10.9 years ago
David W 4.9k

I've used the R libraries pegas and ape to do this. Pegas provides the function haplotype to get the frequency of each unique sequence, which makes it all straightforward:

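#load the packages and the example data (woodmouse ships with ape)
> library(ape)
> library(pegas)
> data(woodmouse)

#example sequence data, use read.dna() to get sequences from file
> seq_data <- woodmouse[sample(1:15, 100, replace = TRUE), ]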
> h <- haplotype(seq_data)

#turn the haplotype object into a 0/1 matrix
> tab <- sapply(attr(h, 'index'), function(i)
                  sapply(1:dim(seq_data)[1], function(j) sum(i==j)))
> head(tab[,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    1    0    0    0    0
## [3,]    0    1    0    0    0
## [4,]    0    0    1    0    0
## [5,]    0    1    0    0    0
## [6,]    0    0    0    1    0

#rows are individuals, all should have one and only one haplotype
> all(rowSums(tab)==1)
##[1] TRUE

#label the rows with their sequence name
> rownames(tab) <- labels(seq_data)

If you make this conversion a lot, it's easy to write an R script that takes command-line arguments and the like.
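
The table above records which haplotype each individual carries. If what is wanted is instead a per-site 0/1 coding, as in the infinite-sites format the question mentions, a rough base-R sketch could look like the one below. The function name sites_to_binary is made up for illustration and is not part of ape or pegas. It assumes the alignment is already a character matrix (rows are sequences, columns are aligned sites), keeps only sites with exactly two different bases, and codes the base carried by the first sequence as 0. That reference choice is arbitrary, so 0 does not necessarily mean "ancestral".

```r
# Hypothetical helper (not from ape or pegas): code biallelic sites as 0/1.
# x: character matrix, one row per sequence, one column per aligned site.
sites_to_binary <- function(x) {
  x <- toupper(x)
  # A site qualifies if it holds only A/C/G/T and exactly two distinct bases;
  # sites with gaps, ambiguity codes, or more than two bases are dropped.
  is_biallelic <- apply(x, 2, function(site) {
    all(site %in% c("A", "C", "G", "T")) && length(unique(site)) == 2
  })
  seg <- which(is_biallelic)
  # Compare every sequence with the first one: same base -> 0, other base -> 1.
  ref <- x[rep(1, nrow(x)), seg, drop = FALSE]
  bin <- x[, seg, drop = FALSE] != ref
  storage.mode(bin) <- "integer"
  # Label columns with the site's position in the original alignment.
  colnames(bin) <- seg
  bin
}

# Toy alignment: sites 2 and 4 are biallelic, site 5 contains a gap.
aln <- rbind(
  ind1 = c("A", "C", "G", "T", "A"),
  ind2 = c("A", "T", "G", "T", "C"),
  ind3 = c("A", "C", "G", "A", "C"),
  ind4 = c("A", "T", "G", "T", "-")
)
sites_to_binary(aln)
```

With ape, as.character() on an alignment read by read.dna() should give a character matrix of this shape, and seg.sites() lists the segregating site positions if you only need those.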


I like that idea of using R packages.


Thank you for your answer.
