Question: Multiple Sequence Alignment for very large data sets
13 months ago, miles.thorburn (80) wrote:

What would your recommendations be for the best MSA programs for comparing large nucleotide sequences from the same species?

My data set consists of 66 individuals from 11 populations. I will be using VariScan to scan for selection next, but first I need to realign each of the 21 chromosomes separately for each of the following groupings: each population (6 individuals x 11 replicates), each population pair (12 individuals x 5 replicates, excluding the outgroup), each ecotype (30 individuals x 2 ecotypes), and all individuals together (66 individuals). To give you an idea of the size, chromosome 1 is ~26 Mb, in a single consensus FASTA sequence.
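To make the scale concrete, here is a rough tally of the alignment jobs implied by the groupings listed above. It is only a sketch: the names `GROUPINGS`, `count_jobs`, and `bases_per_alignment` are made up for illustration, and it assumes every chromosome is about the size of chromosome 1 (~26 Mb), which is a simplification.

```python
# Each grouping from the post: (individuals per alignment, number of such alignments)
GROUPINGS = {
    "population": (6, 11),
    "population_pair": (12, 5),
    "ecotype": (30, 2),
    "all_individuals": (66, 1),
}
N_CHROMOSOMES = 21
CHR1_LENGTH_BP = 26_000_000  # assumption: ~26 Mb used as a stand-in for every chromosome


def count_jobs(groupings, n_chromosomes):
    """Total number of separate alignments: one per grouping per chromosome."""
    groups = sum(n_groups for _, n_groups in groupings.values())
    return groups * n_chromosomes


def bases_per_alignment(groupings, length_bp):
    """Approximate input size (in bases) of a single alignment for each grouping."""
    return {name: size * length_bp for name, (size, _) in groupings.items()}


total_jobs = count_jobs(GROUPINGS, N_CHROMOSOMES)
input_sizes = bases_per_alignment(GROUPINGS, CHR1_LENGTH_BP)

print(total_jobs)  # 19 groupings x 21 chromosomes = 399
for name, n_bases in input_sizes.items():
    print(f"{name}: ~{n_bases / 1e6:.0f} Mb of input per alignment")
```

Under these assumptions there are 399 alignments in total, and the all-individuals alignment of a chromosome-1-sized sequence takes roughly 1.7 Gb of input, which helps explain why per-alignment runtime matters so much here.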

I was using GUIDANCE with the PRANK algorithm (commonplace in phylogenomic studies), but even on a high performance computing cluster, a single chromosome with only 6 individuals took too long for this to be a feasible approach. I am currently running a test with MAFFT, but I have been warned it may have a sequence size limitation. It's still running, so we'll see.

Thanks in advance, and if I missed explaining anything, please let me know!

msa alignment • 747 views
written 13 months ago by miles.thorburn (80)

What is the scientific logic of doing a chromosome level MSA? How was this consensus sequence generated? What is the ultimate question you are trying to solve?

modified 13 months ago • written 13 months ago by genomax (64k)

Ultimately this is a genome-wide scan for balancing selection. We would do it at the whole-genome level, but by aligning each chromosome separately we cut down both the computational power and the time needed for each step. Make sense?

written 13 months ago by miles.thorburn (80)

I assume you are reasonably sure that your chromosomes are directly alignable. Extending your own logic, if chromosome-level alignments are not feasible, you may have to start dividing the problem into smaller pieces. A casual search showed that people seem to be studying balancing selection at the level of a few genes. Are there known studies doing this at the whole-genome level?
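As a minimal sketch of what "smaller pieces" could look like, the helper below generates fixed-size window coordinates along a chromosome. The function name `windows` and the window and overlap sizes are illustrative assumptions, not something from the thread, and cutting by coordinate is only meaningful if the sequences already share a comparable coordinate system (for example, consensus sequences called against the same reference).

```python
def windows(length, size, overlap=0):
    """Yield half-open (start, end) windows covering [0, length).

    Consecutive windows share `overlap` bases; the last window is
    truncated at the end of the sequence.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < length:
        end = min(start + size, length)
        yield start, end
        if end == length:
            break
        start += step


# Example: a ~26 Mb chromosome cut into 1 Mb windows with no overlap
chrom_windows = list(windows(26_000_000, 1_000_000))
print(len(chrom_windows))  # 26
```

Each window could then be extracted from every individual and aligned as its own, much smaller job. Overlapping windows make it easier to stitch results back together, at the cost of some redundant computation.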

modified 13 months ago • written 13 months ago by genomax (64k)

There aren't many, but off the top of my head I can think of 2 good examples. We are trying to get away from the candidate gene approach here, and our data set is outstanding.

I am reasonably sure each chromosome should be aligned without any problems.

written 13 months ago by miles.thorburn (80)

Assuming the logic/experiment is all sound - try a different tool.

Ones that tend to be good for large scale:

  • progressiveMauve
  • Kalign

written 13 months ago by jrj.healey (11k)

Thanks, I'll have a look into these now.

written 13 months ago by miles.thorburn (80)