Question: How do I extract locus position information from an XMFA file? Start and end positions of each locus.
jbt38 wrote (11 months ago):

I have an XMFA alignment of more than 400 taxa with 2,000 loci. I need to find where each locus begins and ends, as in a partition file. As a next step, I will extract the single-locus alignments from the FASTA version of this genome alignment for dN/dS analysis.

Is there a quick way to find the start and end positions of each gene, like a partition file for phylogeny reconstruction? Going through them one by one with Ctrl+F takes too long.

massa.kassa.sc3na wrote (11 months ago):

Hi,

I assume you are talking about a file like this: http://darlinglab.org/mauve/user-guide/files.html (I saved the example from that page as SAMPLE_FILE). Its entry headers have the form:

>seq_num:start1-end1 ± comments (sequence name, etc.)
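For reference, a header of that form can also be split with a regular expression. This is a minimal Python sketch, assuming the Mauve layout shown above with `±` written as `+` (forward strand) or `-` (reverse strand); the names `XmfaHeader` and `parse_xmfa_header` are hypothetical and not part of Mauve, and the example path is made up:

```python
import re
from typing import NamedTuple, Optional

# Assumed Mauve-style header: "> seq_num:start-end strand comments"
HEADER_RE = re.compile(r"^>\s*(\d+):(\d+)-(\d+)\s*([+-])?\s*(.*)$")

class XmfaHeader(NamedTuple):
    seq_num: int
    start: int
    end: int
    strand: Optional[str]  # "+" or "-", None if absent
    comment: str           # remaining text, e.g. the sequence file name

def parse_xmfa_header(line: str) -> Optional[XmfaHeader]:
    """Return the parsed header fields, or None if the line is not a header."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    seq_num, start, end, strand, comment = m.groups()
    return XmfaHeader(int(seq_num), int(start), int(end), strand, comment)

print(parse_xmfa_header("> 1:1-1000 + /data/genome1.fasta"))
# XmfaHeader(seq_num=1, start=1, end=1000, strand='+', comment='/data/genome1.fasta')
```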

Do you need to extract seq_num and the start1 and end1 coordinates? This can be done by combining grep and awk (you need a terminal with grep and awk installed).

For that, the command and its output would be:

grep ">" SAMPLE_FILE | awk -F'>|> |:|-| ' -v OFS=', ' '{print $2, $3, $4}'
seq_num, start1, end1
seq_num, startN, endN
seq_num, start1, end1
seq_num, startN, endN
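The one-liner above reports the genome coordinates stored in each header. If what you need for a partition file is instead the range of alignment columns each locus occupies, a rough Python sketch follows. It assumes that (1) each block terminated by a `=` line is one locus, (2) the FASTA version of the alignment was built by concatenating the blocks in file order, and (3) all entries in a block share the same aligned length, so the first entry's length (gaps included) is used. The names `block_partitions`, `format_partitions` and the `locus_` prefix are hypothetical; the output uses RAxML-style `DNA, name = start-end` lines.

```python
from typing import Iterable, List, Tuple

def block_partitions(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (locus_number, first_column, last_column), 1-based and inclusive,
    for each '='-terminated XMFA block, assuming the blocks are concatenated
    in file order to build the FASTA version of the alignment."""
    partitions: List[Tuple[int, int, int]] = []
    offset = 0        # alignment columns used by earlier blocks
    headers_seen = 0  # entries ('>' lines) seen in the current block
    width = 0         # aligned length (gaps included) of the block's first entry
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and file-level '#' comments
        if line.startswith(">"):
            headers_seen += 1
        elif line.startswith("="):
            if headers_seen:  # a block just ended: record its column range
                partitions.append((len(partitions) + 1, offset + 1, offset + width))
                offset += width
            headers_seen, width = 0, 0
        elif headers_seen == 1:
            width += len(line)  # the sequence may be wrapped over several lines
    return partitions

def format_partitions(partitions: List[Tuple[int, int, int]], prefix: str = "locus") -> str:
    """Render RAxML-style partition lines, e.g. 'DNA, locus_1 = 1-8'."""
    return "\n".join(f"DNA, {prefix}_{n} = {s}-{e}" for n, s, e in partitions)

# Tiny made-up XMFA with two blocks (aligned widths 8 and 5)
example = """#FormatVersion Mauve1
> 1:1-7 + g1.fa
ACGT-ACG
> 2:11-18 - g2.fa
ACGTTACG
=
> 1:20-23 + g1.fa
AC-GT
=
"""
print(format_partitions(block_partitions(example.splitlines())))
# DNA, locus_1 = 1-8
# DNA, locus_2 = 9-13
```

For a real file, passing an open file handle (for example `block_partitions(open("alignment.xmfa"))`, where the file name is a placeholder) should work the same way, since each line is stripped before it is inspected.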

Best regards
