Question: How do I extract locus-position information from an XMFA file? Start-end positions of each locus.
11 days ago, jbt38 wrote:

I have an XMFA alignment of >400 taxa with 2,000 loci. I need to find where each locus begins and ends, as in a partition file. As a next step, I will extract the single-locus alignments for dN/dS analysis from the FASTA version of this genome alignment.

Is there a quick way to find the start and end positions of each gene, like a partition file for phylogeny reconstruction? Going through them one by one with Ctrl+F takes too long.

10 days ago, massa.kassa.sc3na wrote:

Hi,

I assume that you are talking about a file like the one described here: http://darlinglab.org/mauve/user-guide/files.html (I use the example from that page as SAMPLE_FILE). Each sequence header in it has this form:

>seq_num:start1-end1 ± comments (sequence name, etc.)

Do you need to extract the seq_num and the start1 and end1 indices? This can be done with a combination of grep and awk (you need a terminal with both programs installed).

For that it would be:

grep '^>' SAMPLE_FILE | awk -F'>|> |:|-| ' -v OFS=', ' '{print $2, $3, $4}'
seq_num, start1, end1
seq_num, startN, endN
seq_num, start1, end1
seq_num, startN, endN
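
The command above reports the genome coordinates stored in the headers. If what you need is partition-style column ranges within the concatenated alignment, a rough sketch along the following lines might help. It rests on assumptions that are not confirmed by the question: that each block terminated by a "=" line corresponds to one locus, that all sequences within a block have the same aligned length, and that the FASTA version concatenates the blocks in the same order. The function name xmfa_partitions and the "blockN" labels are made up for illustration.

```shell
# Hypothetical helper: print "blockN = start-end" alignment-column ranges
# for each "="-terminated block of the XMFA file given as $1.
# Assumes one locus per block and equal aligned length within a block.
xmfa_partitions() {
  awk '
    /^#/ { next }                      # skip comment/header lines
    /^>/ { nseq++; next }              # sequence header: count entries in this block
    /^=/ {                             # end of block: emit its column range
      if (len > 0) {
        block++
        printf "block%d = %d-%d\n", block, pos + 1, pos + len
        pos += len
      }
      len = 0; nseq = 0
      next
    }
    nseq == 1 { len += length($0) }    # measure only the first sequence of the block
  ' "$1"
}
```

It could be called as xmfa_partitions SAMPLE_FILE > partitions.txt. It is worth checking that the last reported end position equals the length of the FASTA alignment before relying on the ranges.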

Best regards
