Question: How to parse a multiple sequence alignment to quantify the number of SNPs in different regions
0
gravatar for modernsynthesis
5.4 years ago by
United States
modernsynthesis30 wrote:

Hello All,

I am new to programming and would like to figure out a way to count the number of SNPs inside and outside of a given probe-to-read match, separately. There are two main files I am working with: 1) a FASTA alignment for >2000 loci across 16 individuals and 2) a BLAST output table. Columns G and H from the BLAST table (see below) indicate where the probe matches the read from start to end (in base pairs). Any insight as to how I could tackle this problem would be greatly appreciated. 

 

Snippet from FASTA alignment:

>locus1_individual-1
------------------------------------AGAAAGACAACTCTAACACAGTGATATCCATTTCCATTAGGCAAATAAACAACTACAAATCATTCAGATTATGTAAATGCTCAATGATATTTGCCTCCCGGTATCGCAGAGTTAGGACTGTCCAAAGTTTTATAAGCATATTTTCATCAGAGAACCTACTAGAGCATTGATACATTTGATTTTCAGTTTTTTCTAATGACAAATCACCGTCTTGGAAGCAAAAAGGCAAATAACTGATTATAAGTGCAATGAAAGTCTCTTGAATTTACATATACATTTACTTGTACCCTGTCACATTTTCTTAACCTCTAAACCTCCACACAACATATAAATGAACTTGTCTGGAATTGTTTTTTTCCCAAATTGCAGCCTATATTTTCTGCAGCACTGATGCAATTATGCCACCTCAATTGCCTCGCAGTGGAAAAAAGCACAATGCAAAACTTAGTGTATCAAGAAAATAAAGCCCAGAAAATTGCCTGTGACACACACCAACACAAAGACAGCTACGCTTCCTCAGAAAGTGGACAGAATAAAAGTAGACAGTTTCTATTAACAATAACAAGACAAAATACAGAGGCCCTTTATCTCTTAAAATGTAAATTTATACAAGGATTCATGCTGCCCTGCTGTAAATATCAAATACCCCTTAAGACCATAGAACAAGAATGAATTGATTTTCCTCCTAGTCTTCGCCATCTGCAGATTCTCATAGCTTGGCATTATATC
>locus1_individual-2

TAGGTAGTGACTGATGAGAAGTGAACTGCATCCCACAGAAAGACAACTCTAACACAGTGATATCCATTTCCATTAGGCAAATAAACAACTACAAATCATTCAGATTATGTAAATGCTCAATGATATTTGCCTCCCGGTATCGCAGAGTTAGGACTGTCCAAAGTTTTATAAGCATATTTTCATCAGAGAACCTACTAGAGCATTGATACATTTGATTTTCAGTTTTTTCTAATGACAAATCACCGTCTTGGAAGCAAAAAGGCAAATAACTGATTATAAGTGCAATGAAAGTCTCTTGAATTTACATATACATTTACTTGTACCCTGTCACATTTTCTTAACCTCTAAACCTCCACACAACATATAAATGAACTTGTCTGGAATTGTTTTTTTCCCAAATTGCAGCCTATATTTTCTGCAGCACTGATGCAATTATGCCACCTCAATTGCCTCGCAGTGGAAAAAAGCACAATGCAAAACTTAGTGTATCAAGAAAATAAAGCCCAGAAAATTGCCTGTGACACACACCAACACAAAGACAGCTACGCTTCCTCAGAAAGTGGACAGAATAAAAGTAGACAGT--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[.....]

>locus1_individual-16

[.....]

>locus2000_individual-1

[.....]
 

 

Snippet from BLAST output:

A                   B               C   D   E   F   G   H   I   J   K           L
locus1_individual-1 locus1_probe1   100 120 0   0   240 359 1   120 3.00E-59    222
locus1_individual-2 locus1_probe1   100 120 0   0   276 395 1   120 3.00E-59    222

Probe matching locus 1:

>locus1_probe1
ATAAGTGCAATGAAAGTCTCTTGAATTTACATATACATTTACTTGTACCCTGTCACATTTTCTTAACCTCTAAACCTCCACACAACATATAAATGAACTTGTCTGGAATTGTTTTTTTCC

 

ADD COMMENTlink modified 5.4 years ago • written 5.4 years ago by modernsynthesis30
0
gravatar for dmccoll
5.4 years ago by
dmccoll0
United States
dmccoll0 wrote:

Earlier today I did the galaxy 101 (@ https://usegalaxy.org/) tutorial and it seems they go over a simplified version of your problem using their web application.

Here's the link. https://usegalaxy.org/u/aun1/p/galaxy101

I apologize if I am missing a sublety in your problem, but either way this will not take long to investigate.

Good Hunting

ADD COMMENTlink written 5.4 years ago by dmccoll0
0
gravatar for modernsynthesis
5.4 years ago by
United States
modernsynthesis30 wrote:

Is this something I could do with sed and awk, or some other command-line tool? I will check out Galaxy, thanks.

ADD COMMENTlink written 5.4 years ago by modernsynthesis30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1617 users visited in the last hour