How to center SNP position from a tag pair and add flanking sequence using reference genome
2
0
Entering edit mode
8.5 years ago
grmunjal • 0

I have a fasta file containing tag pairs (63 identical nucleotides + 1 SNP) like so:

>TP2_query

CAGCAAAAAAAAAAAAAAAAAAATGAAACTTCTGAGAGGGAAAAATTACTCAACCTTATATTCT

>TP2_hit

CAGCAAAAAAAAAAAAAAAAAAATGTAACTTCTGAGAGGGAAAAATTACTCAACCTTATATTCT

I would like to produce 1 sequence per tag pair that is centered on the SNP (can be either allele) and flanked by 50 bp (coming from the reference genome) on either side. I'm thinking of this as a two step process:

(1) to determine the SNP position in the tag pair and the corresponding coordinates in the reference genome

(2) adding flanking sequence from the reference using the coordinates from (1)

Any ideas about (1) would be greatly appreciated.

I found bedtools flank is able to do (2) in the easiest possible way (maybe samtools faidx too). The command needs a "genome" file for the species (Medicago in my case). If I understand correctly, this file is simply tab delimited maximum numeric values of coordinates for each chromosome? I can figure out how to do this with some cheap python or R code but wondering if there's a neat unix solution for this.

SNP • 2.1k views
ADD COMMENT
0
Entering edit mode
ADD COMMENT
0
Entering edit mode
8.5 years ago
grmunjal • 0

For 1 thinking of adding mapping start position coordinate to within read position using something like (from https://github.com/enormandeau/Scripts/blob/master/uneak_locate_snp_positions.py ):

def find_snp_position(s1, s2):
    assert len(s1) == len(s2), "Sequences have different lengths"
    snps = []
    for i in xrange(len(s1)):
        if s1[i] != s2[i]:
            snps.append(i + 1)
            assert len(snps) > 0, "No SNP found"
            assert len(snps) <= 1, "More than one SNP found"
            return snps[0]
ADD COMMENT

Login before adding your answer.

Traffic: 3458 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6