Question: Quick help: How can I find and replace a specific nucleotide within a gene sequence?
0
gravatar for mokunf
7 days ago by
mokunf0
mokunf0 wrote:

So I have a

usr/bin/python
from collections import defaultdict
import re, sys, random
from Bio import SeqI

- - - - - H E A D E R - - - - - - - - - - - - - - - - -

Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation

- - - - - U S E R V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(lambda: "my own unknown" )
CT        = defaultdict(lambda: 'nada')
GeneSeqs   = defaultdict(lambda: 'nada')
CX  = defaultdict(lambda: 0)
count = defaultdict(lambda: 0)
NTs    = ['a', 't', 'g', 'c']
afreq        = defaultdict(lambda: -1.1 )
tfreq        = defaultdict(lambda: -1.1 )
gfreq        = defaultdict(lambda: -1.1 )
cfreq       = defaultdict(lambda: -1.1 )
seq = 'P1.txt'

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - M A I N - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")

IN1 = open( "P1.txt", 'r')
number = 1
for line in IN1:
    if (re.match('>', line)):
        header = line.rstrip()   # remove right white space
    else:
        GenSeqs[header] = line.rstrip()   # Dict[key] = value, key = header, value = sequence
        number += 0

print("There are %d gene sequences in file %s" % (number, inFile))


for records in SeqIO.parse ('P1.txt','fasta'):
    if 'a' in records.seq:
        print (records)

rna-seq sequence genome gene • 178 views
ADD COMMENTlink modified 2 days ago • written 7 days ago by mokunf0
5

To answer your question, you can use CRISPR.

ADD REPLYlink written 6 days ago by theobroma221.1k
1

I prefer CRISPR with vim bindings.

ADD REPLYlink written 6 days ago by kloetzl750
2

I'm a big fan of cRispr, crispy, and crispl

ADD REPLYlink written 6 days ago by WouterDeCoster24k
1

Format your code properly so we can easily read it and be more specific with your question and you'll get a lot more help.

ADD REPLYlink written 6 days ago by jared.andrews07540

I have tidied it a fair bit to how [I believe..] it should look

ADD REPLYlink written 6 days ago by Kevin Blighe11k
1

Now we need to figure out what the actual question is. So much for quick help...

ADD REPLYlink modified 6 days ago • written 6 days ago by WouterDeCoster24k

It does not appear that you are searching for a pattern but replacing in a specific location. Do you need to use a python program for this (unless this is an assignment).

ADD REPLYlink written 6 days ago by genomax40k
0
gravatar for mokunf
2 days ago by
mokunf0
mokunf0 wrote:

I apologize for not responding sooner. I'm not sure how to enter the script that I've created so far. Here it is again, with the sequence that i'm trying to analyze. Below the code is the sequence that I'm focusing on. I've installed Biopython and it seems to be helpful in my analysis. I could be mistaken as I am still new to this. Any help that's offered is greatly appreciated.

#!/usr/bin/python
 input_file = open('P1.txt', 'r')
OUT = open('P1-Out.txt','w') 
OUT.write('Gene_Name\tA\tC\tG\tT\tTotal_Length\tCG%\n') 
from Bio import SeqIO
from Bio import IUPAC

 print ("......... START........... ")

for cur_record in SeqIO.parse(input_file, "fasta") :
#count nucleotides in this record...
    gene_name = cur_record.name 
   A_count = cur_record.seq.count('a') 
   C_count = cur_record.seq.count('c') 
   G_count = cur_record.seq.count('g') 
   T_count = cur_record.seq.count('t') 
   length = len(cur_record.seq) 
   cg_percentage = 100 * float(C_count + G_count) / length

print ("A_Count %d" % A_count)
print ("C_Count %d" % C_count)
print ("G_Count %d" % G_count)
print ("T_Count %d" % T_count)
print ("CG_Percentage %d" % cg_percentage)

cur_seq = SeqIO('input_file', IUPAC.unambiguous_gene)
cur_seq.alphabet

counter = 0
for cur_record in SeqIO.parse(input_file, "fasta"):
counter+= cur_record.seq.count ('a')


print ("............. FINISH .............")
OUT.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, 
       cg_percentage))
 OUT.close()

Example sequence

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
ADD COMMENTlink modified 2 days ago by genomax40k • written 2 days ago by mokunf0

How do you specify which location from this sequence needs to be changed? Do you always know the length of input sequence in advance or is this location relative to something or is it based on a motif/pattern?

ADD REPLYlink modified 2 days ago • written 2 days ago by genomax40k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 816 users visited in the last hour