Question: Quick help: How can I find and replace a specific nucleotide within a gene sequence?
0
gravatar for mokunf
3 months ago by
mokunf0
mokunf0 wrote:

So I have a

usr/bin/python
from collections import defaultdict
import re, sys, random
from Bio import SeqI

- - - - - H E A D E R - - - - - - - - - - - - - - - - -

Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation

- - - - - U S E R V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(lambda: "my own unknown" )
CT        = defaultdict(lambda: 'nada')
GeneSeqs   = defaultdict(lambda: 'nada')
CX  = defaultdict(lambda: 0)
count = defaultdict(lambda: 0)
NTs    = ['a', 't', 'g', 'c']
afreq        = defaultdict(lambda: -1.1 )
tfreq        = defaultdict(lambda: -1.1 )
gfreq        = defaultdict(lambda: -1.1 )
cfreq       = defaultdict(lambda: -1.1 )
seq = 'P1.txt'

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - M A I N - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")

IN1 = open( "P1.txt", 'r')
number = 1
for line in IN1:
    if (re.match('>', line)):
        header = line.rstrip()   # remove right white space
    else:
        GenSeqs[header] = line.rstrip()   # Dict[key] = value, key = header, value = sequence
        number += 0

print("There are %d gene sequences in file %s" % (number, inFile))


for records in SeqIO.parse ('P1.txt','fasta'):
    if 'a' in records.seq:
        print (records)

rna-seq sequence genome gene • 328 views
ADD COMMENTlink modified 3 months ago • written 3 months ago by mokunf0
5

To answer your question, you can use CRISPR.

ADD REPLYlink written 3 months ago by theobroma221.1k
1

I prefer CRISPR with vim bindings.

ADD REPLYlink written 3 months ago by kloetzl950
2

I'm a big fan of cRispr, crispy, and crispl

ADD REPLYlink written 3 months ago by WouterDeCoster27k
1

Format your code properly so we can easily read it and be more specific with your question and you'll get a lot more help.

ADD REPLYlink written 3 months ago by jared.andrews07680

I have tidied it a fair bit to how [I believe..] it should look

ADD REPLYlink written 3 months ago by Kevin Blighe16k
1

Now we need to figure out what the actual question is. So much for quick help...

ADD REPLYlink modified 3 months ago • written 3 months ago by WouterDeCoster27k

It does not appear that you are searching for a pattern but replacing in a specific location. Do you need to use a python program for this (unless this is an assignment).

ADD REPLYlink written 3 months ago by genomax46k
0
gravatar for mokunf
3 months ago by
mokunf0
mokunf0 wrote:

I apologize for not responding sooner. I'm not sure how to enter the script that I've created so far. Here it is again, with the sequence that i'm trying to analyze. Below the code is the sequence that I'm focusing on. I've installed Biopython and it seems to be helpful in my analysis. I could be mistaken as I am still new to this. Any help that's offered is greatly appreciated.

#!/usr/bin/python
 input_file = open('P1.txt', 'r')
OUT = open('P1-Out.txt','w') 
OUT.write('Gene_Name\tA\tC\tG\tT\tTotal_Length\tCG%\n') 
from Bio import SeqIO
from Bio import IUPAC

 print ("......... START........... ")

for cur_record in SeqIO.parse(input_file, "fasta") :
#count nucleotides in this record...
    gene_name = cur_record.name 
   A_count = cur_record.seq.count('a') 
   C_count = cur_record.seq.count('c') 
   G_count = cur_record.seq.count('g') 
   T_count = cur_record.seq.count('t') 
   length = len(cur_record.seq) 
   cg_percentage = 100 * float(C_count + G_count) / length

print ("A_Count %d" % A_count)
print ("C_Count %d" % C_count)
print ("G_Count %d" % G_count)
print ("T_Count %d" % T_count)
print ("CG_Percentage %d" % cg_percentage)

cur_seq = SeqIO('input_file', IUPAC.unambiguous_gene)
cur_seq.alphabet

counter = 0
for cur_record in SeqIO.parse(input_file, "fasta"):
counter+= cur_record.seq.count ('a')


print ("............. FINISH .............")
OUT.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, 
       cg_percentage))
 OUT.close()

Example sequence

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
ADD COMMENTlink modified 3 months ago by genomax46k • written 3 months ago by mokunf0

How do you specify which location from this sequence needs to be changed? Do you always know the length of input sequence in advance or is this location relative to something or is it based on a motif/pattern?

ADD REPLYlink modified 3 months ago • written 3 months ago by genomax46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 752 users visited in the last hour