Question: Quick help: How can I find and replace a specific nucleotide within a gene sequence?
0
gravatar for mokunf
12 months ago by
mokunf0
mokunf0 wrote:

So I have a

usr/bin/python
from collections import defaultdict
import re, sys, random
from Bio import SeqI

- - - - - H E A D E R - - - - - - - - - - - - - - - - -

Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation

- - - - - U S E R V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(lambda: "my own unknown" )
CT        = defaultdict(lambda: 'nada')
GeneSeqs   = defaultdict(lambda: 'nada')
CX  = defaultdict(lambda: 0)
count = defaultdict(lambda: 0)
NTs    = ['a', 't', 'g', 'c']
afreq        = defaultdict(lambda: -1.1 )
tfreq        = defaultdict(lambda: -1.1 )
gfreq        = defaultdict(lambda: -1.1 )
cfreq       = defaultdict(lambda: -1.1 )
seq = 'P1.txt'

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - M A I N - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")

IN1 = open( "P1.txt", 'r')
number = 1
for line in IN1:
    if (re.match('>', line)):
        header = line.rstrip()   # remove right white space
    else:
        GenSeqs[header] = line.rstrip()   # Dict[key] = value, key = header, value = sequence
        number += 0

print("There are %d gene sequences in file %s" % (number, inFile))


for records in SeqIO.parse ('P1.txt','fasta'):
    if 'a' in records.seq:
        print (records)

rna-seq sequence genome gene • 598 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by mokunf0
5

To answer your question, you can use CRISPR.

ADD REPLYlink written 12 months ago by theobroma221.1k
1

I prefer CRISPR with vim bindings.

ADD REPLYlink written 12 months ago by kloetzl990
2

I'm a big fan of cRispr, crispy, and crispl

ADD REPLYlink written 12 months ago by WouterDeCoster35k
1

Format your code properly so we can easily read it and be more specific with your question and you'll get a lot more help.

ADD REPLYlink written 12 months ago by jared.andrews071.8k

I have tidied it a fair bit to how [I believe..] it should look

ADD REPLYlink written 12 months ago by Kevin Blighe35k
1

Now we need to figure out what the actual question is. So much for quick help...

ADD REPLYlink modified 12 months ago • written 12 months ago by WouterDeCoster35k

It does not appear that you are searching for a pattern but replacing in a specific location. Do you need to use a python program for this (unless this is an assignment).

ADD REPLYlink written 12 months ago by genomax60k
0
gravatar for mokunf
12 months ago by
mokunf0
mokunf0 wrote:

I apologize for not responding sooner. I'm not sure how to enter the script that I've created so far. Here it is again, with the sequence that i'm trying to analyze. Below the code is the sequence that I'm focusing on. I've installed Biopython and it seems to be helpful in my analysis. I could be mistaken as I am still new to this. Any help that's offered is greatly appreciated.

#!/usr/bin/python
 input_file = open('P1.txt', 'r')
OUT = open('P1-Out.txt','w') 
OUT.write('Gene_Name\tA\tC\tG\tT\tTotal_Length\tCG%\n') 
from Bio import SeqIO
from Bio import IUPAC

 print ("......... START........... ")

for cur_record in SeqIO.parse(input_file, "fasta") :
#count nucleotides in this record...
    gene_name = cur_record.name 
   A_count = cur_record.seq.count('a') 
   C_count = cur_record.seq.count('c') 
   G_count = cur_record.seq.count('g') 
   T_count = cur_record.seq.count('t') 
   length = len(cur_record.seq) 
   cg_percentage = 100 * float(C_count + G_count) / length

print ("A_Count %d" % A_count)
print ("C_Count %d" % C_count)
print ("G_Count %d" % G_count)
print ("T_Count %d" % T_count)
print ("CG_Percentage %d" % cg_percentage)

cur_seq = SeqIO('input_file', IUPAC.unambiguous_gene)
cur_seq.alphabet

counter = 0
for cur_record in SeqIO.parse(input_file, "fasta"):
counter+= cur_record.seq.count ('a')


print ("............. FINISH .............")
OUT.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, 
       cg_percentage))
 OUT.close()

Example sequence

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
ADD COMMENTlink modified 12 months ago by genomax60k • written 12 months ago by mokunf0

How do you specify which location from this sequence needs to be changed? Do you always know the length of input sequence in advance or is this location relative to something or is it based on a motif/pattern?

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax60k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1629 users visited in the last hour