Question: Biology undergrad taking bioinformatics new to python need HELP with problem!
2.7 years ago by
m.saola20
United States
m.saola20 wrote:

Hi, I'm very new to Python and I'm still struggling with this problem. For it, I have already written a function called reverse_complement, which computes the reverse complement of a given DNA sequence.

Problem 3 (Virtual PCR):

You will write a program that performs "Virtual PCR" using pairs of PCR primers and a template sequence, much in the manner of UCSC's In-Silico PCR tool (http://genome.ucsc.edu/cgi-bin/hgPcr?command=start).

Your program (call it your_last_name_virtual_pcr.py) will load and assemble the pBR322 plasmid DNA sequence contained in hw_input\pBR322.txt - this will serve as your large target sequence.  Your program will also load pairs of PCR primer sequences contained in hw_input\pBR322_PCR_products.txt.  Note that each line consists of a forward primer sequence and a reverse primer sequence separated by a comma.  You will use the unaltered forward primer sequence and the reverse-complemented reverse primer sequence to find the start and end positions of the predicted PCR product and then extract its sequence.  Remember to import your reverse_complement function that you created in Problem 2 and use it to alter the reverse primer sequences.

Create an output file named hw_output\pBR322_PCR_products.txt and on each line write the forward primer, the reverse-complemented reverse primer and the predicted PCR product.  Separate the fields with commas.

pBR322 primers.txt: I read it in as one long string:

"TCGGGCTCGCCACTTCGGGCTCA,GAGTTGCATGATAAAGAAGACAGTCA
ATGGCCCGCTTTATCAGAAGCCAGACA,GTCAGTGAGCGAGGAAGCGGAAGAGCGC
AATCAGTGAGGCACCTATCTCAGCGATC,ACTCTAGCTTCCCGGCAACAATTAATAGA
CGGTGTGAAATACCGCACAGATGC,GAGCGAGGAAGCGGAAGAGCGCCTGATG"

The pseudocode I've written so far is:
1. Read the input file and import the function.
2. Create a for loop in order to obtain the complement sequence.
3. 'Clean' the list by turning it into a list of lists, so now it looks like this:

[['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'ACTCTAGCTTCCCGGCAACAATTAATAGA'], ['CGGTGTGAAATACCGCACAGATGC', 'GAGCGAGGAAGCGGAAGAGCGCCTGATG']]

4. Now I have to use my reverse_complement function to change the reverse sequences (the second sequence in each pair) into their reverse complements, but I'm having trouble doing this:

fwd_and_rsvcomp_rsv_primers = []    
for pair in cleaned_primer_list:
    if pair[0]:
        t = pair[0]
        fwd_and_rsvcomp_rsv_primers.append(t)
    else:
        s = pair[1]
        fwd_and_rsvcomp_rsv_primers.append(s)
My output in the Python shell was this (the first string, i.e. the forward sequence, from each of the lists):

 ['TCGGGCTCGCCACTTCGGGCTCA', 'ATGGCCCGCTTTATCAGAAGCCAGACA', 'AATCAGTGAGGCACCTATCTCAGCGATC', 'CGGTGTGAAATACCGCACAGATGC']

Next I have to use the forward primers and the reverse-complemented reverse primers to find the start and end positions of the predicted PCR product and then extract its sequence. But I'm stuck at the step where I have to apply the reverse complement function, and I have no idea how to approach the step after that. Can anyone help?

python • 1.8k views
ADD COMMENTlink modified 2.7 years ago by Coryza360 • written 2.7 years ago by m.saola20
2.7 years ago by
Coryza360
Netherlands
Coryza360 wrote:

First, if you want to write a 'sexy' reverse complement function, you could do it much faster (http://crazyhottommy.blogspot.nl/2013/10/python-code-for-getting-reverse.html)

def ReverseComplement1(seq):
    seq_dict = {'A':'T','T':'A','G':'C','C':'G'}
    return "".join([seq_dict[base] for base in reversed(seq)])
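As a variant of the same idea, here is a minimal sketch using `str.maketrans` and `str.translate` from the Python 3 standard library. The name `reverse_complement_translate` is made up for this example, and including lowercase letters and `N` in the table is an assumption, not something the assignment asks for.

```python
# Translation table built once. Lowercase and N are included as an assumption;
# characters missing from the table are passed through unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_translate(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement_translate("GAGTTGCATGATAAAGAAGACAGTCA"))
# TGACTGTCTTCTTTATCATGCAACTC
```

One practical difference: the dictionary version above raises a KeyError on any character other than A, C, G or T, whereas this sketch leaves unknown characters as they are.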

After retrieving your forward primer and reverse-complemented reverse primer from each line of your input file, you could simply do DNASTRING[DNASTRING.index(forwardprimer):DNASTRING.index(reverseprimer)] if you only expect one PCR product (where DNASTRING represents open(*.txt, 'r').read().rstrip().replace('\n', '')). I think it's easier to just read your input primer file line by line and extract the PCR sequence as you go:

# Simplified, not working code, but it will give you an idea:

dnaString = open(<file>, 'r').read().rstrip().replace('\n', '')
primers = open(<file>)

for line in primers:
  forwardP = line.split(',')[0]
  reverseP = ReverseComplement1(line.split(',')[1].rstrip()) # Reverse complement the reverse primer.
  pcrProd = dnaString[dnaString.index(forwardP):dnaString.index(reverseP)]
  <output>.write(forwardP+','+reverseP+','+pcrProd+'\n')
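The loop above can be fleshed out into something runnable. This is only a sketch under a few assumptions: the template file holds plain uppercase sequence lines with no header, every primer occurs exactly once, and the paths are passed in as arguments. The names `virtual_pcr` and its parameters are invented for illustration, and `reverse_complement` here is a stand-in for the function from Problem 2.

```python
def reverse_complement(seq):
    # Stand-in for the Problem 2 function (assumes only A, C, G, T).
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return "".join(pairs[base] for base in reversed(seq))

def virtual_pcr(template_path, primers_path, output_path):
    # Join the template lines into one string with no newlines.
    with open(template_path) as handle:
        dna = "".join(line.strip() for line in handle)

    with open(primers_path) as primers, open(output_path, "w") as out:
        for line in primers:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            forward, reverse = line.split(",")
            reverse_rc = reverse_complement(reverse)
            start = dna.index(forward)    # raises ValueError if the primer is absent
            end = dna.index(reverse_rc)   # position where the reverse primer site starts
            product = dna[start:end]      # same slice as above: stops before that site
            out.write(",".join([forward, reverse_rc, product]) + "\n")
```

Using `with` closes all three files automatically, even if one of the `index` calls raises an error partway through.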

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Coryza360

Thank you for your efforts!

Sorry but I'm having trouble understanding your comment. I talked to my professor and she said I did problem 2 right (making the function). But for the most part how do I get from here

[['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'ACTCTAGCTTCCCGGCAACAATTAATAGA'], ['CGGTGTGAAATACCGCACAGATGC', 'GAGCGAGGAAGCGGAAGAGCGCCTGATG']]

to here 

[['TCGGGCTCGCCACTTCGGGCTCA', 'TGACTGTCTTCTTTATCATGCAACTC'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GCGCTCTTCCGCTTCCTCGCTCACTGAC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'TCTATTAATTGTTGCCGGGAAGCTAGAGT'], ['CGGTGTGAAATACCGCACAGATGC', 'CATCAGGCGCTCTTCCGCTTCCTCGCTC']]

The second sequence in each pair is the reverse-complemented reverse primer (the first is the forward primer). From the latter list of sequences, will I be able to find the start and end positions of the predicted PCR product and then extract its sequence?

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by m.saola20

Key to understanding Coryza's answer is knowing what "index" does and also how a "for" loop lets you do the same thing to a bunch of things in a list.

Think of it like an assembly line.

The "for" loop lets you do the same thing over and over to similar objects.

If you're like me, it helps to imagine how you would do the task using a printout of the sequence and a pencil.

Sounds like you have a good grasp on the basic problem - what you want to achieve. Next step is to design a solution.

Hang in there. Programming is one of those things that helps you get better at solving problems. It can make you smarter overall.
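To make the two ideas concrete, here is a toy illustration. The short `template` string and the fragment list are invented for the example and have nothing to do with pBR322.

```python
template = "AAACCCGGGTTT"

# str.index returns the 0-based position where the first match starts.
print(template.index("CCC"))   # 3
print(template.index("GGG"))   # 6

# A for loop repeats the same step for every item in a list.
positions = []
for fragment in ["AAA", "CCC", "TTT"]:
    positions.append(template.index(fragment))
print(positions)               # [0, 3, 9]
```

This is the pencil-and-printout version of the task: find where each piece starts, write the position down, move on to the next piece.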

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Ann2.2k

Thank you for the encouragement!

ADD REPLYlink written 2.7 years ago by m.saola20

Hi, sorry I was not clear enough. To get from your list to the reverse complements:

for item in <list>:
  forwardPrimer = item[0]
  reversePrimer = function(item[1]) #Where function represents your actual function name.
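Here is the same loop as a runnable sketch, using the first two primer pairs from the question. The `reverse_complement` below is only a stand-in so the snippet is self-contained (in your script you would import your own), and the output list name is made up. It also shows why the `if pair[0]: ... else:` attempt only ever produced forward primers: a non-empty string is always truthy, so the `else` branch never runs.

```python
def reverse_complement(seq):
    # Stand-in for your own Problem 2 function (assumes only A, C, G, T).
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return "".join(pairs[base] for base in reversed(seq))

cleaned_primer_list = [
    ['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'],
    ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'],
]

fwd_and_revcomp_primers = []
for forward, reverse in cleaned_primer_list:
    # Keep the forward primer as-is; reverse complement only the second item.
    fwd_and_revcomp_primers.append([forward, reverse_complement(reverse)])

print(fwd_and_revcomp_primers[0])
# ['TCGGGCTCGCCACTTCGGGCTCA', 'TGACTGTCTTCTTTATCATGCAACTC']
```

Unpacking each pair as `forward, reverse` in the `for` line does the same job as `item[0]` and `item[1]`.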

Your <reversefunction> should contain a 'return' statement which returns the sequence:

def function(reversePrimer): #Where function represents your actual function name.
  reverse = reversePrimer
  # ... blablabla your magic here: primer to reverse complement.
  return <complementString>

function(reversePrimer) #Where function represents your actual function name.

If this is working, you can easily add one extra line to the previous loop:

 pcrProd = dnaString[dnaString.index(forwardPrimer):dnaString.index(reversePrimer)] 

Where dnaString represents the contents of your template file (mentioned above). You can then write the output (see above). As an addition to Ann's reply: index is a string method that, in this case, gives the position of the first occurrence of the forward primer (item[0]) and of the first occurrence of the reverse-complemented reverse primer in the reference. For example, if fP is found at position 8 and rP at position 302, your sequence runs from position 8 up to 302. The sequence you get back contains the forward primer (since that is your starting point) but stops where the reverse primer site begins. You'll need some more manipulation to calculate and remove your forward primer (using its length), and/or to handle multiple or absent products; note that index raises a ValueError when the primer is not found. Does that help? PS: You could also work without lists (see the previous post, which loops over the lines in a file instead of over your created list).
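The "more manipulation" could look roughly like the sketch below. It swaps `index` for `str.find`, which returns -1 instead of raising an error when nothing is found. The function name `predicted_product`, the `include_primers` flag and the toy template are all invented for this example. It only reports the first product and does not handle a product that wraps around the origin of a circular plasmid.

```python
def predicted_product(dna, forward, reverse_rc, include_primers=True):
    """Return the first predicted product, or None if a primer site is missing."""
    start = dna.find(forward)                 # -1 when the primer is absent
    if start == -1:
        return None
    # Look for the reverse primer site downstream of the forward primer.
    rev_start = dna.find(reverse_rc, start + len(forward))
    if rev_start == -1:
        return None
    if include_primers:
        # Extend the slice by the primer length so the reverse site is included.
        return dna[start:rev_start + len(reverse_rc)]
    # Otherwise drop both primer sites and keep only what lies between them.
    return dna[start + len(forward):rev_start]

template = "AAACCCGGGTTTACGT"
print(predicted_product(template, "CCC", "TTTA"))         # CCCGGGTTTA
print(predicted_product(template, "CCC", "TTTA", False))  # GGG
print(predicted_product(template, "CCC", "GATTACA"))      # None
```

Whether the product should include the primer sites depends on what the assignment expects, so the flag is there to make that choice explicit.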

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Coryza360

I was able to resolve the issue with your help! Thank you so much!

ADD REPLYlink written 2.7 years ago by m.saola20

You're welcome ;)

ADD REPLYlink written 2.7 years ago by Coryza360