Biology undergrad taking bioinformatics, new to Python, needs HELP with a problem!
9.2 years ago
m.saola

Hi, I'm very new to Python and I'm still struggling with this problem. I have already written a function called reverse_complement, which computes the reverse complement of an input DNA sequence.

Problem 3 (Virtual PCR):

You will write a program that performs "Virtual PCR" using pairs of PCR primers and a template sequence, much in the manner of UCSC's In-Silico PCR tool (http://genome.ucsc.edu/cgi-bin/hgPcr?command=start).

Your program (call it your_last_name_virtual_pcr.py) will load and assemble the pBR322 plasmid DNA sequence contained in hw_input\pBR322.txt - this will serve as your large target sequence. Your program will also load pairs of PCR primer sequences contained in hw_input\pBR322_PCR_products.txt. Note that each line consists of a forward primer sequence and a reverse primer sequence separated by a comma. You will use the unaltered forward primer sequence and the reverse-complemented reverse primer sequence to find the start and end positions of the predicted PCR product and then extract its sequence. Remember to import your reverse_complement function that you created in Problem 2 and use it to alter the reverse primer sequences.
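Loading and "assembling" the template mostly means joining the file's lines into one string. The post does not show the layout of hw_input\pBR322.txt, so this is only a sketch that assumes plain sequence text (optionally with a FASTA-style '>' header, line numbers or spaces). The name assemble_sequence is a hypothetical helper, not part of the assignment:

```python
def assemble_sequence(lines):
    """Hypothetical helper: join sequence lines into one uppercase DNA string.

    Assumes plain sequence text: '>' header lines (FASTA style) are skipped,
    and digits, spaces and newlines inside the remaining lines are dropped.
    """
    pieces = []
    for line in lines:
        if line.startswith(">"):
            continue  # skip a FASTA-style header, if the file has one
        # keep only letters, discarding position numbers and whitespace
        pieces.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(pieces).upper()
```

Pass it an open file handle, e.g. `with open(os.path.join('hw_input', 'pBR322.txt')) as handle: template = assemble_sequence(handle)`. As a sanity check, pBR322 is 4361 bp long, so len(template) should come out at 4361. If the file is in GenBank format instead, the header words would slip through this filter and need separate handling.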

Create an output file named hw_output\pBR322_PCR_products.txt and on each line write the forward primer, the reverse-complemented reverse primer and the predicted PCR product. Separate the fields with commas.
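Writing the output is plain string joining. A sketch, assuming each result is held as a (forward, reverse-complemented reverse, product) tuple; format_product_line and write_products are hypothetical names. The folder is created first because open(..., 'w') raises FileNotFoundError when hw_output does not exist yet:

```python
import os

def format_product_line(fwd, rc_rev, product):
    """Hypothetical helper: one output record, fields separated by commas."""
    return ",".join([fwd, rc_rev, product])

def write_products(path, records):
    """Hypothetical helper: write (fwd, rc_rev, product) tuples, one line each."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create e.g. hw_output/ if missing
    with open(path, "w") as out:
        for fwd, rc_rev, product in records:
            out.write(format_product_line(fwd, rc_rev, product) + "\n")
```

Usage: `write_products(os.path.join('hw_output', 'pBR322_PCR_products.txt'), results)`. Plain joining is safe here because DNA sequences never contain commas; the csv module would be the safer choice for arbitrary text fields.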

pBR322 primers.txt (I read it in as one long string):

"TCGGGCTCGCCACTTCGGGCTCA,GAGTTGCATGATAAAGAAGACAGTCA
ATGGCCCGCTTTATCAGAAGCCAGACA,GTCAGTGAGCGAGGAAGCGGAAGAGCGC
AATCAGTGAGGCACCTATCTCAGCGATC,ACTCTAGCTTCCCGGCAACAATTAATAGA
CGGTGTGAAATACCGCACAGATGC,GAGCGAGGAAGCGGAAGAGCGCCTGATG"
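Turning that text into a list of [forward, reverse] pairs (the "cleaned" list of lists in step 3) can be done line by line. A sketch, assuming exactly one comma per non-blank line; parse_primer_pairs is a hypothetical name:

```python
def parse_primer_pairs(lines):
    """Hypothetical helper: turn 'forward,reverse' lines into [forward, reverse] pairs.

    Blank lines are skipped; surrounding whitespace (including the newline)
    is stripped and sequences are upper-cased.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines, e.g. a trailing newline at end of file
        forward, reverse = line.split(",")  # exactly one comma per line expected
        pairs.append([forward.strip().upper(), reverse.strip().upper()])
    return pairs
```

It accepts an open file handle directly, or, if you have already read the file as one long string, `parse_primer_pairs(text.splitlines())`.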

My pseudo code I've written is:

1. Read the input file/import the function

2. Create a for loop in order to obtain the complement sequence

3. Then I 'cleaned' the list by turning it into a list of lists, so now it looks like this:

[['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'ACTCTAGCTTCCCGGCAACAATTAATAGA'], ['CGGTGTGAAATACCGCACAGATGC', 'GAGCGAGGAAGCGGAAGAGCGCCTGATG']]

4. Now I have to use my reverse_complement function to convert each reverse primer (the second sequence in each pair) into its reverse complement, but I'm having trouble doing this:

fwd_and_rsvcomp_rsv_primers = []
for pair in cleaned_primer_list:
    if pair[0]:
        t = pair[0]
        fwd_and_rsvcomp_rsv_primers.append(t)
    else:
        s = pair[1]
        fwd_and_rsvcomp_rsv_primers.append(s)

Running this in the Python shell gave me only the first string (the forward sequence) of each pair:

['TCGGGCTCGCCACTTCGGGCTCA', 'ATGGCCCGCTTTATCAGAAGCCAGACA', 'AATCAGTGAGGCACCTATCTCAGCGATC', 'CGGTGTGAAATACCGCACAGATGC']
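The loop keeps only the forward primers because `if pair[0]:` only tests whether the first string is non-empty. That is always true here, so the `else` branch never runs. What you want instead is to keep pair[0] unchanged and pass pair[1] through reverse_complement for every pair. A sketch; the reverse_complement below is only a stand-in for your Problem 2 function (import yours instead), and add_reverse_complements is a hypothetical name:

```python
def reverse_complement(seq):
    # Stand-in for the Problem 2 function; replace with your own import.
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))

def add_reverse_complements(cleaned_primer_list):
    """Hypothetical helper: [forward, reverse_complement(reverse)] for every pair."""
    fwd_and_rsvcomp_rsv_primers = []
    for pair in cleaned_primer_list:
        forward = pair[0]                          # forward primer, used unaltered
        rc_reverse = reverse_complement(pair[1])   # reverse primer, reverse-complemented
        fwd_and_rsvcomp_rsv_primers.append([forward, rc_reverse])
    return fwd_and_rsvcomp_rsv_primers
```

Applied to the four pairs in step 3, this turns the first pair's reverse primer GAGTTGCATGATAAAGAAGACAGTCA into TGACTGTCTTCTTTATCATGCAACTC, and so on for the other three.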

Next I have to use the forward primers and the reverse-complemented reverse primers to find the start and end positions of the predicted PCR product, and then extract its sequence. But I'm stuck at the step where I have to use the reverse complement function, so I have no idea how to do the next step. Can anyone help?

9.2 years ago
Coryza

First, if you want to write a 'sexy' reverse complement function, you could do it much faster (http://crazyhottommy.blogspot.nl/2013/10/python-code-for-getting-reverse.html).

def ReverseComplement1(seq):

    seq_dict = {'A':'T','T':'A','G':'C','C':'G'}
    return "".join([seq_dict[base] for base in reversed(seq)])
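One limitation of the dictionary version: any character that is not an uppercase A, C, G or T (lowercase bases, N, a stray newline) raises a KeyError. If your input might contain those, a variant built on Python's built-in str.maketrans and str.translate is a reasonable alternative. This is a swapped-in technique, not the one from the linked post, and reverse_complement_translate is a hypothetical name:

```python
# Translation table built once: each base (either case) maps to its complement.
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement_translate(seq):
    """Hypothetical helper: reverse complement seq.

    Characters other than ACGT/acgt are kept as-is instead of raising KeyError.
    """
    return seq.translate(COMPLEMENT_TABLE)[::-1]
```

The trade-off: unexpected characters such as N now pass through silently, which can hide a typo in a primer, so pick whichever behaviour suits your data.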

After getting the forward primer and the reverse-complemented reverse primer for each line of your input file, you could simply do DNASTRING[DNASTRING.index(forwardprimer):DNASTRING.index(reverseprimer)] if you only expect one PCR product (where DNASTRING is open(*.txt, 'r').read().rstrip().replace('\n', ''), i.e. the whole template as one string). I think it's easier to just read your primer file line by line and extract each PCR product as you go:

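# Simplified sketch: assumes pBR322.txt holds only sequence lines, one primer
# pair per line in the primer file, and ReverseComplement1 (defined above).
import os

with open(os.path.join('hw_input', 'pBR322.txt'), 'r') as f:
    dnaString = f.read().replace('\n', '').replace('\r', '').upper()  # one continuous string

os.makedirs('hw_output', exist_ok=True)
with open(os.path.join('hw_input', 'pBR322_PCR_products.txt')) as primers, \
        open(os.path.join('hw_output', 'pBR322_PCR_products.txt'), 'w') as output:
    for line in primers:
        if not line.strip():
            continue  # skip blank lines
        forwardP, reverseP = line.strip().split(',')
        reverseP = ReverseComplement1(reverseP)  # reverse complement the reverse primer
        # str.index raises ValueError if a primer is not found in the template
        pcrProd = dnaString[dnaString.index(forwardP):dnaString.index(reverseP)]
        output.write(forwardP + ',' + reverseP + ',' + pcrProd + '\n')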

Thank you for your efforts!

Sorry, but I'm having trouble understanding your answer. I talked to my professor and she said I did Problem 2 (writing the function) correctly. Mainly, how do I get from this:

[['TCGGGCTCGCCACTTCGGGCTCA', 'GAGTTGCATGATAAAGAAGACAGTCA'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GTCAGTGAGCGAGGAAGCGGAAGAGCGC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'ACTCTAGCTTCCCGGCAACAATTAATAGA'], ['CGGTGTGAAATACCGCACAGATGC', 'GAGCGAGGAAGCGGAAGAGCGCCTGATG']]

to this:

[['TCGGGCTCGCCACTTCGGGCTCA', 'TGACTGTCTTCTTTATCATGCAACTC'], ['ATGGCCCGCTTTATCAGAAGCCAGACA', 'GCGCTCTTCCGCTTCCTCGCTCACTGAC'], ['AATCAGTGAGGCACCTATCTCAGCGATC', 'TCTATTAATTGTTGCCGGGAAGCTAGAGT'], ['CGGTGTGAAATACCGCACAGATGC', 'CATCAGGCGCTCTTCCGCTTCCTCGCTC']]

Starting with the second sequence, every other sequence is the reverse complement of the reverse primer that pairs with the preceding forward primer. From this second list, will I be able to find the start and end positions of the predicted PCR product and then extract its sequence?
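With [forward, reverse-complemented reverse] pairs in hand, finding the product takes two string searches. One detail matters: str.index returns where a match *starts*, so a slice that ends at the index of the reverse-primer site stops just before that site. A PCR product ends with the reverse-complemented reverse primer, so the end coordinate needs len(rc_reverse) added. A minimal sketch, assuming one product per pair and a linear search (no wrap-around the circular plasmid); extract_product is a hypothetical name:

```python
def extract_product(template, forward, rc_reverse):
    """Hypothetical helper: (start, end, product) for the first product of one pair.

    start/end are 0-based and end-exclusive, so template[start:end] == product.
    Raises ValueError (from str.index) if either primer site is missing.
    """
    start = template.index(forward)  # first forward-primer site
    # look for the reverse site only downstream of the forward primer
    rev_start = template.index(rc_reverse, start + len(forward))
    end = rev_start + len(rc_reverse)  # include the reverse primer in the product
    return start, end, template[start:end]
```

On the toy template "CCACGAAATTTCC" with primers "ACG" and "TTT", this returns (2, 11, "ACGAAATTT"), whereas stopping at the reverse site's index would give only "ACGAAA".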


Key to understanding Coryza's answer is knowing what "index" does, and how a "for" loop lets you do the same thing to every item in a list.

Think of it like an assembly line.

The "for" loop lets you do the same thing over and over to similar objects.
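To make that concrete: str.index returns the 0-based position of the *first* occurrence of a substring and raises ValueError if it is absent; str.find does the same but returns -1 instead. The loop below applies one step to each item in turn. The sequences are made-up toy examples and first_positions is a hypothetical name:

```python
def first_positions(template, motifs):
    """Hypothetical helper: map each motif to its first 0-based position (-1 if absent)."""
    positions = {}
    for motif in motifs:  # the same step, applied to every item in the list
        positions[motif] = template.find(motif)  # like index(), but -1 instead of ValueError
    return positions

template = "GGATCCAAGCTTGGATCC"
print(template.index("AAGCTT"))  # 6
print(first_positions(template, ["GGATCC", "GAATTC"]))  # {'GGATCC': 0, 'GAATTC': -1}
```

Note that GGATCC occurs twice in this template (at 0 and 12), but both index and find report only the first hit.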

If you're like me, it helps to imagine how you would do the task using a printout of the sequence and a pencil.

Sounds like you have a good grasp on the basic problem - what you want to achieve. Next step is to design a solution.

Hang in there. Programming is one of those things that helps you get better at solving problems. It can make you smarter overall.


Thank you for the encouragement!


Hi, sorry I was not clear enough. To get from your list to reverse complements:

for item in cleaned_primer_list:
    forwardPrimer = item[0]
    reversePrimer = function(item[1]) #Where function represents your actual function name.

Your <reversefunction> should contain a return statement which returns the sequence:

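def function(reversePrimer): #Where function represents your actual function name.
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    # ... your magic here: turn the primer into its reverse complement, e.g.:
    complementString = ''.join(complement[base] for base in reversed(reversePrimer))
    return complementString

print(function('GAGTTGCATG')) #Call it with a primer sequence; prints CATGCAACTC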

If this is working, you can easily add one extra line to the previous loop:

 pcrProd = dnaString[dnaString.index(forwardPrimer):dnaString.index(reversePrimer)]

Here dnaString is the template sequence read from your input file (see my answer above), and you can then write the output as shown there. To add to Ann's reply: index is a string method that, in this case, returns the position of the first occurrence of the forward primer (item[0]) and the first occurrence of the reverse-complemented reverse primer in the reference; it raises a ValueError if the sequence is not found. The slice you get back starts with the forward primer (since that is your starting point) and stops just before the reverse-primer site. You'll need some more manipulation to account for the primer lengths (for example, if you want to remove the forward primer), and to handle cases with multiple or absent products. For example, if fP is found at position 8 and rP at position 302, your sequence covers positions 8 up to (but not including) 302. Does that help?

PS: You could also work without lists (see my earlier answer, where you loop over the lines of the primer file directly instead of over the list you created).
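For the multiple and absent cases, one option is to collect every forward-primer site instead of only the first, and to skip pairs whose reverse site is missing. Because pBR322 is a circular plasmid, a product may also span the point where the linear sequence file starts and ends; searching a doubled copy of the template is one simple way to catch that. A sketch under those assumptions, with hypothetical helper names; each forward site is paired only with the nearest downstream reverse site:

```python
def find_all(seq, sub):
    """Hypothetical helper: every 0-based start of sub in seq (overlaps included)."""
    positions = []
    pos = seq.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(sub, pos + 1)
    return positions

def predict_products(template, forward, rc_reverse, circular=True):
    """Hypothetical helper: (start, end, product) for each forward site with a
    downstream reverse site. An empty list means no product was predicted.

    For a circular template the search runs over template + template, so end
    can exceed len(template) when a product spans the sequence origin.
    """
    n = len(template)
    search = template + template if circular else template
    products = []
    # forward sites starting inside the original sequence (possibly wrapping)
    for start in find_all(search[:n + len(forward) - 1], forward):
        rev_start = search.find(rc_reverse, start + len(forward))
        if rev_start == -1:
            continue  # no reverse site downstream: no product from this site
        end = rev_start + len(rc_reverse)
        if end - start > n:
            continue  # longer than the whole template: not a real product
        products.append((start, end, search[start:end]))
    return products
```

For example, on the toy circular template "TTTCCACGAA" with primers "ACG" and "TTT", the product wraps around the origin and comes back as (5, 13, "ACGAATTT"); with circular=False the same call returns an empty list.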


I was able to resolve the issue with your help! Thank you so much!


You're welcome ;)
