Question

New to Python. Need help writing a program comparing 2 DNA sequences.

9.1 years ago

Tdemarco ▴ 10

Hi all, I am new to this forum and very NEW to Python. I was given the task of finding all the 5-amino-acid sequences that are identical between two given DNA sequences. Like I said, I am very new to this. I have a copy of Python for Biologists. I have read the chapters we were supposed to read and have no idea where to begin. Any help or direction would be greatly appreciated.

python • 6.2k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by Tdemarco ▴ 10

Hi, you may appreciate http://rosalind.info/problems/ini/, which has some practical Python example problems and solutions :)

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Nancy Ouyang ▴ 170

You need to know how to open the files and read their contents. Then use a for loop to compare the lines of the two files and collect the ones they have in common. Once this is done...

Read up on the basics of object-oriented programming and follow a Biopython tutorial. You will learn how to read FASTA files; then you can compare the sequences in the two files.
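If you want to see what reading a FASTA file involves before picking up Biopython, here is a minimal plain-Python sketch. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines); the function name `read_fasta` and the file name in the usage comment are made up for illustration.

```python
def read_fasta(path):
    """Return a dict mapping each header (without the '>') to its sequence."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                # Sequence lines can be wrapped, so collect the pieces.
                records[header].append(line)
    # Join the wrapped pieces into one string per record.
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical usage (the file name is an assumption):
# sequences = read_fasta("protein1.fasta")
```

Biopython's `SeqIO` module does the same job more robustly; the sketch is only meant to show that a FASTA file is plain text you can loop over.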

ADD REPLY • link 9.1 years ago by GouthamAtla 12k

Thank you all. Doing some reading now and trying to make sense of it. This is a basic Bioinformatics course. We haven't really USED Python in class yet (we have done other things).

It is hard to read material and be expected to write programs without practice. Thanks for directing me to the right places.

ADD REPLY • link 9.1 years ago by Tdemarco ▴ 10

Actually, after downloading the files I see they are already in amino acid (AA) code. I just have to find how many 5-mers are identical between the two sequences.
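For that narrower task, one possible approach is to collect every 5-mer of each sequence into a set and intersect the two sets. This is only a sketch: the two sequences below are made up, and the function names `kmers` and `shared_kmers` are illustrative, not from any library.

```python
def kmers(sequence, k=5):
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def shared_kmers(seq_a, seq_b, k=5):
    """Return the k-mers that occur in both sequences."""
    return kmers(seq_a, k) & kmers(seq_b, k)

# Made-up amino acid sequences, for illustration only.
seq_a = "MAIVMGRKGAR"
seq_b = "TTMAIVMGQKGARW"

common = shared_kmers(seq_a, seq_b)
print(sorted(common))  # ['AIVMG', 'MAIVM']
print(len(common))     # 2
```

Because sets drop duplicates, this counts distinct shared 5-mers; a 5-mer that appears several times in one sequence is still counted once.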

ADD REPLY • link 9.1 years ago by Tdemarco ▴ 10

score 0 · Answer 1 · 2015-03-06

1) Translate the nucleotide (DNA) sequences (simple Biopython solution).

>>> # Bio.Alphabet was removed in Biopython 1.78, so no alphabet argument is passed here.
>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*')

2) Use pattern matching to identify shared amino acid k-mers:

You could modify the following for that:

https://github.com/ffrancis/bioinformatic_algorithms/blob/master/codes/1_10_HammingDistance_pattern_match.py
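
The linked script is not reproduced here. Going by its file name, it does pattern matching with a Hamming-distance threshold, so the sketch below is a guess at what a modified version might look like, not the script itself. All function names and the example sequences are made up. With `max_mismatches=0` it reports only identical 5-mers, which is what the question asks for.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def match_positions(pattern, text, max_mismatches=0):
    """Return the 0-based starts in text where pattern matches within the threshold."""
    k = len(pattern)
    return [
        i
        for i in range(len(text) - k + 1)
        if hamming_distance(pattern, text[i:i + k]) <= max_mismatches
    ]

def shared_kmer_positions(seq_a, seq_b, k=5, max_mismatches=0):
    """Map each k-mer start in seq_a to the starts in seq_b where it matches."""
    hits = {}
    for i in range(len(seq_a) - k + 1):
        positions = match_positions(seq_a[i:i + k], seq_b, max_mismatches)
        if positions:
            hits[i] = positions
    return hits

# Made-up amino acid sequences, for illustration only.
print(shared_kmer_positions("MAIVMGRKGAR", "TTMAIVMGQKGARW"))  # {0: [2], 1: [3]}
```

Positions are 0-based. This compares every 5-mer of one sequence against every window of the other, which is fine for short sequences; for long ones a set or dictionary lookup of the k-mers would be faster.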