Question

Closed: Need help identifying the mistake in my code

6.6 years ago

ishmahe16 • 0

Hi, I need to produce a FASTA file containing the sequence 1 kb upstream of each gene on the X chromosome. I am given a GFF file and a genome sequence file in FASTA format. The code I have so far is:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

fast_lgx_records = []
lgs_record_ids = []
records = list(SeqIO.parse(open("Midterm.fna"), 'fasta'))
for record in records:
    if "LGX" in record.description:
        fast_lgx_records.append(record)
        lgs_record_ids.append(record.id)

# collect the (start, end) window upstream of each gene on the LGX records
positions_to_read = []
with open("Midterm.gff") as f:
    for line in f:
        if not line.startswith("#"):
            split_line = line.split('\t')
            seq_id = split_line[0]
            feature_type = split_line[2]
            start = split_line[3]
            end = split_line[4]
            sign = split_line[6]
            if seq_id in lgs_record_ids and feature_type == "gene":
                if sign == "+":
                    start_index = int(start)
                    positions_to_read.append((start_index - 1001, start_index - 1))
                else:
                    end_index = int(end)
                    positions_to_read.append((end_index + 1, end_index + 1001))

# write the final sequence to a new fasta file
final_data = []

My professor's feedback was: "All the headers are identical, so I can't match sequences to their corresponding genes or positions in the genome. There are incorrect sequences, but I can't be sure exactly what the problem is because I can't tell which gene they are supposed to belong to."

Please help.
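
For reference, here is a minimal sketch of one way to address both points in that feedback. It is not the course's expected solution, and the function names (`upstream_slice`, `reverse_complement`, `gene_id`, `upstream_fasta_entries`) are hypothetical. It assumes the GFF is GFF3-style with an `ID=...` entry in column 9, that GFF coordinates are 1-based and inclusive, and that "upstream" on the minus strand means the bases after the gene's end coordinate, reverse-complemented. It is written in plain Python with the genome as a dict of strings so it runs without Biopython; with Biopython the same slice would be taken on `record.seq`, which has its own `reverse_complement()` method.

```python
def upstream_slice(start, end, strand, seq_len, flank=1000):
    """Return a 0-based, half-open (lo, hi) slice for the upstream flank.

    start and end are 1-based inclusive GFF coordinates.
    """
    if strand == "+":
        lo, hi = start - 1 - flank, start - 1   # bases just before the gene start
    else:
        lo, hi = end, end + flank               # bases just after the gene end
    # Clamp so a gene near a sequence edge does not produce a negative index,
    # which Python would otherwise interpret as counting from the end.
    return max(lo, 0), min(hi, seq_len)


_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Reverse-complement a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_id(attributes, fallback):
    """Pull the ID=... value out of GFF column 9, or use the fallback."""
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "ID" and value:
            return value
    return fallback


def upstream_fasta_entries(genome, gff_lines, flank=1000):
    """Yield (header, sequence) pairs, one per gene found in `genome`.

    genome is a dict mapping sequence ID to its sequence string.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene" or cols[0] not in genome:
            continue
        seq_id, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        chrom = genome[seq_id]
        lo, hi = upstream_slice(start, end, strand, len(chrom), flank)
        seq = chrom[lo:hi]
        if strand == "-":
            seq = reverse_complement(seq)
        name = gene_id(cols[8], f"{seq_id}_{start}_{end}")
        # Unique header: gene ID plus the 1-based region the sequence came from.
        header = f"{name} {seq_id}:{lo + 1}-{hi}({strand}) upstream_{flank}bp"
        yield header, seq


# Toy data (made up for illustration, not from Midterm.fna / Midterm.gff).
toy_genome = {"LGX": "ATGCCGTAAGCT"}
toy_gff = [
    "##gff-version 3",
    "LGX\tsrc\tgene\t5\t8\t.\t+\t.\tID=geneA;Name=a",
    "LGX\tsrc\tgene\t2\t6\t.\t-\t.\tID=geneB",
]
for header, seq in upstream_fasta_entries(toy_genome, toy_gff, flank=3):
    print(f">{header}\n{seq}")
```

Putting the gene ID and the source coordinates in each header makes every record traceable, so an incorrect sequence can be checked against the genome directly. Taking the strand into account matters because a window after the gene's end is only "upstream" once it is reverse-complemented.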

gene python • 146 views

updated 6.6 years ago by GenoMax 152k • written 6.6 years ago by ishmahe16 • 0