Closed: Need help identifying the mistake in my code
5.4 years ago
ishmahe16 • 0

Hey, I have to produce a FASTA file containing the sequence 1 kb upstream of each gene on the X chromosome. Given: a GFF file and a genome sequence file in FASTA format. The code I used is:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# collect the X chromosome (LGX) records and their IDs from the genome FASTA
fast_lgx_records = []
lgs_record_ids = []
records = list(SeqIO.parse("Midterm.fna", "fasta"))
for record in records:
    if "LGX" in record.description:
        fast_lgx_records.append(record)
        lgs_record_ids.append(record.id)

# collect the upstream coordinates of every gene on those records
positions_to_read = []
with open("Midterm.gff") as f:
    for line in f:
        if not line.startswith("#"):
            split_line = line.split('\t')
            seq_id = split_line[0]
            feature_type = split_line[2]
            start = split_line[3]
            end = split_line[4]
            sign = split_line[6]
            if seq_id in lgs_record_ids and feature_type == "gene":
                if sign == "+":
                    # upstream of a + strand gene lies before its start
                    start_index = int(start)
                    positions_to_read.append((start_index - 1001, start_index - 1))
                else:
                    # upstream of a - strand gene lies after its end
                    end_index = int(end)
                    positions_to_read.append((end_index + 1, end_index + 1001))

# write the final sequence to a new fasta file
final_data = []

My professor's feedback was: "All the headers are identical, so I can't match sequences to their corresponding genes or positions in the genome. There are incorrect sequences, but I can't be sure exactly what the problem is because I can't tell which gene they are supposed to belong to."

Please help.
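
For reference, here is a minimal sketch of the two things the feedback points at: giving every record its own header, and turning GFF coordinates into Python slices. It is plain Python without Biopython so it can be run on its own. The function names (`upstream_slice`, `reverse_complement`, `gene_id_from_attributes`, `upstream_fasta_entry`) are made up for illustration. It assumes the GFF is GFF3 with an `ID=` entry in column 9, that GFF coordinates are 1-based and inclusive, and that upstream sequence of a minus strand gene should be reverse-complemented; whether the assignment expects that last step is an assumption.

```python
def upstream_slice(start, end, strand, seq_len, flank=1000):
    """Return a 0-based, half-open (lo, hi) slice for the flank upstream of a gene.

    start and end are 1-based inclusive GFF coordinates.
    """
    if strand == "+":
        # bases start-flank .. start-1 (1-based) -> slice [start-1-flank, start-1)
        lo, hi = start - 1 - flank, start - 1
    else:
        # bases end+1 .. end+flank (1-based) -> slice [end, end+flank)
        lo, hi = end, end + flank
    # clamp so genes near a sequence edge do not produce negative or overlong slices
    return max(lo, 0), min(hi, seq_len)


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a plain DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_id_from_attributes(attributes):
    """Pull the ID value out of a GFF3 attributes column (assumed 'ID=...;...')."""
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        if key == "ID":
            return value
    return "unknown_gene"


def upstream_fasta_entry(chrom_seq, gff_line, flank=1000):
    """Build one FASTA entry whose header names the gene and its coordinates."""
    cols = gff_line.rstrip("\n").split("\t")
    seq_id, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
    lo, hi = upstream_slice(start, end, strand, len(chrom_seq), flank)
    seq = chrom_seq[lo:hi]
    if strand == "-":
        seq = reverse_complement(seq)
    gene_id = gene_id_from_attributes(cols[8])
    # header reports the extracted region in 1-based inclusive coordinates
    header = f">{gene_id} {seq_id}:{lo + 1}-{hi}({strand})"
    return f"{header}\n{seq}\n"
```

With headers built this way, each sequence can be traced back to its gene and genomic position, which is what the feedback says is missing.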

gene python • 138 views
This thread is not open. No new answers may be added