Question: extract list of positions from fasta file biopython
6 months ago, s.i.lipworth wrote:

I have a list of positions of interest, e.g.:

10
20
1000
4000000

I want to extract the base call at these positions from a FASTA file using Biopython. This is what I have tried:

    query_dic = {}
    with open(line) as pos_file:
        for x in pos_file:
            for seq_record in SeqIO.parse(query_file, "fasta"):
                nuc = seq_record[x]
                query_dic[x] = nuc

The error message says 'invalid index' - what is wrong?

Steps:

  1. read the positions into a list of integers
  2. iterate over the FASTA records:

    for seq_record in SeqIO.parse(query_file, "fasta"):
        for x in positions:
            # get the base at 1-based position x (Python indexing is 0-based)
            base = seq_record.seq[x - 1]
    
written 6 months ago by shenwei356
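
The 'invalid index' error most likely comes from indexing with a string: each line read from the positions file is text such as "10\n", and a SeqRecord accepts only integers or slices as an index. Here is a minimal sketch of the steps above, assuming one 1-based position per line. The helper names `read_positions` and `bases_at` are hypothetical, and a plain string stands in for `seq_record.seq`, which is indexed by integer in the same way.

```python
from io import StringIO


def read_positions(handle):
    """Return the positions in a file-like object (one per line) as ints."""
    # int() ignores surrounding whitespace, including the trailing newline.
    return [int(line) for line in handle if line.strip()]


def bases_at(sequence, positions):
    """Map each 1-based position to the base found at that position."""
    result = {}
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise IndexError(f"position {pos} is outside 1..{len(sequence)}")
        result[pos] = sequence[pos - 1]  # 1-based position -> 0-based index
    return result


# Stand-ins for the real inputs (assumptions for illustration only).
pos_file = StringIO("1\n4\n10\n")
sequence = "ACGTACGTAC"  # with Biopython: SeqIO.read(query_file, "fasta").seq

query_dic = bases_at(sequence, read_positions(pos_file))
print(query_dic)
```

The explicit range check is a design choice: without it, a position of 0 would silently wrap around to the last base because of Python's negative indexing.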

First, select the right chromosome (i.e. the right FASTA record); then extract the base from that record's sequence.

written 6 months ago by Ben

Does your FASTA file have one sequence in it, or many?

If one, you only need to open the FASTA file once, and you should use SeqIO.read for that.

If many, you need to know which sequence each of the values x refers to. Perhaps SeqIO.index would be useful here for loading the relevant record from a multiple sequence FASTA file?

written 4 months ago by Peter
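
For the multi-sequence case, each position has to be paired with a sequence ID. The sketch below is one possible way to do that, not a definitive implementation. `SeqIO.index(filename, "fasta")` returns a dictionary-like object keyed by record ID, and indexing a SeqRecord with an integer returns a single letter, so the lookup should work unchanged on such an index. The `(seq_id, position)` query format and the names `load_index` and `lookup_bases` are assumptions; a plain dict of strings stands in for the index so the example runs without a FASTA file.

```python
def load_index(fasta_path):
    """Open a multi-record FASTA file for random access by record ID."""
    # Biopython is imported here so the demo below runs without it installed.
    from Bio import SeqIO
    return SeqIO.index(fasta_path, "fasta")


def lookup_bases(records, queries):
    """Return {(seq_id, pos): base} for 1-based positions in named sequences."""
    found = {}
    for seq_id, pos in queries:
        record = records[seq_id]  # raises KeyError if the ID is not present
        if not 1 <= pos <= len(record):
            raise IndexError(f"{seq_id}: position {pos} is outside 1..{len(record)}")
        found[(seq_id, pos)] = record[pos - 1]  # 1-based -> 0-based
    return found


# Stand-in for load_index("genome.fasta"); the IDs and sequences are made up.
records = {"chr1": "ACGTACGTAC", "chr2": "TTTTGGGGCC"}
queries = [("chr1", 2), ("chr2", 5), ("chr2", 10)]

result = lookup_bases(records, queries)
print(result)
```

If the file holds exactly one sequence, `SeqIO.read(filename, "fasta")` returns that single record directly and no ID is needed.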