Question: Extracting subsequence from FASTA file using python
5 weeks ago, shawnt1234 wrote:

Hi, I would like to extract subsequences from a large FASTA file and write the extracted sequences to a new FASTA file, preferably using Python.

I have a csv file with the following format:

id, start, stop, header
id1, 3, 10, Contig0
id2, 12, 25, Contig1
id3, 19, 40, Contig2

The input FASTA file has the following format:

(Contig0 sequence)
(Contig1 sequence)
(Contig2 sequence)

I would like a FASTA output file with the following format:

(Contig0 sequence from bp 3-10)
(Contig1 sequence from bp 12-25)
(Contig2 sequence from bp 19-40)

If anyone has any suggestions or a script that can do this, any help would be greatly appreciated.

5 weeks ago, Bastien Hervé (Limoges, CBRS, France) wrote:

It's possible in Biopython

1) Load your CSV file into a pandas dataframe, indexed by the column that matches the FASTA record names (the header column in your example)

2) Iterate over your fasta file using SeqIO

3) For each record from the iteration, find the corresponding row in your dataframe (something like df.loc[record.id])

4) Once you have the right row, update the record's header with the information from that row

5) Slice the sequence and replace it in the record (record.seq)

6) Write the record in a new file

7) Go back to step 3 for the next record

I'll let you try this on your own; if you want some help, comment below :)
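The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's own script: it swaps the pandas dataframe for a plain dictionary built with the standard csv module, it assumes the start/stop values are 1-based and inclusive, and it assumes at most one CSV row per contig. The function names and the output header layout are made up for the example.

```python
import csv


def load_regions(csv_path):
    """Map each FASTA header to its (id, start, stop) row from the CSV."""
    regions = {}
    with open(csv_path, newline="") as handle:
        # skipinitialspace copes with the ", " separators shown in the question.
        for row in csv.DictReader(handle, skipinitialspace=True):
            regions[row["header"]] = (row["id"], int(row["start"]), int(row["stop"]))
    return regions


def extract_subsequences(fasta_in, csv_path, fasta_out):
    """Write one trimmed record per FASTA entry that has a row in the CSV."""
    # Biopython is imported here so load_regions can be used without it.
    from Bio import SeqIO

    regions = load_regions(csv_path)

    def trimmed():
        for record in SeqIO.parse(fasta_in, "fasta"):
            if record.id not in regions:
                continue  # no coordinates listed for this contig
            region_id, start, stop = regions[record.id]
            # Assumption: start/stop are 1-based and inclusive.
            sub = record[start - 1:stop]
            sub.id = f"{record.id}:{start}-{stop}"  # hypothetical header layout
            sub.description = region_id
            yield sub

    # SeqIO.write returns the number of records written.
    return SeqIO.write(trimmed(), fasta_out, "fasta")
```

With hypothetical file names, a call would look like extract_subsequences("contigs.fasta", "regions.csv", "subsequences.fasta"). Because the FASTA file is streamed record by record and the lookup is a dictionary access, the run time should stay roughly linear in the size of the input.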


5 weeks ago, shawnt1234 replied:

Thanks for the help! The script I wrote was inefficient and ran very slowly, so I did some more research and found bedtools getfasta, which worked for me.
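For anyone following the same route: bedtools getfasta reads its regions from a BED file, which uses 0-based, half-open coordinates. Below is a minimal sketch of converting the CSV from the question into BED, assuming the CSV coordinates are 1-based and inclusive. The function name and file names are hypothetical.

```python
import csv


def csv_to_bed(csv_path, bed_path):
    """Convert 1-based inclusive CSV rows into 0-based half-open BED lines."""
    count = 0
    with open(csv_path, newline="") as src, open(bed_path, "w") as dst:
        # skipinitialspace copes with the ", " separators shown in the question.
        for row in csv.DictReader(src, skipinitialspace=True):
            start, stop = int(row["start"]), int(row["stop"])
            # BED columns: sequence name, start (0-based), end (exclusive), feature name.
            dst.write(f"{row['header']}\t{start - 1}\t{stop}\t{row['id']}\n")
            count += 1
    return count
```

The resulting file could then be passed to something like bedtools getfasta -fi contigs.fasta -bed regions.bed -fo subsequences.fasta, where the three file names are placeholders.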
5 weeks ago, genomax (United States) wrote:

pyfaidx by Matt Shirley.
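
A rough sketch of how pyfaidx might be used for this task, assuming the coordinates are 1-based and inclusive. pyfaidx indexes the FASTA file, so each region is fetched by random access instead of scanning the whole file. The helper names and the (id, start, stop, header) tuple layout are made up for the example.

```python
def one_based_slice(start, stop):
    """Convert 1-based inclusive coordinates into a Python slice."""
    return slice(start - 1, stop)


def fetch_regions(fasta_path, regions):
    """Yield (region_id, sequence) for each (id, start, stop, header) tuple."""
    # pyfaidx is imported here so one_based_slice can be used without it.
    from pyfaidx import Fasta

    fasta = Fasta(fasta_path)  # builds a .fai index next to the file if missing
    for region_id, start, stop, header in regions:
        # Slicing a pyfaidx record returns a Sequence object; .seq is the string.
        yield region_id, fasta[header][one_based_slice(start, stop)].seq
```

For example, list(fetch_regions("contigs.fasta", [("id1", 3, 10, "Contig0")])) would return the ten-minus-three-plus-one, i.e. eight, bases of Contig0 from position 3 to 10, paired with "id1"; the file name here is a placeholder.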
