Question: Extracting subsequence from FASTA file using python
shawnt1234 wrote (4 months ago):

Hi, I would like to extract subsequences from a large FASTA file and write the extracted sequences to a new FASTA file, preferably using Python.

I have a csv file with the following format:

id, start, stop, header
id1, 3, 10, Contig0
id2, 12, 25, Contig1
id3, 19, 40, Contig2

The input FASTA file has the following format:

>Contig0
(Contig0 sequence)
>Contig1
(Contig1 sequence)
>Contig2
(Contig2 sequence)

I would like a FASTA output file that has the following format:

>id1
(Contig0 sequence from bp 3-10)
>id2
(Contig1 sequence from bp 12-25)
>id3
(Contig2 sequence from bp 19-40)

If anyone has any suggestions or a script that can do this, any help would be greatly appreciated.

Bastien Hervé wrote (4 months ago):

It's possible with Biopython and pandas.

1) Load your csv file into a pandas dataframe (use the header column as the index, since that is what matches the fasta record names)

2) Iterate over your fasta file using SeqIO

3) For each record you get from the iteration, find the corresponding row in your dataframe (something like: df.loc[[record.id]])

4) Once you have the right row, replace the record's header (record.id) with the id from that row

5) Slice the record's sequence (record.seq) using start and stop from the row

6) Write the record to a new file

7) Go back to step 3 for the next record

I'll let you try this on your own; if you want some help, comment below :)
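
For reference, the steps above could be sketched roughly as follows. This is only a sketch under a few assumptions that are not stated in the thread: start and stop are taken to be 1-based and inclusive, the csv columns are named exactly id, start, stop, header, and the function names (to_slice, extract_regions) and the file names in the usage line are made up for illustration.

```python
def to_slice(start, stop):
    """Convert 1-based inclusive coordinates (an assumption) to a Python slice."""
    return slice(int(start) - 1, int(stop))


def extract_regions(csv_path, fasta_in, fasta_out):
    # Third-party imports are kept local so to_slice() can be used without them.
    import pandas as pd
    from Bio import SeqIO

    # skipinitialspace copes with the "id, start, stop, header" spacing.
    # The contig name (header column) is the index, so df.loc can look it up.
    df = pd.read_csv(csv_path, skipinitialspace=True, index_col="header")

    with open(fasta_out, "w") as out:
        for record in SeqIO.parse(fasta_in, "fasta"):
            if record.id not in df.index:
                continue  # no region requested for this contig
            # Double brackets always return a DataFrame, even for one match.
            for row in df.loc[[record.id]].itertuples(index=False):
                sub = record[to_slice(row.start, row.stop)]
                sub.id = row.id        # new FASTA header
                sub.description = ""   # drop the old contig description
                SeqIO.write(sub, out, "fasta")


# Hypothetical usage:
# extract_regions("regions.csv", "contigs.fasta", "extracted.fasta")
```

Since the fasta file is read once and each contig is looked up in the dataframe, this avoids re-scanning the fasta file for every csv row, which is a common reason such scripts get slow.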


Thanks for the help! I wrote a script, but it was inefficient and ran very slowly, so I did some more research and found bedtools getfasta, which worked for me.
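
For anyone taking the bedtools getfasta route, the csv first has to be turned into a BED file. A minimal sketch of that conversion is below; it assumes the csv coordinates are 1-based and inclusive (BED starts are 0-based, so 1 is subtracted from start), and the function name and file names are hypothetical. The exact command the poster ran is not given in the thread.

```python
import csv


def csv_to_bed_lines(lines):
    """Turn 'id, start, stop, header' csv lines into BED lines (chrom, start, end, name)."""
    bed = []
    for row in csv.DictReader(lines, skipinitialspace=True):
        start0 = int(row["start"]) - 1  # BED start is 0-based (assumes 1-based input)
        stop = int(row["stop"])         # BED end is exclusive, so it stays as-is
        bed.append(f"{row['header']}\t{start0}\t{stop}\t{row['id']}")
    return bed


# Hypothetical usage:
# with open("regions.csv", newline="") as src, open("regions.bed", "w") as dst:
#     dst.write("\n".join(csv_to_bed_lines(src)) + "\n")
#
# Then, on the command line, something like:
#   bedtools getfasta -fi contigs.fasta -bed regions.bed -fo extracted.fasta -nameOnly
# Which naming flag gives a header of just the id depends on the bedtools version
# (-name in older releases, -nameOnly in newer ones); check `bedtools getfasta -h`.
```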

Reply written 4 months ago by shawnt1234
genomax wrote (4 months ago):

pyfaidx by Matt Shirley.
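
A rough idea of what this could look like with pyfaidx, which indexes the fasta file so that each region is fetched directly instead of reading whole contigs. As a sketch only: it assumes 1-based inclusive start and stop in the csv, and read_regions, extract_with_pyfaidx and the file names are made-up names for illustration.

```python
import csv


def read_regions(lines):
    """Parse 'id, start, stop, header' csv lines into (id, contig, start0, stop) tuples."""
    regions = []
    for row in csv.DictReader(lines, skipinitialspace=True):
        # Assumes 1-based inclusive input; convert to 0-based half-open.
        regions.append((row["id"], row["header"], int(row["start"]) - 1, int(row["stop"])))
    return regions


def extract_with_pyfaidx(csv_path, fasta_in, fasta_out):
    from pyfaidx import Fasta  # third-party; imported here so read_regions() works alone

    contigs = Fasta(fasta_in)  # creates a .fai index next to the fasta file if missing
    with open(csv_path, newline="") as src, open(fasta_out, "w") as out:
        for new_id, contig, start0, stop in read_regions(src):
            # Slicing a pyfaidx record follows Python's 0-based, end-exclusive rules.
            seq = contigs[contig][start0:stop].seq
            out.write(f">{new_id}\n{seq}\n")


# Hypothetical usage:
# extract_with_pyfaidx("regions.csv", "contigs.fasta", "extracted.fasta")
```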
