How to map ranges from a concatenated FASTA file back to positions in each individual protein sequence
7.7 years ago
User 6777 ▴ 20

Hi all,

I have a protein fasta file (protein.txt) like:

>a
mnspq
>b
rstuvw
>c
mnqa

Note that the lengths of proteins a, b and c are 5, 6 and 4, respectively (total length = 15).

I have extracted some ranges, with coordinates based on the total (concatenated) length, and saved them in file1.txt as:

2-3
4-10
11-14

The range that each protein occupies within the total length, as derived from the protein file, is saved in another file (file2.txt) as:

a  1-5
b  6-11
c  12-15
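
For reference, coordinates like those in file2.txt can be derived from the FASTA file by accumulating sequence lengths. The following is a minimal sketch, assuming 1-based inclusive coordinates and that the whole header line after ">" is the protein name. The function names parse_fasta and cumulative_ranges are illustrative and do not come from any library.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cumulative_ranges(records):
    """Return (name, start, end) tuples: 1-based inclusive positions
    of each sequence within the concatenation of all sequences."""
    ranges = []
    offset = 0
    for name, seq in records:
        ranges.append((name, offset + 1, offset + len(seq)))
        offset += len(seq)
    return ranges


# The example from the post, inlined instead of reading protein.txt.
example = """>a
mnspq
>b
rstuvw
>c
mnqa
""".splitlines()

for name, start, end in cumulative_ranges(parse_fasta(example)):
    print(f"{name}\t{start}-{end}")
```

For the three example proteins this prints a 1-5, b 6-11 and c 12-15. To run it on the real file, pass an open file handle for protein.txt to parse_fasta instead of the inlined lines.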

Now, using the file1 ranges together with the file2 coordinates, I want to calculate the corresponding ranges within each individual protein sequence. For the above input, the output would be:

a   2-3,4-5
b   1-5,6
c   1-3

In other words, if I first concatenate all my sequences and determine some ranges on the concatenated sequence, how can I find the corresponding ranges of locations in each individual protein sequence?

Thanks for your consideration.

fasta perl python • 1.8k views

Well, just write a script.
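
A minimal sketch of such a script, assuming all coordinates are 1-based and inclusive, and that a range crossing a protein boundary should be split at that boundary. Each global range is intersected with each protein's interval, and the overlap is shifted into that protein's local coordinates. The names parse_range, split_global_ranges and format_pieces are illustrative only, and the file contents are inlined as strings rather than read from file1.txt and file2.txt.

```python
def parse_range(text):
    """Turn a string like '4-10' into the tuple (4, 10)."""
    start, end = text.strip().split("-")
    return int(start), int(end)


def split_global_ranges(proteins, ranges):
    """Map ranges on the concatenated sequence to per-protein ranges.

    proteins: list of (name, start, end) in global coordinates (file2.txt)
    ranges:   list of (start, end) in global coordinates (file1.txt)
    Returns a dict: name -> list of (start, end) in local coordinates.
    """
    result = {name: [] for name, _, _ in proteins}
    for r_start, r_end in ranges:
        for name, p_start, p_end in proteins:
            # Overlap between the requested range and this protein.
            lo = max(r_start, p_start)
            hi = min(r_end, p_end)
            if lo <= hi:
                # Shift so the protein's first residue is position 1.
                result[name].append((lo - p_start + 1, hi - p_start + 1))
    return result


def format_pieces(pieces):
    """Write single positions as '6' and longer spans as '1-5'."""
    return ",".join(str(s) if s == e else f"{s}-{e}" for s, e in pieces)


file1_text = "2-3\n4-10\n11-14\n"
file2_text = "a  1-5\nb  6-11\nc  12-15\n"

ranges = [parse_range(line) for line in file1_text.splitlines() if line.strip()]
proteins = []
for line in file2_text.splitlines():
    if line.strip():
        name, span = line.split()
        proteins.append((name, *parse_range(span)))

local = split_global_ranges(proteins, ranges)
for name, _, _ in proteins:
    print(f"{name}\t{format_pieces(local[name])}")
```

On the example input this prints a 2-3,4-5, then b 1-5,6, then c 1-3. Protein c has only four residues, so the global positions 12-14 correspond to local positions 1-3. The nested loop is quadratic in the number of proteins and ranges; for large inputs, sorting both lists and sweeping through them once would be a reasonable refinement.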


As long as you have unique headers in your multi-FASTA file, samtools faidx with a region argument should do the extraction part. See this post: "Extract User Defined Region From An Fasta File". @Matt Shirley also has a Python-based solution, pyfaidx.

I am not exactly certain what you are trying to do in the subsequent steps.

Edit: Re-reading your original post, I am not sure this is what you need, but I will leave it here for now in case it helps.
