Question: Computing chunk of fasta file before moving to next chunks using python
0
gravatar for ToastedGoat
18 months ago by
ToastedGoat10
ToastedGoat10 wrote:

I have a fasta file with 100 random permutations of a genome. It is then broken up into ORFs. Here is a sample:

>Shuffled1_1 [5 - 79] Based on NC_000908.2
>MNGLISMLIYFDEYENEIFVEKKDL
>Shuffled1_2 [36 - 101] Based on NC_000908.2
>MMSMKMKFSLKKKICNYIPKLQ
>Shuffled1_3 [19 - 111] Based on NC_000908.2
>MNVNLLWWVWKWNFRWKKRFVTTYQSSNKLK
>Shuffled1_4 [105 - 134] Based on NC_000908.2
>MKVERQISTL
>Shuffled1_5 [148 - 177] Based on NC_000908.2
>MINNSSSPAL
>Shuffled1_6 [174 - 242] Based on NC_000908.2
>MINKFPELLLTQLSYVRYYLFAI

There are over 3 million records. I need to calculate the 95th percentile by averaging the estimates of each shuffled permutation. I am having trouble figuring out how to get the 95th percentile of shuffled1 before moving onto shuffled2 and so on. Any help would be appreciated. I was able to pull out just the first shuffled permutation using awk and get the percentile using this code:

from Bio import SeqIO
import numpy as np

seqLen=[]

fasta_sequence = SeqIO.parse(open("firstperm.fasta"),"fasta")

for fasta in fasta_sequence:
    name,sequence = fasta.id, str(fasta.seq)
    seqLen.append(len(fasta))

a = np.array(seqLen)
p = np.percentile(a,95)
print p

I am just trying to figure out a way to duplicate this over the entire fasta file for all 100 shuffled permutations. Basically I am looking for 100 numbers in a list for each 95th percentile.

python fasta • 534 views
ADD COMMENTlink modified 18 months ago by george.ry1.1k • written 18 months ago by ToastedGoat10
1
gravatar for george.ry
18 months ago by
george.ry1.1k
United Kingdom
george.ry1.1k wrote:
from collections import defaultdict as dd
from Bio import SeqIO
import numpy as np

shuffles = dd(list)

for record in SeqIO.parse(open('firstperm.fasta'), 'fasta'):
    id = record.id.split('_')[0]
    shuffles[id].append(len(record))

for k, v in shuffles.items():
    print(k, np.percentile(v, 95))

Something like that? (Not tested ... on the wrong machine)

ADD COMMENTlink modified 18 months ago • written 18 months ago by george.ry1.1k
1

Thanks this works! Appreciate it!

ADD REPLYlink written 18 months ago by ToastedGoat10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 689 users visited in the last hour