Question: Data randomization for applying some statistics
11 months ago, psschlogl wrote:

Hi there

I have a quick question. I am dealing with several gigabytes of FASTA files and I am trying to develop a statistical analysis of protein composition (counting amino acids and k-mers, and using some probabilistic models), and I need some random data to use.

I have written the randomization code, and my question is: do I need to randomize all the data? The randomized data is based on the original FASTA files. The code reads the data and shuffles each sequence, keeping the length and the amino acid composition of each sequence.

For example:

>seq 1
iterable

to:

>random_seq1
rtaieelb
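
A minimal sketch of this kind of per-sequence shuffling is given here for illustration. It is not the poster's actual code: the function names `shuffle_sequence` and `shuffle_fasta` and the `random_` header prefix are assumptions modeled on the example above. Each sequence is permuted in place, so length and amino acid composition are preserved by construction.

```python
import random
from collections import Counter


def shuffle_sequence(seq, rng=None):
    """Return a random permutation of seq (same length, same residue counts)."""
    rng = rng if rng is not None else random
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffle_fasta(lines, rng=None):
    """Yield (header, shuffled_sequence) pairs from an iterable of FASTA lines.

    Multi-line sequences are joined before shuffling. The 'random_' prefix
    is a hypothetical naming choice that mirrors the example in the post.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield "random_" + header, shuffle_sequence("".join(chunks), rng)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield "random_" + header, shuffle_sequence("".join(chunks), rng)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    for name, seq in shuffle_fasta([">seq1", "iterable"], rng):
        # The shuffled sequence has exactly the same residue counts.
        assert Counter(seq) == Counter("iterable")
        print(">" + name)
        print(seq)
```

Because `shuffle_fasta` is a generator over lines, it can be fed an open file handle, so large files do not have to be loaded into memory at once.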

I am trying to identify, count, and compute some statistics on:

-string composition
-short substrings of length k
-estimation of the distribution of characters and substrings in the data set
-whether the distribution of substrings of length k sharing the same composition is random or not
-identification and analysis of outlier (over-represented and under-represented) substrings
-database searches for the presence of these substrings in structural and functional portions of sequences
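
The counting steps in the list above could be sketched roughly as follows. This is only an illustration under assumptions: all function names are hypothetical, and k-mers are counted with overlapping windows, which the post does not specify.

```python
from collections import Counter


def residue_composition(seq):
    """Count each character (amino acid) in a sequence."""
    return Counter(seq)


def kmer_counts(seq, k):
    """Count overlapping substrings of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def relative_frequencies(counts):
    """Turn raw counts into an empirical distribution."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


def group_by_composition(counts):
    """Group k-mers that are permutations of each other.

    The group key is the sorted k-mer, so 'AB' and 'BA' share the key 'AB'.
    """
    groups = {}
    for kmer, n in counts.items():
        groups.setdefault("".join(sorted(kmer)), {})[kmer] = n
    return groups


if __name__ == "__main__":
    counts = kmer_counts("iterable", 2)
    print(residue_composition("iterable"))
    print(relative_frequencies(counts))
    print(group_by_composition(counts))
```

Within one composition group, counts that differ strongly between permutations are candidates for the over- and under-represented substrings mentioned in the list.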

Thank you for your time and attention!

Paulo

proteins sequence fasta

I would need to know more about which statistical test you are applying here, but essentially you need to randomize all sequences several times, IMHO.

written 11 months ago by JC
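
One way to read the suggestion of randomizing all sequences several times is as a permutation-style null distribution: shuffle every sequence, count the k-mer of interest, repeat, and compare the observed count with the shuffled counts. The sketch below assumes that reading. The function names, the default of 1000 shuffles and the one-sided "at least as many" p-value are illustrative choices, not something stated in the thread.

```python
import random


def count_kmer(seqs, kmer):
    """Count overlapping occurrences of kmer across all sequences."""
    k = len(kmer)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - k + 1)
        if s[i:i + k] == kmer
    )


def shuffled(seq, rng):
    """Permute a sequence, preserving its length and composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def empirical_p_value(seqs, kmer, n_shuffles=1000, seed=0):
    """Estimate how often shuffled data reaches the observed k-mer count.

    Returns (observed_count, null_counts, p_over), where p_over uses the
    usual +1 correction so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = count_kmer(seqs, kmer)
    null = [
        count_kmer([shuffled(s, rng) for s in seqs], kmer)
        for _ in range(n_shuffles)
    ]
    at_least = sum(1 for c in null if c >= observed)
    p_over = (at_least + 1) / (n_shuffles + 1)
    return observed, null, p_over


if __name__ == "__main__":
    seqs = ["MKVLAAKL", "AAKLMV"]  # toy data, not real proteins
    observed, null, p = empirical_p_value(seqs, "AA", n_shuffles=200)
    print(observed, sum(null) / len(null), p)
```

With this approach every sequence is reshuffled in each round, so the null counts reflect the whole data set rather than a subset.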

Hi JC, I am thinking of a two-sided Fisher's exact test for k-mers with Bonferroni correction, at least for the moment, but I am checking other possibilities.

Thanks

written 11 months ago by psschlogl
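
For reference, a two-sided Fisher's exact test on a 2x2 table followed by a Bonferroni correction can be sketched with the standard library alone. How the table is built is an assumption here: for one k-mer, `a` and `b` could be its occurrences and all other k-mer positions in the real data, and `c` and `d` the same quantities in the shuffled data. For counts of the size expected from gigabytes of sequence, a library routine such as `scipy.stats.fisher_exact` would likely be more practical than this direct sum.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Toy tables, one per k-mer (hypothetical counts).
    tables = [(8, 2, 1, 5), (3, 1, 1, 3)]
    raw = [fisher_exact_two_sided(*t) for t in tables]
    print(raw)
    print(bonferroni(raw))
```

Bonferroni is conservative when many k-mers are tested, since the number of tests grows quickly with k.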

What you're trying to do is unclear. What do you need random data for?

written 11 months ago by Jean-Karim Heriche

Comments were added to the initial question.

written 11 months ago by psschlogl

None? 8(

written 11 months ago by psschlogl