Question: Number of "AT" repeats in sequence greater than 500bp
17 months ago by
marija30
Croatia
marija30 wrote:

Hi, I need your help, please. I have a sequence 1,000,000 bp long. I want to write a function that returns the number of "AT" repeats, where the A+T content must be greater than 30% and the length must be greater than 500 bp. I just don't know how to write the condition that the length must be greater than 500 bp. Any help? Thank you.

R • 433 views
ADD COMMENTlink modified 17 months ago • written 17 months ago by marija30

Does letterFrequencyInSlidingView from the Biostrings Bioconductor package do what you need to get a summary of the data?

ADD REPLYlink written 17 months ago by Sean Davis25k
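
A minimal sketch of how the suggested letterFrequencyInSlidingView call might be applied to a single sequence. The toy sequence, the 501 bp window, and all variable names are assumptions for illustration and do not come from the thread. Merging overlapping windows with reduce() is only one possible way to turn window hits into a count of regions.

```r
library(Biostrings)

# Toy sequence (hypothetical); a real run would load the 1 Mb sequence instead.
dna <- DNAString(paste(rep("ATATGCGC", 200), collapse = ""))  # 1600 bp

# Assumed window: the smallest width that is "greater than 500 bp".
window <- 501

# One row per window start; letters = "AT" gives one combined A+T count column.
at_count <- letterFrequencyInSlidingView(dna, view.width = window, letters = "AT")
at_frac  <- at_count[, 1] / window

# Start positions of windows whose A+T fraction exceeds 30%.
hits <- which(at_frac > 0.30)

# Overlapping windows cover the same stretch, so merge them before counting.
regions <- reduce(IRanges(start = hits, width = window))
length(regions)
```

The window width and the step between windows change the answer, so they need to be chosen before the count means anything.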

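I only know how to express the condition that the A+T content must be greater than 30%:

library(Biostrings)

# Fraction of A+T in each sequence of a DNAStringSet
at <- function(x) {
  alfreq <- alphabetFrequency(x, as.prob = TRUE)
  rowSums(alfreq[, c("A", "T"), drop = FALSE])
}

at(x) > 0.30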
ADD REPLYlink written 17 months ago by marija30

Please edit the original post/question to add new information.

ADD REPLYlink written 17 months ago by genomax65k

If you are parsing a single sequence for stretches longer than 500 bp with AT content > 30% (i.e., GC content < 70%), what window size and window overlap do you want? Or are you looking at a set of sequences?

ADD REPLYlink written 17 months ago by cpad011211k

I want the number of sequences that are longer than 500 bp and have AT content > 30%. For example, chr2:1,000,000-2,000,000 might contain 456 sequences with AT > 30% and length > 500 bp (just an example).

ADD REPLYlink written 17 months ago by marija30
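
If the input is a set of sequences, as the last comment suggests, the length condition could be combined with the A+T condition roughly as sketched below. This assumes the sequences are held in a Biostrings DNAStringSet. The toy sequences and their names are hypothetical, and real data would more likely be read with readDNAStringSet().

```r
library(Biostrings)

# Hypothetical toy set standing in for the real sequences.
seqs <- DNAStringSet(c(
  long_at  = paste(rep("AT", 300), collapse = ""),         # 600 bp, 100% A+T
  long_gc  = paste(rep("GGGCCCGGGA", 60), collapse = ""),  # 600 bp, 10% A+T
  short_at = paste(rep("AT", 100), collapse = "")          # 200 bp, 100% A+T
))

# A+T fraction per sequence; letters = "AT" returns one combined column.
at_frac <- letterFrequency(seqs, letters = "AT", as.prob = TRUE)[, 1]

# width() gives each sequence length, so both conditions combine with &.
keep <- width(seqs) > 500 & at_frac > 0.30

sum(keep)          # number of sequences passing both filters
names(seqs)[keep]  # which ones they are
```

Here width(seqs) > 500 is the length condition asked about in the question, and sum(keep) counts the sequences that satisfy both conditions.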