Question: Number of "AT" repeats in sequence greater than 500bp
marija (40), Croatia, wrote 2.3 years ago:

Hi, please, I need your help. I have a sequence that is 1,000,000 bp long. I want to write a function that returns the number of "AT" repeats (A+T content must be greater than 30%) whose length is greater than 500 bp. I just don't know how to express the condition that the length must be greater than 500 bp. Any help? Thank you.

Tagged: R

Does letterFrequencyInSlidingView from the Biostrings Bioconductor package do what you need to get a summary of the data?

written 2.3 years ago by Sean Davis (26k)
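To make the sliding-window idea concrete, here is a minimal base-R sketch of the kind of summary such a function produces: the A+T fraction in every window of a fixed width along one sequence. It deliberately does not call Biostrings. `at_sliding` and the toy sequence are hypothetical, introduced only for illustration, and the window width of 6 is an arbitrary choice for the toy data (the question would use something like 500).

```r
# Hypothetical helper: A+T fraction in every window of `width` bases,
# sliding one base at a time along a single sequence string.
at_sliding <- function(seq, width) {
    bases <- strsplit(toupper(seq), "")[[1]]
    n <- length(bases)
    if (n < width) return(numeric(0))
    is_at <- bases %in% c("A", "T")
    # Cumulative sums let each window count be computed as a difference
    csum <- c(0, cumsum(is_at))
    (csum[(width + 1):(n + 1)] - csum[1:(n - width + 1)]) / width
}

# Toy sequence (made up for illustration)
toy <- "ATATATGCGCGCGCATAT"
frac <- at_sliding(toy, width = 6)

# Start positions of windows whose A+T fraction exceeds 30%
which(frac > 0.30)
```

Because the windows overlap, neighbouring windows that pass the cutoff would still need to be merged into regions before they can be counted, which is where window size and overlap start to matter.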

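I only know how to express the condition that A+T content must be greater than 30%:

library(Biostrings)

# Fraction of A+T in each sequence of a DNAStringSet
at <- function(x) {
    alfreq <- alphabetFrequency(x, as.prob = TRUE)
    rowSums(alfreq[, c("A", "T"), drop = FALSE])
}

at(x) > 0.30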
written 2.3 years ago by marija (40)

Please edit the original post/question to add new information.

written 2.3 years ago by genomax (80k)

If you are parsing a single sequence for stretches longer than 500 bp with AT content > 30% (i.e. GC content < 70%), then what about window size and window overlap? Or are you looking at a set of sequences?

written 2.3 years ago by cpad0112 (12k)

I want the number of sequences that are longer than 500 bp and have AT content > 30%. E.g. in chr2:1,000,000-2,000,000 there are 456 sequences with AT > 30% and length > 500 bp (just an example).

written 2.3 years ago by marija (40)
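If the input is already a set of candidate sequences, the two conditions can be combined into one logical filter and counted. The sketch below is base R only and assumes the sequences are held in a plain character vector; `count_at_rich` and the toy data are hypothetical. With a Biostrings `DNAStringSet`, `width(x)` would play the role of `nchar()` for the length condition.

```r
# Hypothetical helper: count sequences longer than `min_len` whose
# A+T fraction exceeds `min_at`. `seqs` is a plain character vector.
count_at_rich <- function(seqs, min_len = 500, min_at = 0.30) {
    seqs <- toupper(seqs)
    len <- nchar(seqs)
    # Number of A or T characters = length minus length after removing A/T
    at_count <- len - nchar(gsub("[AT]", "", seqs))
    at_frac <- ifelse(len > 0, at_count / len, 0)
    # Both conditions must hold for a sequence to be counted
    sum(len > min_len & at_frac > min_at)
}

# Toy data (made up for illustration)
seqs <- c(
    strrep("AT", 300),     # 600 bp, all A+T
    strrep("GC", 300),     # 600 bp, no A+T
    strrep("AT", 100),     # 200 bp, too short
    strrep("ATGCGC", 100)  # 600 bp, one third A+T
)
n_pass <- count_at_rich(seqs)
```

Both comparisons are strict, so a sequence of exactly 500 bp or exactly 30% A+T is not counted, matching the "greater than" wording in the question.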