Number of "AT" repeats in sequence greater than 500bp
6.4 years ago
marija

Hi, please, I need your help. I have a sequence of length 1,000,000 bp. I want to write a function that returns the number of "AT" repeats (regions where the A+T content is greater than 30%) whose length is greater than 500bp. I just don't know how to express the condition that the length must be greater than 500bp. Any help? Thank you

Does letterFrequencyInSlidingView from the Biostrings Bioconductor package do what you need to get a summary of the data?
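
If Biostrings is not at hand, the same per-window A+T fraction can be sketched in base R. This is a minimal sketch, not the Biostrings implementation: it moves a window of `width` bases one base at a time and uses a cumulative sum, so a 1 Mb sequence is handled in a single pass. The function name `at_sliding` and its arguments are hypothetical.

```r
# Fraction of A or T in every window of `width` bases, step = 1 base (base R only).
at_sliding <- function(dna, width) {
  is_at <- strsplit(toupper(dna), "")[[1]] %in% c("A", "T")
  n <- length(is_at)
  if (n < width) return(numeric(0))
  csum <- c(0, cumsum(is_at))
  # Window i covers bases i .. i + width - 1
  (csum[(width + 1):(n + 1)] - csum[1:(n - width + 1)]) / width
}
```

For example, `sum(at_sliding(s, 501) > 0.30)` counts 501bp windows above 30% A+T, but with a 1 bp step the windows overlap heavily, so a single long AT-rich region is counted many times.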

So far I only know how to check that the A+T content is greater than 30%:

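library(Biostrings)

# Fraction of A+T bases in each sequence (x: a DNAString or DNAStringSet)
at <- function(x){
    if (is(x, "DNAString")) x <- DNAStringSet(x)
    alfreq <- alphabetFrequency(x, as.prob=TRUE)
    rowSums(alfreq[, c("A", "T"), drop=FALSE])
}

at(x) > 0.30   # TRUE for each sequence whose A+T content is above 30%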
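
Assuming the candidate regions are already available as a character vector (one string per region), the length condition is just another logical test combined with `&`, and `sum()` counts the TRUE values. A base-R sketch; `count_at_rich`, `min_len` and `min_at` are hypothetical names. With a DNAStringSet, `width(x) > 500` gives the same length test.

```r
# Count sequences longer than min_len bp AND with A+T fraction above min_at.
count_at_rich <- function(seqs, min_len = 500, min_at = 0.30) {
  # Remove every non-A/T character; what is left is the A+T count
  at_frac <- nchar(gsub("[^ATat]", "", seqs)) / nchar(seqs)
  sum(nchar(seqs) > min_len & at_frac > min_at)
}
```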

Please edit the original post/question to add new information.

If you are scanning a single sequence for stretches longer than 500bp with AT content > 30% (i.e. GC content < 70%), what window size and window overlap do you want to use? Or are you looking at a set of sequences?

I want the number of sequences that are longer than 500bp and have AT content > 30%. For example, in chr2:1,000,000-2,000,000 there are 456 sequences with AT > 30% and length > 500bp (just an example).
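
One way to turn a single 1 Mb sequence into countable stretches, sketched under assumptions the thread leaves open: cut the sequence into non-overlapping tiles (100bp here, a hypothetical choice), flag tiles with A+T > 30%, merge consecutive flagged tiles with `rle()`, and count merged stretches longer than 500bp. Because every tile in a stretch is above 30%, the stretch as a whole is too. `count_at_stretches` and its arguments are hypothetical.

```r
# Count maximal stretches (> min_len bp) of consecutive non-overlapping tiles
# whose A+T fraction exceeds min_at. Base R only.
count_at_stretches <- function(dna, tile = 100, min_len = 500, min_at = 0.30) {
  is_at <- strsplit(toupper(dna), "")[[1]] %in% c("A", "T")
  n <- length(is_at)
  if (n == 0) return(0L)
  starts <- seq(1, n, by = tile)
  ends <- pmin(starts + tile - 1, n)
  tile_len <- ends - starts + 1                 # last tile may be shorter
  tile_at <- mapply(function(s, e) mean(is_at[s:e]), starts, ends)
  # Merge consecutive tiles with the same flag into runs
  runs <- rle(tile_at > min_at)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1
  run_bp <- mapply(function(s, e) sum(tile_len[s:e]), run_start, run_end)
  # Keep only AT-rich runs longer than min_len
  sum(runs$values & run_bp > min_len)
}
```

Stretch boundaries snap to tile edges, so the count depends on `tile`; a smaller tile gives finer boundaries at the cost of noisier per-tile fractions.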
