Question: About calculating the GC content in a sliding window
Asked 3 months ago by jon.brate (Norway)

I want to plot the GC content along a genome contig. Since GC content cannot be expressed as a percentage or fraction for a single position, I need some kind of window along the contig to estimate it. I found this page, which uses bedtools makewindows and bedtools nuc to estimate the GC content in 1000 bp, non-overlapping windows.
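
For illustration, the fixed-window calculation can be sketched in plain Python. This is only a minimal sketch of the idea, not what bedtools does internally; the function names `gc_fraction` and `gc_in_fixed_windows` are made up for the example, and coordinates are assumed to be 0-based and half-open as in BED.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_in_fixed_windows(seq, window=1000):
    """Return (start, end, gc_fraction) for consecutive non-overlapping windows.

    Coordinates are 0-based and half-open. The last window is shorter
    when the sequence length is not a multiple of the window size.
    """
    return [
        (start, min(start + window, len(seq)), gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq), window)
    ]


# Toy sequence with a 4 bp window instead of 1000 bp.
for start, end, gc in gc_in_fixed_windows("GGCCAATT", window=4):
    print(start, end, gc)
```

Each window yields one value, so a 1000 bp window gives one point per kilobase in the plot.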

To get a GC content value for every nucleotide on the contig, I added the option -s 1 to bedtools makewindows so that the windows shift by one nucleotide each time, and then calculated the GC content of each window with bedtools nuc. My idea was that the GC content of the first window could represent the first nucleotide, the second window the second nucleotide, and so on. But doesn't this mean that the first nucleotide of each window gets the GC content of the entire window?

Any thoughts on this? Or suggestions for a better way to visualize the GC content along a contig?

Thanks, Jon

Tags: gccontent, bedtools

I don't quite understand this. If you need GC content for every base, why use a sliding window?

Reply written 3 months ago by GenoMax

Is there an alternative to a sliding window? Don't I need some collection of nucleotides to calculate frequencies? If you know of a better method for calculating GC content for every base, I would be very happy to hear it.

Reply written 3 months ago by jon.brate

> I was thinking that the gc content of the first window could represent the gc content of the first nucleotide, and so on.

The GC content is an average across the window size you choose. I assume the -s option is the step size for bedtools makewindows. If you select a 100 bp window, you get the GC% across the initial 100 bp window (bases 1-100). You then slide the window over by 1 bp and get the GC% for bases 2-101, and so on.
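
To make the sliding step concrete, here is a rough Python sketch of the same idea. It is an illustration only, not the bedtools implementation; `sliding_gc` and its parameters are hypothetical names, and the coordinates are 0-based and half-open, so (0, 100) corresponds to bases 1-100.

```python
def sliding_gc(seq, window=100, step=1):
    """Yield (start, end, gc_fraction) for every full-length window.

    start/end are 0-based and half-open. With window=100 and step=1 the
    first window is (0, 100), i.e. bases 1-100, and the next is bases 2-101.
    Windows that would run past the end of the sequence are not emitted.
    """
    # One boolean per base: True where the base is G or C.
    is_gc = [base in "GC" for base in seq.upper()]
    for start in range(0, len(seq) - window + 1, step):
        yield start, start + window, sum(is_gc[start:start + window]) / window


# Toy sequence with a 2 bp window shifted by 1 bp each time.
for start, end, gc in sliding_gc("GGAAT", window=2, step=1):
    print(start, end, gc)
```

Note that a contig of length L gives only L - window + 1 full windows, so the last window - 1 bases never start a window of their own.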

> I want to plot the GC content along a genome contig.

You can use cpgplot from EMBOSS for this. Download EMBOSS for more flexibility.

Reply written 3 months ago by GenoMax (modified 3 months ago)

Thanks, I'll check it out.

> GC content would be an average across the window size you are choosing. I assume the -s option is step-size for bedtools makewindows. If you were selecting a 100 bp window then you get the GC% across initial 100 bp window. You then slide the window over by 1 bp and get GC% for 2-101 bp and so on.

Yes, that is how I see it too. But then the GC content assigned to the first nucleotide on the contig is the average across the first 100 nucleotides. I think that nucleotide no. 50 (the middle of the first window) should instead get the GC content of the first window. With the current procedure, each nucleotide gets a GC content that mostly reflects the 99 nucleotides that follow it, and I felt this was not accurate enough. But perhaps I am misunderstanding something.
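
What I have in mind could be sketched like this in Python. It is just a sketch under my own assumptions: `centered_gc` is a made-up name, the window is odd so that it has a single middle base, and windows are simply truncated at the contig ends rather than padded.

```python
from itertools import accumulate


def centered_gc(seq, window=101):
    """Return one GC fraction per base, computed from the window centred on it.

    Windows are truncated at the contig ends, so positions near either end
    are averaged over fewer bases than positions in the middle.
    """
    if window % 2 == 0:
        raise ValueError("use an odd window so that it has a single middle base")
    half = window // 2
    # prefix[i] = number of G/C bases in seq[:i], so any window count is one subtraction.
    prefix = [0] + list(accumulate(int(base in "GC") for base in seq.upper()))
    result = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        result.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return result


# Toy sequence with a 3 bp window: each base is averaged with its two neighbours.
print(centered_gc("GGAAT", window=3))
```

With the bedtools output, the equivalent would presumably be to plot each window's GC content at the window midpoint instead of at its start coordinate.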

Reply written 3 months ago by jon.brate