About calculating the GC content in a sliding window
11 months ago
jon.brate ▴ 250

I want to plot the GC content along a genome contig. Since a percentage or fraction cannot be estimated for a single position, I need to use some sort of window along the contig. I found this page, which uses bedtools makewindows and bedtools nuc to estimate the GC content in 1000 bp, non-overlapping windows.

To get a GC content value for every nucleotide on the contig, I added the option -s 1 to bedtools makewindows so that each window is shifted by one nucleotide. I then calculated the GC content of each window using bedtools nuc. I was thinking that the gc content of the first window could represent the gc content of the first nucleotide, and so on. But doesn't this mean that the first nucleotide in each window gets the GC content of the entire window?

Any thoughts on this? Or suggestions on how to better visualize the gc content along a contig?

Thanks, Jon

gccontent bedtools • 1.1k views

I don't quite understand this. If you need GC content for every base why use a sliding window?


Is there an alternative to a sliding window? As far as I understand, I need some collection of nucleotides to calculate frequencies. If you know of a better method to calculate GC content for every base, I would be very happy to hear it.


I was thinking that the gc content of the first window could represent the gc content of the first nucleotide, and so on.

The GC content would be an average across the window size you choose. I assume the -s option is the step size for bedtools makewindows. If you select a 100 bp window, you get the GC% across the initial 100 bp window (positions 1-100). You then slide the window over by 1 bp and get the GC% for positions 2-101, and so on.
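To make the sliding-window idea concrete, here is a minimal Python sketch. It is not what bedtools does internally; the function name `sliding_gc` and its parameters are made up for illustration. It assumes that only full-length windows are reported and that the GC fraction is the G+C count divided by the window length, so any N bases count toward the denominator.

```python
def sliding_gc(seq, window=100, step=1):
    """Return (start, end, gc_fraction) for every full window.

    Coordinates are 1-based and inclusive, so the first 100 bp window
    is reported as (1, 100, ...) and the next one as (2, 101, ...).
    """
    seq = seq.upper()
    # prefix[i] holds the number of G/C bases among the first i bases,
    # so any window count is a simple difference of two prefix values.
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))

    results = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        gc_fraction = (prefix[end] - prefix[start]) / window
        results.append((start + 1, end, gc_fraction))
    return results


if __name__ == "__main__":
    # Toy sequence and a 4 bp window, just to show the output shape.
    for start, end, gc in sliding_gc("GGCCAATT", window=4, step=1):
        print(start, end, gc)
```

With a step of 1 this gives one value per window start, which is the behaviour described above: each value belongs to the window as a whole, not to a single base.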

I want to plot the GC content along a genome contig.

You can use cpgplot from EMBOSS for this. Download EMBOSS for more flexibility.


Thanks, I'll check it out.

The GC content would be an average across the window size you choose. I assume the -s option is the step size for bedtools makewindows. If you select a 100 bp window, you get the GC% across the initial 100 bp window (positions 1-100). You then slide the window over by 1 bp and get the GC% for positions 2-101, and so on.

Yes, this is how I see it too. But then the GC content calculated for the first nucleotide on the contig would be the average across the first 100 nucleotides. I think that nucleotide no. 50 (the middle of the first window) should rather be assigned the GC content of the first window. With this procedure, each nucleotide gets the GC content of itself and the 99 succeeding nucleotides, and I felt that this was not accurate enough. But perhaps I am misunderstanding something.
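One way to express the "assign the value to the middle of the window" idea is sketched below in Python. This is only an illustration under some assumptions: the function name `centered_gc` is hypothetical, positions too close to either end for a full window are left as `None`, and for an even window size (which has no exact middle) the value goes to the base just left of center, so a 100 bp window starting at position 1 is assigned to position 50.

```python
def centered_gc(seq, window=100):
    """Return a list with one GC fraction per base (or None near the ends).

    The value stored at index i (0-based) is the GC fraction of the
    full window whose middle base is i.
    """
    seq = seq.upper()
    is_gc = [base in "GC" for base in seq]
    values = [None] * len(seq)
    if window > len(seq):
        return values  # no full window fits on this sequence

    # Offset from window start to its middle base: 49 for a 100 bp
    # window, i.e. position 50 in 1-based coordinates.
    offset = (window - 1) // 2

    count = sum(is_gc[:window])  # G/C count of the first window
    for start in range(len(seq) - window + 1):
        values[start + offset] = count / window
        # Slide by one base: add the incoming base, drop the outgoing one.
        if start + window < len(seq):
            count += is_gc[start + window] - is_gc[start]
    return values


if __name__ == "__main__":
    print(centered_gc("GGCCAATT", window=4))
```

The same effect can be had from existing window output by shifting the plotted x coordinate to the window midpoint instead of the window start. An odd window size (for example 101) gives each base an equal number of neighbours on both sides.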