Calculating GC content in a sliding window
jon.brate, 3.7 years ago

I want to plot the GC content along a genome contig. Since a percentage or fraction can't be estimated from a single position, I need some kind of window along the contig. I found this page, which uses bedtools makewindows and bedtools nuc to estimate the GC content in 1000 bp, non-overlapping windows.
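For comparison, the non-overlapping window calculation can be sketched in plain Python. This is not the bedtools implementation: `gc_windows` is a hypothetical helper, coordinates are 0-based and half-open as in BED, the GC fraction is (G + C) divided by the window length (so N bases lower it, an assumption), and the final window is allowed to be shorter than `window`.

```python
from typing import Iterator, Optional, Tuple


def gc_windows(
    seq: str, window: int = 1000, step: Optional[int] = None
) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, gc_fraction) for windows along `seq`.

    Coordinates are 0-based and half-open, as in BED. With step=None the
    windows do not overlap (step == window). The final window may be
    shorter than `window`; iteration stops once a window reaches the end.
    """
    if step is None:
        step = window
    seq = seq.upper()
    for start in range(0, len(seq), step):
        end = min(start + window, len(seq))
        chunk = seq[start:end]
        # GC fraction = (G + C) / window length; N bases lower it (assumption).
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        yield start, end, gc
        if end == len(seq):
            break


# Toy contig split into three non-overlapping 6 bp windows.
for start, end, gc in gc_windows("ATGCGCGCATATATGCGC", window=6):
    print(start, end, round(gc, 2))
```

Passing `step=1` gives overlapping windows, but each one is re-counted from scratch, which gets slow on long contigs.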

To get a GC-content value for every nucleotide on the contig, I added the option -s 1 to bedtools makewindows so that each window shifts by one nucleotide, and then calculated the GC content of each window with bedtools nuc. I was thinking that the GC content of the first window could represent the GC content of the first nucleotide, and so on. But doesn't this mean that the first nucleotide in each window gets the GC content of the entire window?
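The step-1 procedure described above (each window's value assigned to its first base) can be reproduced in Python with a running G/C count, so every window costs O(1) instead of re-counting all of its bases. `sliding_gc` is a hypothetical helper, not part of bedtools; positions are 0-based and only full-length windows are reported.

```python
from itertools import accumulate
from typing import List


def sliding_gc(seq: str, window: int = 100) -> List[float]:
    """GC fraction of every full-length window, stepping by 1 bp.

    Element i is the GC fraction of seq[i:i + window] (0-based), i.e. the
    value assigned to the window's FIRST base in the procedure above.
    """
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    # prefix[k] = number of G/C bases in seq[:k], so any window sum is O(1).
    prefix = [0, *accumulate(is_gc)]
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(seq) - window + 1)
    ]


values = sliding_gc("GGGCCCAAATTT" * 20, window=100)
print(len(values), values[0])
```

A contig of length L yields L - window + 1 values, so the last window - 1 bases never receive a value of their own under this convention.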

Any thoughts on this, or suggestions for better ways to visualize GC content along a contig?

Thanks, Jon

Tags: gccontent, bedtools

I don't quite understand this. If you need GC content for every base, why use a sliding window?

Is there an alternative to a sliding window? Don't I need some collection of nucleotides to calculate frequencies? If you know of a better method to calculate GC content for every base, I would be very happy to hear it.

I was thinking that the gc content of the first window could represent the gc content of the first nucleotide, and so on.

The GC content is an average across the window size you choose. I assume the -s option is the step size for bedtools makewindows. If you select a 100 bp window, you get the GC% across the initial 100 bp window (bases 1-100). You then slide the window over by 1 bp and get the GC% for bases 2-101, and so on.

I want to plot the GC content along a genome contig.

You can use cpgplot from EMBOSS for this. Download EMBOSS for more flexibility.

Thanks, I'll check it out.

Yes, that is how I see it too. But then the GC content assigned to the first nucleotide on the contig is the average across the first 100 nucleotides. I think nucleotide 50 (the middle of the first window) should get the GC content of the first window instead. With this procedure, each nucleotide gets a GC content determined mostly by the 99 nucleotides that follow it, and I felt that this was not accurate enough. But perhaps I am misunderstanding something.
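One way to get the assignment described above is to report each window's GC fraction at its centre rather than at its first base. A minimal sketch in Python, assuming an odd window (e.g. 101 bp) so that every base sits exactly in the middle of its window, and assuming windows are simply truncated at the contig ends (one of several possible edge conventions). `centered_gc` is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def centered_gc(seq: str, window: int = 101) -> List[float]:
    """GC fraction for every base, from a window centred on that base.

    `window` must be odd so each base is exactly in the middle. Near the
    contig ends the window is truncated, so fewer bases are averaged
    there (an assumed edge convention).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    # prefix[k] = number of G/C bases in seq[:k]
    prefix = [0, *accumulate(is_gc)]
    n = len(seq)
    out = []
    for i in range(n):
        lo = max(0, i - half)       # clip window at the contig start
        hi = min(n, i + half + 1)   # clip window at the contig end
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


print(centered_gc("ATGCGCGCATATATGCGC", window=5))
```

If you keep the step-1 bedtools output instead, the same idea amounts to plotting the window that starts at position i at roughly i + window // 2. Values near the contig ends come from shorter windows and are noisier.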

Hi GenoMax, hope everything is OK with you bro. Do you know if I can use the bedtools makewindows approach on multi-FASTA files? I looked at the documentation but didn't find any information. Thanks
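As a bedtools-independent alternative, a multi-FASTA file can be handled in Python by treating each record separately and restarting the window coordinates for every sequence. This is a sketch under stated assumptions: `read_fasta` and `gc_bed` are hypothetical helpers, the record name is the first word of the header line, windows are non-overlapping with 0-based half-open (BED-style) coordinates, and the last window of each record may be shorter than `window`.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse multi-FASTA lines into (name, sequence) pairs."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_bed(records, window: int = 1000) -> Iterator[Tuple[str, int, int, float]]:
    """Yield BED-like (name, start, end, gc_fraction) rows for each record."""
    for name, seq in records:
        seq = seq.upper()
        for start in range(0, len(seq), window):
            chunk = seq[start:start + window]
            gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
            yield name, start, start + len(chunk), gc


example = [">contig1 toy", "GGCC", "AATT", ">contig2", "GCGA"]
for row in gc_bed(read_fasta(example), window=4):
    print(*row, sep="\t")
```

For a real file, pass an open file handle (e.g. `open("contigs.fa")`, a hypothetical path) to `read_fasta` instead of the toy list.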