Regions of extreme GC content
6.1 years ago
wanabi ▴ 60

Hello, for CNV analysis QC I am looking for a reliable BED file of regions with extreme GC content (>90% or <10%).

Any idea where I can find such a file? I tried UCSC, but it provides GC content in bins of 5 bp, which is not very convenient.

 

Thanks

6.1 years ago

It's not too difficult to generate a BED file of GC content along the genome. You just need the reference FASTA file and a genome file giving the length of each chromosome. Then, with bedtools, create fixed-size windows along the genome (add -s to makewindows if you want overlapping, sliding windows), calculate %GC for each window, and use e.g. awk to keep the rows where %GC is above or below a threshold. Something like:

bedtools makewindows -g hg19.genome -w 1000 \
| nucBed -fi hg19.genome.fasta -bed - \
| awk '!/^#/ && ($5 > 0.9 || $5 < 0.1)'

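See 'nucBed -h' for the output format. The statistics are appended after the input columns (%AT first, then %GC), so for 3-column windows %GC should be in column 5. The values are fractions between 0 and 1, hence the 0.9 and 0.1 cutoffs.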
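
If you want merged intervals rather than one row per window, the filtering step can also collapse neighbouring extreme windows. The following is only a minimal sketch, not a tested pipeline: extreme_gc_to_bed is a hypothetical helper name, the toy table is invented for illustration, and it assumes tab-separated nucBed-style input sorted by chromosome and start with %GC in column 5. Note that it merges touching windows whether they are GC-rich or GC-poor.

```shell
# Sketch only: collapse extreme-GC windows from a nucBed-style table into BED intervals.
# Assumes tab-separated input, sorted by chromosome and start, with %GC in column 5.
extreme_gc_to_bed() {   # hypothetical helper: <table> <low cutoff> <high cutoff>
    awk -v lo="$2" -v hi="$3" 'BEGIN { FS = OFS = "\t" }
        /^#/ { next }                                    # skip the header line
        ($5 + 0) < (lo + 0) || ($5 + 0) > (hi + 0) {
            # same chromosome (string comparison) and touching the open interval: extend it
            if (have && ($1 "") == (chrom "") && ($2 + 0) <= (end_pos + 0)) {
                if (($3 + 0) > (end_pos + 0)) end_pos = $3
                next
            }
            if (have) print chrom, start_pos, end_pos    # flush the previous interval
            chrom = $1; start_pos = $2; end_pos = $3; have = 1
        }
        END { if (have) print chrom, start_pos, end_pos }' "$1"
}

# Toy input standing in for real nucBed output (values invented for illustration).
toy="$(mktemp)"
printf '#1_usercol\t2_usercol\t3_usercol\t4_pct_at\t5_pct_gc\n' > "$toy"
printf 'chr1\t0\t1000\t0.500000\t0.500000\n' >> "$toy"
printf 'chr1\t1000\t2000\t0.050000\t0.950000\n' >> "$toy"
printf 'chr1\t2000\t3000\t0.080000\t0.920000\n' >> "$toy"
printf 'chr1\t3000\t4000\t0.600000\t0.400000\n' >> "$toy"
printf 'chr2\t0\t1000\t0.950000\t0.050000\n' >> "$toy"

extreme_gc_to_bed "$toy" 0.1 0.9
rm -f "$toy"
```

On real data you would feed it the nucBed output instead of the toy file, or skip the helper and pipe the filtered windows through bedtools merge. It may also be worth checking the N count that nucBed reports, since a window made up entirely of N has a %GC of 0 and would otherwise look GC-poor.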


What is the BED file in this command?


Did you notice the | in the command? The output of bedtools makewindows, which is in BED format, is piped directly into nucBed.


Actually I have an error:

"Less than the req'd two fields were encountered in the genome file (hg19.genome) at line 1."

and my hg19.genome is:

chr1    249250621
chr2    243199373
chr3    198022430
chr4    191154276
chr5    180915260
chr6    171115067
chr7    159138663
chrX    155270560
chr8    146364022
chr9    141213431
.
.
.

I was able to fix my error. Thanks for your help.


What turned out to be the problem?


I changed "chr1" to "1" and so on for the other chromosomes, and then the error was fixed.
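
For anyone hitting the same message: bedtools expects the genome file to be tab-delimited, with the chromosome name and its length as the two fields, so one possible cause is a file separated by spaces, as the listing above appears to be (that may only be how it was pasted). Below is a minimal sketch for rewriting such a file with tabs. normalize_genome_file is a hypothetical helper name, and the demo input just reuses two lines from the listing.

```shell
# Sketch only: rewrite a genome file so its two fields are separated by a single tab.
normalize_genome_file() {   # hypothetical helper: <genome file>
    # awk's default splitting accepts runs of spaces or tabs; OFS makes the output tab-separated
    awk -v OFS='\t' 'NF >= 2 { print $1, $2 }' "$1"
}

# Demo input: two space-separated lines like the ones in the listing.
demo="$(mktemp)"
printf 'chr1    249250621\nchr2    243199373\n' > "$demo"
normalize_genome_file "$demo"
rm -f "$demo"
```

Separately, the chromosome names in the genome file have to match the sequence names in the FASTA file ("chr1" versus "1"). Running grep '^>' on the FASTA file lists the names it actually uses.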


Do you know how I can compute mappability? I want to correct for mappability bias, so I need mappability values just as I have for GC content.
