Question: Hg19 Gc Model
0
gravatar for Astoulpho
4.8 years ago by
Astoulpho10
Brazil
Astoulpho10 wrote:

Does anyone know where to find (or how to create) a gc-content file that serves as input to PennCNV? The software has a gc model, but it is for hg18 and its format is different fom gc content file found in UCSC genomes.

gc cnv hg19 • 2.8k views
ADD COMMENTlink modified 4.5 years ago by Bingley10 • written 4.8 years ago by Astoulpho10
3
gravatar for Pierre Lindenbaum
4.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:

The following bash script should do the job:

IFS="
"

echo -e "Name\tChr\tPosition\tGC"
curl -s "http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/snp138.txt.gz" | gunzip -c |\
cut -f 2,3,5|\
while read L
do
    C=`echo $L | cut -f 1`
    P=`echo $L | cut -f 2`
    R=`echo $L | cut -f 3`
    S=`echo $L |awk '{beg=int($2)-500; if(beg<1) beg=1; printf("%s:%d,%d",$1,beg,int($2)+500);}'`

    echo -e -n "${R}\t${C}\t${P}\t" | sed 's/chr//'
    xmllint --xpath "/DASDNA/SEQUENCE/DNA/text()" "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=$S" |\
    tr -d "\n" | awk '{L1=1.0*length($0);c=$0;gsub(/[^GCgc]/,"",c);L2=1.0*length(c); printf("%f\n",100.0*(L2/L1));}'
done

explain: IFS is the internal field sepator (using the newline). Download and g-unzip the dbSNP138: cut the chrom,position,name. extends the position +/ 500pb , use the UCSC das server to fetch the sequence. remove everything that is not GC to get a ratio length(all)/length(GC_only.)

Note: I'm away from the lab, I've used a DAS server to get the GC% but you'd better use a local resource like samtools faidx

$ bash  shell.sh
Name    Chr    Position    GC
rs144773400    1    10144    36.263736
rs201752861    1    10176    38.961039
rs201694901    1    10179    39.260739
rs143255646    1    10228    43.456543
rs200462216    1    10228    43.456543
rs200279319    1    10230    43.656344
rs145599635    1    10233    43.956044
rs148908337    1    10247    44.955045
rs199706086    1    10249    45.154845
rs140194106    1    10254    45.654346
ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Pierre Lindenbaum113k
1

This is very nice code! :-)

ADD REPLYlink written 4.8 years ago by PoGibas4.7k

Hi Pierre,

I got the following error when I was trying to run the script you post. Any idea? thanks.

http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:38274,39274:1: parser error : Document is empty

^
http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:38274,39274:1: parser error : Start tag expected, '<' not found

^

B.

ADD REPLYlink written 4.5 years ago by Bingley10

Hi Pierre. Is it possible to post the file ? I was not able to work with your code. Thanks

ADD REPLYlink written 3.2 years ago by Myosotis197910
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1582 users visited in the last hour