Get genomic coordinates plus 5 kb from gene symbols
9.2 years ago
jfertaj ▴ 110

Hi list,

I am trying to use RMySQL to get genomic coordinates for a set of gene symbols and then add 5 kb around the TSS. I know I can do this with the UCSC Table Browser or biomaRt, but I would like to learn how to run the query with RMySQL or mysql.

I have followed this post, but I cannot figure out how to do it with gene symbols (HUGO names), because the name column in the knownGene table does not match the gene symbols in the kgXref table.

Any help would be appreciated.

Thanks

genome mysql RMySQL • 2.1k views
9.2 years ago
Ram 43k

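Use a join on knownGene.name = kgXref.kgID. The kgXref table carries the HUGO symbol in its geneSymbol column, so you can filter on that and pull the coordinates from knownGene.

You can always use the Table Browser to check the schema of UCSC's MySQL tables and work out which identifiers match. These tables are highly redundant, which helps with querying, mapping and display.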
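The join described above can be sketched as a small shell helper that prints the SQL for a list of symbols. This is only an illustration: `build_query` is a hypothetical name, the example symbols are arbitrary, and the selected columns (chrom, strand, txStart, txEnd from knownGene; geneSymbol from kgXref) are one reasonable choice rather than the only one.

```shell
# Hypothetical helper: print a knownGene/kgXref join for the given HUGO symbols.
build_query() {
    # Quote each symbol and join with commas: TP53 BRCA1 -> 'TP53','BRCA1'
    symbols=$(printf "'%s'," "$@")
    symbols=${symbols%,}
    cat <<EOF
SELECT x.geneSymbol, k.name, k.chrom, k.strand, k.txStart, k.txEnd
FROM knownGene AS k
JOIN kgXref AS x ON k.name = x.kgID
WHERE x.geneSymbol IN (${symbols});
EOF
}

# Example symbols, chosen only for illustration.
build_query TP53 BRCA1
```

To run it, the output could be piped to the public UCSC server, for example `build_query TP53 BRCA1 | mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N hg19` (hg19 is an assumed assembly; substitute your own). The same SQL string can be passed to dbGetQuery in RMySQL. A symbol may return several rows, one per transcript. The TSS is txStart on the + strand and txEnd on the - strand.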

9.2 years ago
Chirag Nepal ★ 2.4k

If you are interested, an alternative is to use a one-liner on a BED file.

Extract a 5 kb region on each side of the TSS (the start column on the + strand, the end column on the - strand):

awk 'BEGIN {OFS="\t"} { if ($6 == "+") { print $1,$2-5000,$2+5000,$4,$5,$6 } else if ($6 == "-") { print $1,$3-5000,$3+5000,$4,$5,$6 } }' input.bed > Temp

If you want the sequences of these regions, use bedtools:

fastaFromBed -s -name -fi assembly.fa -bed Temp -fo Temp.fa
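One thing the TSS-window one-liner does not handle is a gene that sits within 5 kb of the chromosome start, where the computed start becomes negative. A minimal sketch of a variant that clamps the start at 0 follows; `tss_window` is a hypothetical function name, the flank size is passed as an optional argument, and the two input records are made up for the demonstration.

```shell
# Hypothetical helper: read 6-column BED on stdin, write TSS +/- flank windows.
tss_window() {
    awk -v flank="${1:-5000}" 'BEGIN { OFS = "\t" }
    {
        # Skip records without an explicit strand, as the one-liner does.
        if ($6 != "+" && $6 != "-") next
        # TSS is the start column on "+" and the end column on "-".
        tss = ($6 == "-") ? $3 : $2
        start = tss - flank
        if (start < 0) start = 0   # clamp at the chromosome start
        print $1, start, tss + flank, $4, $5, $6
    }'
}

# Made-up records: geneA lies within 5 kb of the chromosome start.
printf 'chr1\t1000\t9000\tgeneA\t0\t+\nchr1\t20000\t30000\tgeneB\t0\t-\n' | tss_window 5000
```

For the sample input this prints a window of 0 to 6000 for geneA and 25000 to 35000 for geneB. The end coordinate is not clamped here; doing so would need chromosome sizes, which are outside this sketch.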