Hi there, suppose we want to get reference and alternate alleles for several regions of the genome. Is there a way to do this from UCSC Table Browser or Galaxy?
We're looking to get a format like this
chr1 100000 G G
chr1 100001 G G
...
chr1 100007 C C
chr1 100008 A A
chr1 100009 A T rs123
In other words, every chromosome, base pair, reference allele and, if applicable, alternate allele and SNP ID, in columns. Strand orientation (if not already positive) would be great. Any help is appreciated.
Right. I know about the defined regions box, but say I wanted this for every base pair: 10001, 10002, 10003 and so on, realizing that only one in a hundred or so will be SNPs. Thanks both. Let me know if you have a solution for this.
So you want a list of all base pairs in a region, whether or not there is a dbSNP entry for that position, plus the dbSNP info for the positions that do have dbSNP data?
Off the top of my head: you could get the dbSNP data as I suggested above, then get the same region from a sequence track. Export them both to Galaxy, rearrange the columns the way you want them, and then "Join" the two datasets, keeping either "all records of both datasets" or all records from the sequence one. I haven't tested it yet, but it might be what you are looking for. If you need more than that, someone might have a SQL query or other coding solution.
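If you end up scripting it instead, the join described above can be sketched in a few lines of Python. This is only an untested-on-real-data illustration: `per_base_table` is a made-up helper name, the sequence and the SNP are toy values (with `rs123` borrowed from the example in the question), and it assumes you have already exported the reference sequence for the region as a plain string and the SNPs as a position-to-(alt, rsID) lookup in the same coordinate system. Watch the coordinates, since UCSC tables store 0-based starts.

```python
from typing import Dict, Iterator, Tuple

# (chrom, position, ref allele, alt allele, rsID or "")
Row = Tuple[str, int, str, str, str]


def per_base_table(
    chrom: str,
    start: int,
    ref_seq: str,
    snps: Dict[int, Tuple[str, str]],
) -> Iterator[Row]:
    """Yield one row per base in the region (hypothetical helper).

    Every position of the sequence is kept, like joining with
    "all records" from the sequence dataset. Positions without a
    SNP repeat the reference base in the alt column.
    """
    for offset, ref in enumerate(ref_seq.upper()):
        pos = start + offset
        if pos in snps:
            alt, rsid = snps[pos]
            yield (chrom, pos, ref, alt, rsid)
        else:
            yield (chrom, pos, ref, ref, "")


# Toy input, not real genome data: a 5-base region starting at 100
# with a single SNP at position 104.
toy_snps = {104: ("T", "rs123")}
rows = list(per_base_table("chr1", 100, "ggcaa", toy_snps))

for row in rows:
    # Tab-separated output, dropping the empty rsID column where unused.
    print("\t".join(str(field) for field in row).rstrip("\t"))
```

For a large region you would read the sequence and SNP files in chunks rather than holding everything in memory, but the join logic stays the same.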