SNPs in UTR
6.6 years ago
DanielC ▴ 170

Dear All,

I am looking for a way to find the SNPs (rsIDs such as "rs559632360") in the 3' and 5' UTRs of the BRCA1 and BRCA2 mRNAs. I know how to get the 3' and 5' UTRs using XSLT. Now I am looking to get the SNPs and their positions in the extracted 3' and 5' UTRs. I would really appreciate it if you could share a solution.

Thank you so much! DK

6.6 years ago

Using the UCSC public MySQL server:

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19  -e 'select chrom,chromStart,chromEnd,name,func from snp150 where chrom="chr17" and chromStart>41196310 and chromEnd <= 41277500 and (find_in_set("untranslated-5",func)!=0 or find_in_set("untranslated-3",func)!=0) ' 
+-------+------------+----------+-------------+----------------------+
| chrom | chromStart | chromEnd | name        | func                 |
+-------+------------+----------+-------------+----------------------+
| chr17 |   41196362 | 41196363 | rs8176320   | ncRNA,untranslated-3 |
| chr17 |   41196364 | 41196365 | rs966512664 | ncRNA,untranslated-3 |
| chr17 |   41196367 | 41196368 | rs184237074 | ncRNA,untranslated-3 |
| chr17 |   41196368 | 41196369 | rs765804641 | ncRNA,untranslated-3 |
| chr17 |   41196371 | 41196372 | rs189382442 | ncRNA,untranslated-3 |
| chr17 |   41196396 | 41196397 | rs868485085 | ncRNA,untranslated-3 |
| chr17 |   41196402 | 41196403 | rs182218567 | ncRNA,untranslated-3 |
| chr17 |   41196407 | 41196408 | rs12516     | ncRNA,untranslated-3 |
| chr17 |   41196407 | 41196407 | rs755411080 | ncRNA,untranslated-3 |
| chr17 |   41196408 | 41196409 | rs548275991 | ncRNA,untranslated-3 |
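The find_in_set() calls in the query above match a term anywhere in the comma-separated func field, so rows such as "ncRNA,untranslated-3" are kept along with plain "untranslated-3". If a wider set of rows has already been saved as tab-separated text, the same test can be approximated locally. A minimal sketch, assuming five columns with func in column 5; the function name filter_utr_snps and the file name snps.tsv are made up for illustration:

```shell
# Keep rows whose comma-separated func field (column 5) contains an exact
# "untranslated-3" or "untranslated-5" term, whatever else is listed with it.
# Reads the named files, or stdin when no file is given.
filter_utr_snps() {
  awk -F'\t' '
    NR == 1 && $1 == "chrom" { print; next }   # pass a header line through
    {
      n = split($5, terms, ",")
      for (i = 1; i <= n; i++)
        if (terms[i] == "untranslated-3" || terms[i] == "untranslated-5") {
          print
          next
        }
    }' "$@"
}

# Hypothetical usage:
#   filter_utr_snps snps.tsv > utr_snps.tsv
```

Splitting on commas and comparing whole terms avoids accidental substring matches that a plain grep on the line could produce.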

Thanks very much, Pierre! I understand the parameters used in the command: snp150 is the dbSNP build 150 table, the chromosome is 17, and chromStart and chromEnd bound the region. In my case, the chromosome start and end (from the XML output for BRCA1) are:

chromosome="17" start="26935980" end="81742541"

When I ran the command, it returned 144107 rsIDs in total, like this:

chrom   chromStart      chromEnd        name    func
chr17   26935989        26935990        rs1002304941    intron,untranslated-3
chr17   26936027        26936028        rs1027461134    intron,untranslated-3
chr17   26936037        26936038        rs143374718     intron,untranslated-3
chr17   26936044        26936048        rs952717363     intron,untranslated-3
chr17   26936049        26936050        rs372539326     intron,untranslated-3
chr17   26936050        26936051        rs749382342     intron,untranslated-3
chr17   26936050        26936064        rs1004133390    intron,untranslated-3
chr17   26936095        26936097        rs965637526     intron,untranslated-3
chr17   26936096        26936097        rs550417993     intron,untranslated-3
chr17   26936100        26936101        rs147329625     intron,untranslated-3
chr17   26936106        26936107        rs186832245     intron,untranslated-3
chr17   26936118        26936119        rs977388897     intron,untranslated-3
chr17   26936132        26936133        rs543307824     intron,untranslated-3
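To see which func combinations actually occur in a listing like the one above, the fifth column can be tallied. A rough sketch, assuming tab-separated input with one header line; tally_func and snps.tsv are hypothetical names:

```shell
# Count how often each distinct func value (column 5) occurs,
# skipping the header line, most frequent first.
tally_func() {
  awk -F'\t' '
    NR > 1 { count[$5]++ }
    END    { for (f in count) printf "%d\t%s\n", count[f], f }
  ' "$@" | sort -k1,1nr -k2,2
}

# Hypothetical usage:
#   tally_func snps.tsv
```

Each output line is a count followed by the func value, which makes combinations such as "intron,untranslated-3" easy to spot before deciding what to keep.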

Please help me understand this better by answering a few questions:

a) Among these rsIDs, if I am right, I should extract only "ncRNA,untranslated-3" and "ncRNA,untranslated-5" for the 3' and 5' UTRs? I ask because there are many other entries, such as "intron,near-gene-5,untranslated-5", "untranslated-3", etc.

Let's say I have extracted, using XSLT, the 3' and 5' UTRs for the RefSeq ID "NM_007299.3" of BRCA1 (shown below), and this RefSeq ID has rsIDs associated with it, one of which is "rs863224421".

b) When I searched for "rs863224421" in the list of 144107 rsIDs obtained from the UCSC MySQL command, I did not find it. If I am right, this rsID should be present in the list? Please let me know if I am missing some information.

c) From the XSLT result for the BRCA1 RefSeq IDs below, could you please tell me how to find which rsIDs belong to these 5' and 3' UTRs, and at what positions? For instance, information like this:

refseq-gene-id              mutated-allele        position
NM_007299.3|-195|5' UTR     c to t (let's say)    200 (let's say)



>NM_007299.3|-195|5' UTR
cttagcggtagccccttggtttccgtggcaacggaaaagcgcgggaattacagataaatt
aaaactgcgactgcgcggcgtgagctcgctgagacttcctggacgggggacaggctgtgg
ggtttctcagataactgggcccctgcgctcaggaggccttcaccctctgctctggttcat
tggaacagaaagaa
>NM_007299.3|2294-|3' UTR
ggcacctgtggtgacccgagagtgggtgttggacagtgtagcactctaccagtgccagga
gctggacacctacctgataccccagatcccccacagccactactgactgcagccagccac
aggtacagagccacaggaccccaagaatgagcttacaaagtggcctttccaggccctggg
agctcctctcactcttcagtccttctactgtcctggctactaaatattttatgtacatca
gcctgaaaaggacttctggctatgcaagggtcccttaaagattttctgcttgaagtctcc

Thanks much! :-) DK
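For question (c), one possible way to turn a genomic SNP coordinate into a position within an extracted UTR is to walk the UTR's genomic blocks in transcript order and count bases. A minimal sketch under several assumptions: coordinates are 0-based and half-open as in the UCSC tables, the SNP is a single base identified by its chromStart, and the UTR blocks have already been derived from the transcript's exon and CDS coordinates. The function name utr_offset and the numbers in the usage lines are made up for illustration and are not BRCA1 coordinates. BRCA1 lies on the minus strand, so for it the blocks are walked from the highest coordinate down.

```shell
# utr_offset STRAND STARTS ENDS POS
#   STRAND : "+" or "-"
#   STARTS : comma-separated 0-based block starts, ascending genomic order
#   ENDS   : comma-separated block ends (exclusive), same order
#   POS    : 0-based genomic position of the SNP (its chromStart)
# Prints the 1-based position within the UTR, counted from the UTR's 5' end
# in transcript orientation. Returns non-zero if POS is outside every block.
utr_offset() {
  awk -v strand="$1" -v starts="$2" -v ends="$3" -v pos="$4" 'BEGIN {
    n = split(starts, s, ",")
    split(ends, e, ",")
    offset = 0
    for (k = 1; k <= n; k++) {
      # On the minus strand the transcript runs from the last block to the first.
      i = (strand == "-") ? (n - k + 1) : k
      if (pos >= s[i] && pos < e[i]) {
        offset += (strand == "-") ? (e[i] - pos) : (pos - s[i] + 1)
        print offset
        exit 0
      }
      offset += e[i] - s[i]   # whole block lies upstream of the SNP
    }
    exit 1
  }'
}

# Hypothetical usage with two made-up blocks, [100,110) and [200,205):
#   utr_offset + 100,200 110,205 202   # prints 13
#   utr_offset - 100,200 110,205 202   # prints 3
```

On the minus strand the extracted UTR sequence is the reverse complement of the genome, so alleles reported on the forward strand would also need complementing before comparing them with the UTR sequence.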
