Finding coding and UTR regions for a gene list
5.7 years ago
seta ★ 1.5k

Hi all,

I have already found some simple sequence repeats (SSRs) in a set of human gene sequences using the MISA and SSRlocator tools. Now I want to determine where each SSR lies: in the coding region or in a non-coding (UTR) region of the gene. I can do this with the UCSC Genome Browser for one gene at a time, but that is too time-consuming for many genes. Could you please let me know how I can do this for many genes at once? Thanks

sequencing alignment coding UTR • 1.9k views

What kind of simple sequence repeats? Do you want to process the sequences yourself, or do you want to know whether an existing database already annotates the repeats (poly-X, RepeatMasker)?


Actually, I have some simple sequence repeats (SSRs) and would like to find their location within the gene sequences of interest: are the repeats located in the coding or the non-coding (UTR) regions?


How do you check manually whether the UTR contains an SSR?


I have already found some SSRs using the SSRlocator and MISA tools. Now I want to know where these SSRs are located: in the coding or the UTR parts of the gene sequences of interest?

5.7 years ago

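UCSC has already computed the simple repeats. You can query the public MySQL server for repeats that lie between txStart and cdsStart or between cdsEnd and txEnd of a knownGene transcript: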

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -e 'select K.chrom,R.repClass,R.genoStart,R.genoEnd,K.name,K.txStart,K.cdsStart,K.cdsEnd,K.txEnd from rmsk as R, knownGene as K where K.chrom=R.genoName and ((R.genoStart>=K.txStart AND R.genoEnd<=K.cdsStart) OR (R.genoStart>=K.cdsEnd AND R.genoEnd<=K.txEnd)) limit 10'
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+
| chrom | repClass      | genoStart | genoEnd | name       | txStart | cdsStart | cdsEnd | txEnd |
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+
| chr1  | Simple_repeat |     29744 |   29792 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | LINE          |     29901 |   30198 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | DNA           |     30342 |   30532 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | LTR           |     30693 |   30848 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | Simple_repeat |     30854 |   30952 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | DNA           |     30342 |   30532 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | LTR           |     30693 |   30848 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | Simple_repeat |     30854 |   30952 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | LTR           |     34564 |   34921 | uc001aak.4 |   34553 |    34553 |  34553 | 36081 |
| chr1  | SINE          |     35216 |   35366 | uc001aak.4 |   34553 |    34553 |  34553 | 36081 |
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+

 

or, using the simpleRepeat table instead of rmsk:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -e 'select K.chrom,R.name,R.chromStart,R.chromEnd,R.sequence,K.name,K.txStart,K.cdsStart,K.cdsEnd,K.txEnd from simpleRepeat as R, knownGene as K where K.chrom=R.chrom and ((R.chromStart>=K.txStart AND R.chromEnd<=K.cdsStart) OR (R.chromStart>=K.cdsEnd AND R.chromEnd<=K.txEnd)) limit 10'
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+
| chrom | name | chromStart | chromEnd | sequence                                                    | name       | txStart | cdsStart | cdsEnd | txEnd  |
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+
| chr1  | trf  |      30862 |    30959 | TC                                                          | uc057aty.1 |   29553 |    29553 |  29553 |  31097 |
| chr1  | trf  |      30862 |    30959 | TC                                                          | uc057atz.1 |   30266 |    30266 |  30266 |  31109 |
| chr1  | trf  |      90047 |    90430 | AACCTGCTGCTTCCTGGAGGAAGACAGTCCCTCAGTCCCTCTGTCTCTGCCAACCAGTT | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      92209 |    92243 | TCTGCATTGGTTTGG                                             | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      98999 |    99042 | TTTA                                                        | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99116 | TTTTTTTTCTTTCTTTTTTTTTTTTTTTT                               | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99116 | T                                                           | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99115 | TTTTTTTTCTTTTCTTTCTTTTCTTCTT                                | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99047 |    99115 | TTTTTTTTTTC                                                 | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |     102109 |   102152 | AATAAATAAGAAAACAGAAACT                                      | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+
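The WHERE clause in both queries keeps repeats that lie entirely between txStart and cdsStart or between cdsEnd and txEnd. Note that every transcript in the sample output has cdsStart equal to cdsEnd, which is how UCSC gene tables mark a transcript with no annotated CDS, so those hits are in non-coding transcripts rather than in the UTRs of coding genes. A minimal Python sketch of the same interval logic, extended to label each hit, might look like the following. The function name `classify_repeat`, the label strings, and the `strand` argument are illustrative assumptions (the queries above do not select `strand`), and the sketch ignores exon structure, so a repeat reported inside the CDS span or a UTR span may actually be intronic.

```python
def classify_repeat(rep_start, rep_end, tx_start, cds_start, cds_end, tx_end,
                    strand="+"):
    """Label a repeat [rep_start, rep_end) relative to one transcript.

    Coordinates are 0-based, half-open, as in the UCSC tables.
    The labels are hypothetical names chosen for this sketch.
    """
    # No overlap with the transcript at all.
    if rep_end <= tx_start or rep_start >= tx_end:
        return "outside"
    # Repeat sticks out of the transcript on one side.
    if rep_start < tx_start or rep_end > tx_end:
        return "partial_overlap"
    # cdsStart == cdsEnd means the transcript has no annotated CDS.
    if cds_start == cds_end:
        return "noncoding_transcript"
    # Left of the CDS: 5' UTR on the plus strand, 3' UTR on the minus strand.
    if rep_end <= cds_start:
        return "5'UTR" if strand == "+" else "3'UTR"
    # Right of the CDS: the mirror case.
    if rep_start >= cds_end:
        return "3'UTR" if strand == "+" else "5'UTR"
    # Fully inside cdsStart..cdsEnd (coding exon or intron, not resolved here).
    if rep_start >= cds_start and rep_end <= cds_end:
        return "CDS_span"
    # Straddles cdsStart or cdsEnd.
    return "UTR/CDS_boundary"


# First Simple_repeat row paired with uc057aty.1 in the rmsk output above.
print(classify_repeat(30854, 30952, 29553, 29553, 29553, 31097))
```

To separate coding exons from introns, the exonStarts and exonEnds columns of knownGene would also have to be taken into account.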

 


Thanks for your nice reply. However, I have some predetermined SSR motifs, such as AT, CCG, etc. Could you please let me know how I can find only the repeats of interest?
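One possible way to keep only motifs of interest is to post-filter the simpleRepeat rows on their `sequence` column (the repeat unit shown in the output above). The sketch below is an assumption-laden illustration, not part of the original answer: the helper names `canonical_motif` and `filter_by_motif` are hypothetical, rows are assumed to be dictionaries keyed by column name, and a motif is treated as equivalent to its rotations and its reverse complement (so TA matches AT, and CGG matches CCG). It only matches exact units, so longer or imperfect consensus units such as those in the output above will not match a short motif.

```python
def canonical_motif(motif):
    """Return one representative for a motif, its rotations and its reverse complement."""
    motif = motif.upper()
    complement = str.maketrans("ACGT", "TGCA")
    revcomp = motif.translate(complement)[::-1]
    candidates = []
    for seq in (motif, revcomp):
        # Every cyclic rotation describes the same tandem repeat.
        for i in range(len(seq)):
            candidates.append(seq[i:] + seq[:i])
    # The alphabetically smallest form serves as the canonical key.
    return min(candidates)


def filter_by_motif(rows, motifs):
    """Keep rows whose repeat unit is equivalent to one of the wanted motifs."""
    wanted = {canonical_motif(m) for m in motifs}
    return [row for row in rows if canonical_motif(row["sequence"]) in wanted]


# Two rows copied from the simpleRepeat output above.
rows = [
    {"chrom": "chr1", "chromStart": 30862, "chromEnd": 30959, "sequence": "TC"},
    {"chrom": "chr1", "chromStart": 98999, "chromEnd": 99042, "sequence": "TTTA"},
]
for row in filter_by_motif(rows, ["AAAT", "CCG"]):
    print(row["chrom"], row["chromStart"], row["chromEnd"], row["sequence"])
```

The same list of canonical motifs could also be applied to SSRs reported by other tools, provided their repeat unit is available as a string.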

 
