Question: Finding coding and UTR regions for a gene list
0
3.6 years ago by
seta1.1k
Sweden
seta1.1k wrote:

Hi all,

I have already found some simple sequence repeats (SSRs) in a set of human gene sequences using the MISA and SSRlocator tools. Now I want to find out where these SSRs are located, i.e. whether each repeat lies in the coding or the non-coding (UTR) region of the gene. I can do this with the UCSC Genome Browser for one gene at a time, but that is too time-consuming for many genes. Could you please let me know how I can do it for many genes at once? Thanks.

sequencing utr coding alignment • 1.6k views
modified 3.6 years ago by Pierre Lindenbaum (119k) • written 3.6 years ago by seta (1.1k)

What kind of simple sequence repeats? Do you want to process the sequences yourself, or do you want to know whether any database already annotates these repeats (poly-X, RepeatMasker)?

written 3.6 years ago by Pierre Lindenbaum (119k)

Actually, I have some simple sequence repeats (SSRs) and would like to find their location within the gene sequences of interest: are the repeats located in the coding or the non-coding (UTR) regions?

written 3.6 years ago by seta (1.1k)

How do you check manually whether the UTR contains an SSR?

written 3.6 years ago by Pierre Lindenbaum (119k)

I have already found some SSRs using the SSRlocator and MISA tools. Now I want to know where these SSRs are located: in the coding or the UTR parts of the gene sequences of interest?

written 3.6 years ago by seta (1.1k)
3
3.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

UCSC has already computed the simple repeats.

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -e 'select K.chrom,R.repClass,R.genoStart,R.genoEnd,K.name,K.txStart,K.cdsStart,K.cdsEnd,K.txEnd from rmsk as R, knownGene as K where K.chrom=R.genoName and ((R.genoStart>=K.txStart AND R.genoEnd<=K.cdsStart) OR (R.genoStart>=K.cdsEnd AND R.genoEnd<=K.txEnd)) limit 10'
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+
| chrom | repClass      | genoStart | genoEnd | name       | txStart | cdsStart | cdsEnd | txEnd |
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+
| chr1  | Simple_repeat |     29744 |   29792 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | LINE          |     29901 |   30198 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | DNA           |     30342 |   30532 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | LTR           |     30693 |   30848 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | Simple_repeat |     30854 |   30952 | uc057aty.1 |   29553 |    29553 |  29553 | 31097 |
| chr1  | DNA           |     30342 |   30532 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | LTR           |     30693 |   30848 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | Simple_repeat |     30854 |   30952 | uc057atz.1 |   30266 |    30266 |  30266 | 31109 |
| chr1  | LTR           |     34564 |   34921 | uc001aak.4 |   34553 |    34553 |  34553 | 36081 |
| chr1  | SINE          |     35216 |   35366 | uc001aak.4 |   34553 |    34553 |  34553 | 36081 |
+-------+---------------+-----------+---------+------------+---------+----------+--------+-------+

 

or

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -e 'select K.chrom,R.name,R.chromStart,R.chromEnd,R.sequence,K.name,K.txStart,K.cdsStart,K.cdsEnd,K.txEnd from simpleRepeat as R, knownGene as K where K.chrom=R.chrom and ((R.chromStart>=K.txStart AND R.chromEnd<=K.cdsStart) OR (R.chromStart>=K.cdsEnd AND R.chromEnd<=K.txEnd)) limit 10'
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+
| chrom | name | chromStart | chromEnd | sequence                                                    | name       | txStart | cdsStart | cdsEnd | txEnd  |
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+
| chr1  | trf  |      30862 |    30959 | TC                                                          | uc057aty.1 |   29553 |    29553 |  29553 |  31097 |
| chr1  | trf  |      30862 |    30959 | TC                                                          | uc057atz.1 |   30266 |    30266 |  30266 |  31109 |
| chr1  | trf  |      90047 |    90430 | AACCTGCTGCTTCCTGGAGGAAGACAGTCCCTCAGTCCCTCTGTCTCTGCCAACCAGTT | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      92209 |    92243 | TCTGCATTGGTTTGG                                             | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      98999 |    99042 | TTTA                                                        | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99116 | TTTTTTTTCTTTCTTTTTTTTTTTTTTTT                               | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99116 | T                                                           | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99046 |    99115 | TTTTTTTTCTTTTCTTTCTTTTCTTCTT                                | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |      99047 |    99115 | TTTTTTTTTTC                                                 | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
| chr1  | trf  |     102109 |   102152 | AATAAATAAGAAAACAGAAACT                                      | uc057aub.1 |   89294 |    89294 |  89294 | 120932 |
+-------+------+------------+----------+-------------------------------------------------------------+------------+---------+----------+--------+--------+

 

written 3.6 years ago by Pierre Lindenbaum (119k)
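Both queries keep a repeat when it lies between txStart and cdsStart or between cdsEnd and txEnd. The same interval test can be sketched in Python to label each repeat once the rows have been downloaded. This is only an illustration under stated assumptions: the function name `classify_repeat` and its labels are hypothetical, coordinates are taken as 0-based half-open as in the UCSC tables, and the strand is assumed to come from the `strand` column of knownGene. Like the queries, it works on genomic spans and ignores exon boundaries, so a "UTR side" hit may still fall in an intron. It also separates transcripts where cdsStart equals cdsEnd (as in every row shown above), which in UCSC gene tables usually means no coding region is annotated.

```python
def classify_repeat(rep_start, rep_end, tx_start, cds_start, cds_end, tx_end, strand="+"):
    """Label a repeat interval relative to one transcript (hypothetical helper).

    Coordinates are assumed to be 0-based, half-open genomic positions.
    Exon structure is ignored, so labels describe genomic spans only.
    """
    # The repeat must lie entirely inside the transcript span.
    if rep_start < tx_start or rep_end > tx_end:
        return "outside_or_partial"
    # cdsStart == cdsEnd: no coding region annotated for this transcript.
    if cds_start >= cds_end:
        return "noncoding_transcript"
    # Left of the CDS on the genome: 5' side on '+', 3' side on '-'.
    if rep_end <= cds_start:
        return "5UTR_side" if strand == "+" else "3UTR_side"
    # Right of the CDS on the genome: 3' side on '+', 5' side on '-'.
    if rep_start >= cds_end:
        return "3UTR_side" if strand == "+" else "5UTR_side"
    # Fully between cdsStart and cdsEnd (may still be intronic).
    if rep_start >= cds_start and rep_end <= cds_end:
        return "cds_span"
    return "spans_cds_boundary"


if __name__ == "__main__":
    # First row of the rmsk output above: cdsStart == cdsEnd == 29553.
    print(classify_repeat(29744, 29792, 29553, 29553, 29553, 31097))
```

For a precise coding versus UTR call, the exonStarts and exonEnds columns of knownGene would also have to be intersected with the repeat; that step is left out here to keep the sketch short.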

Thanks for your helpful reply. However, I have some predetermined SSR motifs, such as AT, CCG, etc. Could you please let me know how I can find only the repeats of interest?

written 3.6 years ago by seta (1.1k)
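One possible way to keep only repeats with chosen motifs is to filter on the `sequence` column returned by the simpleRepeat query above, which holds the repeat unit. The sketch below is an assumption-laden illustration, not part of the original answer: the helper names `canonical_motif` and `filter_repeats` are hypothetical, rows are assumed to be dictionaries with a `sequence` key, and a unit is treated as equivalent to its cyclic rotations and its reverse complement (so TC, CT, GA and AG count as the same motif). It compares whole units only, so a unit reported as ATAT would not match the motif AT.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(unit):
    """Return the smallest rotation of a unit or of its reverse complement."""
    unit = unit.upper()
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (unit, revcomp) for i in range(len(s))]
    return min(rotations)


def filter_repeats(rows, motifs):
    """Keep rows whose repeat unit matches one of the motifs of interest."""
    wanted = {canonical_motif(m) for m in motifs}
    return [row for row in rows if canonical_motif(row["sequence"]) in wanted]


if __name__ == "__main__":
    # A few rows copied from the simpleRepeat output above.
    rows = [
        {"chromStart": 30862, "chromEnd": 30959, "sequence": "TC"},
        {"chromStart": 98999, "chromEnd": 99042, "sequence": "TTTA"},
        {"chromStart": 99046, "chromEnd": 99116, "sequence": "T"},
    ]
    for row in filter_repeats(rows, ["AG", "AT", "CCG"]):
        print(row)
```

If strand or rotation should matter for a particular analysis, the canonicalization step can be dropped and the units compared directly.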