I have a list of about 100 genes and want to retrieve the 50 nucleotides immediately downstream of the end of each gene. I do not have the chromosomal locations of these genes readily available. How can I automate finding the position where each gene ends and getting the sequence downstream of it?
I have no experience using bioinformatics tools and would appreciate any pointers.
For example, take the TRBV10-1 gene. I was able to locate it using BLAST and could then extend the displayed sequence region by changing the number in the link below. But I'm sure there's a better way.
Note that there are several options for "flank"; select the one most appropriate for you. Note also that you can search using other identifiers; I assumed from your example that HGNC symbols work best for you.
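If you ever need to do the same thing by hand from coordinates, here is a minimal sketch of what a "downstream flank" means. It assumes you already have the chromosome sequence as a string and the gene's 1-based, inclusive start/end and strand; the function names are my own and not part of any tool. The point to watch is that for a gene on the reverse strand, "downstream" lies before the gene's start on the forward strand and has to be reverse-complemented.

```python
# Sketch only: names and the coordinate convention (1-based, inclusive) are assumptions.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_flank(chrom_seq, start, end, strand, flank=50):
    """Return up to `flank` nucleotides immediately downstream of a gene.

    chrom_seq: forward-strand chromosome sequence
    start, end: 1-based inclusive gene coordinates on the forward strand
    strand: +1 (forward) or -1 (reverse)
    """
    if strand == 1:
        # Bases end+1 .. end+flank, i.e. the 0-based slice [end : end+flank].
        return chrom_seq[end:end + flank]
    if strand == -1:
        # On the reverse strand the gene ends at `start`, so the downstream
        # region is the `flank` bases before `start`, reverse-complemented.
        lo = max(0, start - 1 - flank)
        return reverse_complement(chrom_seq[lo:start - 1])
    raise ValueError("strand must be +1 or -1")


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"
    print(downstream_flank(toy, 4, 6, 1, flank=3))
    print(downstream_flank(toy, 4, 6, -1, flank=3))
```

Near a chromosome end the slice simply comes back shorter than `flank` rather than raising an error, so check the length if you need exactly 50 bases.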
If some of the genes in the list fail to be found (for unknown reasons), is it possible to know which ones failed? The results seem to be simply a list of sequences, shorter than the list of queries, with no reference to the corresponding query. I would like to match each result to the query. Thanks again.
There's an option "Header" at step 7 which should allow you to include the HGNC symbol as part of the FASTA sequence header. Another option would be to use different attributes (not sequences), which should return a table with blank entries for queries that did not retrieve data.
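Once the symbol is in the header, matching results back to queries is a small scripting job. The sketch below assumes the symbol is the first `|`-separated field of each FASTA header; that layout is an assumption and depends on which header attributes you select, so adjust `sep` and `field` to your output. The function names and the identifiers in the example are made up for illustration.

```python
# Sketch only: the header layout (symbol as first "|"-separated field) is an assumption.

def parse_fasta_symbols(fasta_text, sep="|", field=0):
    """Collect the gene symbol from every FASTA header line."""
    symbols = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].strip().split(sep)
            if field < len(parts) and parts[field]:
                symbols.add(parts[field])
    return symbols


def missing_queries(queries, fasta_text, sep="|", field=0):
    """Return the queried symbols that have no sequence in the output, in query order."""
    found = parse_fasta_symbols(fasta_text, sep=sep, field=field)
    return [q for q in queries if q not in found]


if __name__ == "__main__":
    # Placeholder records, not real output.
    example = ">TRBV10-1|ID1\nACGT\n>GENE2|ID2\nTTTT\n"
    print(missing_queries(["TRBV10-1", "NOTFOUND1", "GENE2"], example))
```

The comparison is exact and case-sensitive, so a symbol returned under a different alias will show up as missing.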
Thanks! You saved me a lot of work!