Question: How to find the corresponding nucleotide locations (e.g. 123682765-123683049) for a list of chromosomal regions (e.g. 5p14-15, 5q13-15, 5q31-32, etc.)
cuib wrote (7 weeks ago):

I'm new to sequence analysis, so I might have posted a redundant question. If so, please refer me to the correct place. From my initial search, I couldn't find answers to my question.

Is there a computational way (I prefer R, but please feel free to suggest a more useful tool!) to find the corresponding nucleotide locations (e.g. 123682765-123683049) for a list of chromosomal regions (e.g. 5p14-15, 5q13-15, 5q31-32, etc.)?

My input data would be: 5p14-15, 5q13-15, 5q31-32, etc.

I'd like to get the result as a data frame: the 1st column listing the nucleotide start location and the 2nd column listing the nucleotide end location.

Also, I'm doing ATAC-seq analysis, and if you know of any beginner-friendly learning materials or videos, I'd love to learn more.

Thank you.

Tags: genome, sequence, tutorial, R, gene
written 7 weeks ago by cuib • modified 7 weeks ago by RamRS
RamRS (Baylor College of Medicine, Houston, TX) wrote (7 weeks ago):

The term you're looking for is "cytoBand". That's what the 5p15 etc. notations are called.

Check out this post: How To Obtain Chromosome Locus From Coordinates

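For each input such as 5p15-16, split on "-", then prefix the \d+[pq] part of the first sub-string (here 5p) to the second sub-string, giving 5p15 and 5p16. Look up both bands: the start and end of the region are the smallest start position and the largest end position across the two. On the q arm that is the start of the first band and the end of the second; on the p arm, where band numbers increase toward the telomere while coordinates decrease, it is the other way round.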
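The splitting step described above can be sketched as follows. This is Python rather than R, and only one possible reading of the advice: the function name `expand_band_range` is hypothetical, and allowing X/Y chromosomes and arm-spanning ranges such as 5p14-q13 are my own assumptions, not part of the original suggestion.

```python
import re


def expand_band_range(region):
    """Turn "5p14-15" into ["5p14", "5p15"]; a single band is returned as-is."""
    first, _, second = region.strip().partition("-")
    # \d+[pq] from the post, extended to X and Y (an assumption).
    match = re.match(r"(\d+|[XY])([pq])", first)
    if match is None:
        raise ValueError(f"not a cytoband label: {region!r}")
    chrom, arm = match.groups()
    labels = [first]
    if second:
        # A second part that names its own arm (e.g. "5p14-q13") only
        # needs the chromosome; otherwise reuse chromosome plus arm.
        prefix = chrom if second[0] in "pq" else chrom + arm
        labels.append(prefix + second)
    return labels


for region in ["5p14-15", "5q13-15", "5q31-32"]:
    print(region, expand_band_range(region))
```

The same `partition` plus regex logic should carry over to R with `strsplit` and `sub`, if you prefer to stay there.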

written 7 weeks ago by RamRS • modified 7 weeks ago

Hi, thank you so much for the link and the term "cytoBand"! It's really useful. I was able to find helpful questions such as the following:

Cytogenic Location To Genome Coordinates In R

Genomic coordinates for Cytogenetic bands with R

I started using RStudio's Terminal, but I'm unfamiliar with it. Are there step-by-step instructions somewhere?

Should I use the code below? Is this what you mean by \d+[pq]?

curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/cytoBand.txt.gz" | gunzip  -c

Or

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e "select chrom, min(chromStart), max(chromEnd) from cytoBand where name like 'q36%' group by chrom;"

Should hg19 be hg38 (the assembly we used)? Also, the above command gives "mysql: command not found".

Thank you again for your help.

written 6 weeks ago by cuib • modified 6 weeks ago
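For anyone who wants to see what the SQL query above computes without installing a mysql client, here is a minimal sketch using Python's built-in sqlite3 module and a toy table. The table layout mirrors the five columns of UCSC's cytoBand table, but the rows, the chromosome names chrT/chrU, and the helper name `band_span_sql` are made up for illustration. An added `chrom = ?` filter restricts the result to one chromosome, which the original query does not do. On the real server, switching assemblies would presumably just mean passing `-D hg38` instead of `-D hg19`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cytoBand "
    "(chrom TEXT, chromStart INTEGER, chromEnd INTEGER, name TEXT, gieStain TEXT)"
)
# Toy rows: the coordinates are invented, NOT real hg19/hg38 values.
conn.executemany(
    "INSERT INTO cytoBand VALUES (?, ?, ?, ?, ?)",
    [
        ("chrT", 0, 100, "q23", "gpos50"),
        ("chrT", 100, 200, "q31.1", "gneg"),
        ("chrT", 200, 350, "q31.2", "gpos75"),
        ("chrT", 350, 500, "q32", "gneg"),
        ("chrU", 100, 300, "q31", "gneg"),
    ],
)

QUERY = """
SELECT chrom, MIN(chromStart), MAX(chromEnd)
FROM cytoBand
WHERE chrom = ? AND name LIKE ?
GROUP BY chrom
"""


def band_span_sql(connection, chrom, band):
    """Span of a band and all its sub-bands, e.g. q31 covers q31.1 and q31.2."""
    return connection.execute(QUERY, (chrom, band + "%")).fetchone()


print(band_span_sql(conn, "chrT", "q31"))
```

The `LIKE 'q31%'` pattern is what lets a coarse label such as q31 pick up every sub-band, and MIN/MAX then collapse those rows into a single interval.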

These questions are a lot more basic than the methods question you asked at the beginning. It looks like you're going to need to install mysql and learn a bit of R (and some regular expressions), and I cannot help you with that. If you're not familiar with mysql and R, please involve someone near you who can help you with that.

written 6 weeks ago by RamRS

Thank you for your reply. I'm familiar with R and regular expressions in a statistical context and have learned the basics of SQL and HiveQL, but I haven't used mysql or the terminal in a genomics context. I definitely want to learn more about this process. What kind of class would you suggest? I have limited access to people who are knowledgeable about what I want to learn. Thank you.

written 6 weeks ago by cuib

If you know R, regex and SQL in any context, they can be applied here. It's the data that is different so you should be fine.

written 6 weeks ago by RamRS
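As a rough illustration of how those skills combine here without mysql, the sketch below parses cytoBand-style text and returns one start/end pair per region. It is Python rather than R, the function names are hypothetical, and the toy rows use invented coordinates on a fictional chrT. With real data you would read the cytoBand.txt.gz file for your assembly instead. I am assuming the hg38 file sits at the same path as the hg18 one in the curl command above, with hg38 in place of hg18.

```python
import csv
import io

# Toy rows in the cytoBand.txt layout: chrom, chromStart, chromEnd, name, gieStain.
# Coordinates are invented for illustration, NOT real hg38 values.
TOY_CYTOBAND = (
    "chrT\t0\t100\tp15.2\tgneg\n"
    "chrT\t100\t250\tp15.1\tgpos50\n"
    "chrT\t250\t400\tp14\tgneg\n"
    "chrT\t400\t500\tp13\tgpos100\n"
    "chrT\t500\t650\tq11\tgneg\n"
)


def read_cytoband(handle):
    """Parse tab-separated cytoBand lines into (chrom, start, end, name) tuples."""
    rows = []
    for chrom, start, end, name, _stain in csv.reader(handle, delimiter="\t"):
        rows.append((chrom, int(start), int(end), name))
    return rows


def band_span(rows, chrom, bands):
    """Return (start, end) covering every listed band and its sub-bands."""
    hits = [
        (start, end)
        for row_chrom, start, end, name in rows
        if row_chrom == chrom and any(name.startswith(band) for band in bands)
    ]
    if not hits:
        raise KeyError(f"no bands matching {bands} on {chrom}")
    # min/max rather than first/last: on the p arm, higher band numbers
    # have lower coordinates.
    return min(s for s, _ in hits), max(e for _, e in hits)


rows = read_cytoband(io.StringIO(TOY_CYTOBAND))
print(band_span(rows, "chrT", ["p14", "p15"]))

# With a real download saved locally (hypothetical path):
#   import gzip
#   with gzip.open("cytoBand.txt.gz", "rt") as handle:
#       rows = read_cytoband(handle)
```

UCSC table coordinates are 0-based for the start and 1-based for the end, so add 1 to the start if you need 1-based positions. The resulting tuples can be collected into a two-column data frame in whichever tool you prefer.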