Question

How to find corresponding nucleotide locations (e.g. 123682765-123683049) from a list of chromosomal regions (e.g. 5p14-15, 5q13-15, 5q31-32)

0

3.8 years ago

cuib • 0

I'm new to sequence analysis, so I might have posted a redundant question. If so, please refer me to the correct place. From my initial search, I couldn't find answers to my question.

Is there a computational way (I prefer R, but if there's a more useful tool, please feel free to suggest it!) to find the corresponding nucleotide locations (e.g. 123682765-123683049) from a list of chromosomal regions (e.g. 5p14-15, 5q13-15, 5q31-32)?

My input data would be: 5p14-15, 5q13-15, 5q31-32, etc

I'd like to get the result as a data frame: the first column listing the nucleotide start location and the second column listing the nucleotide end location.

Also, I'm doing ATAC-seq analysis, and if you know of any beginner-friendly learning materials or videos, I'd love to learn more.

Thank you.

sequence genome gene R • 779 views

ADD COMMENT • link updated 14 months ago by Ram 43k • written 3.8 years ago by cuib • 0

score 3 · Answer 1 · 2020-08-05

3

3.8 years ago

Ram 43k

The term you're looking for is "cytoBand". That's what the 5p15 etc. notations are called.

Check out this post: How To Obtain Chromosome Locus From Coordinates

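For each input such as 5p14-15, split on the - and then prefix the \d+[pq] part of the first substring (here 5p) to the second substring, giving 5p14 and 5p15. Look up both bands in the cytoBand table and take the smallest start position and the largest end position across the two; those are your start and end positions. On the q arm this is simply the start of the first band and the end of the second, but on the p arm band numbers increase toward the start of the chromosome, so the higher-numbered band has the smaller coordinates.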
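The splitting and lookup described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the band table is a tiny made-up sample in UCSC cytoBand column order (the coordinates are placeholders, not real hg38 values), the function names are hypothetical, and ranges that cross arms (such as 5p13-q11) are not handled. Sub-bands are matched by name prefix, so 5q13 covers q13.1, q13.2 and so on, and the result uses the smallest start and largest end so that p-arm ranges come out in the right order.

```python
import re

# Toy rows: chrom, chromStart, chromEnd, name (UCSC cytoBand column order).
# The coordinates are made-up placeholders, NOT real genome positions.
TOY_BANDS = [
    ("chr5", 0, 100, "p15.3"),
    ("chr5", 100, 200, "p15.1"),
    ("chr5", 200, 300, "p14.2"),
    ("chr5", 300, 400, "p13.1"),
    ("chr5", 400, 500, "q11.1"),
    ("chr5", 500, 600, "q13.2"),
    ("chr5", 600, 700, "q14.1"),
    ("chr5", 700, 800, "q15"),
    ("chr5", 800, 900, "q21.1"),
]

# chromosome, arm, first band, optional "-second band" (e.g. 5p14-15 or 5q21)
REGION_RE = re.compile(r"^(\d+|[XY])([pq])(\d+(?:\.\d+)?)(?:-(\d+(?:\.\d+)?))?$")


def parse_region(region):
    """Split '5p14-15' into ('chr5', 'p14', 'p15'); '5q21' gives ('chr5', 'q21', 'q21')."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, arm, first, second = match.groups()
    # Prefix the arm taken from the first part onto the second part.
    return "chr" + chrom, arm + first, arm + (second or first)


def region_to_interval(bands, region):
    """Return (chrom, start, end) spanning both bands named in the region."""
    chrom, first_band, second_band = parse_region(region)
    hits = [
        (start, end)
        for c, start, end, name in bands
        if c == chrom and (name.startswith(first_band) or name.startswith(second_band))
    ]
    if not hits:
        raise ValueError(f"no bands found for {region!r}")
    # Smallest start and largest end, so p-arm ranges are ordered correctly.
    return chrom, min(s for s, _ in hits), max(e for _, e in hits)


if __name__ == "__main__":
    for region in ["5p14-15", "5q13-15", "5q21"]:
        print(region, region_to_interval(TOY_BANDS, region))
```

Run on a real cytoBand table for the assembly used in the analysis, the returned tuples can be collected into a two-column start/end table.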

ADD COMMENT • link 3.8 years ago by Ram 43k

0

Hi, thank you so much for the link and the name "cytoBand"! It's really useful. I was able to find relevant questions such as the following.

Cytogenic Location To Genome Coordinates In R

Genomic coordinates for Cytogenetic bands with R

I started using RStudio's Terminal. But I'm unfamiliar with it. Is there a stepwise instruction somewhere?

Should I use the code below? Is this what you mean by \d+[pq]?

curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/cytoBand.txt.gz" | gunzip  -c

Or

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e "select chrom, min(chromStart), max(chromEnd) from cytoBand where name like 'q36%' group by chrom;"

Should hg19 be hg38 (what we used)? Also, the above command gives "mysql: command not found".

Thank you again for your help

ADD REPLY • link 3.8 years ago by cuib • 0

0

These questions are a lot more basic than the methods question you asked at the beginning. It looks like you're going to need to install mysql and learn a bit of R (and some regular expressions), and I cannot help you with that. If you're not familiar with mysql and R, please involve someone near you who can help you with that.

ADD REPLY • link 3.8 years ago by Ram 43k

0

Thank you for your reply. I'm familiar with R and regular expressions in a statistical context and have learned the basics of SQL and HiveQL, but not mysql and the terminal in a genomics context. I definitely want to learn more about this process. What kind of class would you suggest? I have limited access to people who are knowledgeable about what I want to learn. Thank you.

ADD REPLY • link 3.8 years ago by cuib • 0

0

If you know R, regex and SQL in any context, they can be applied here. It's the data that is different so you should be fine.
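As a rough illustration of how ordinary SQL carries over, the sketch below runs the same kind of MIN/MAX query quoted earlier in the thread against an in-memory SQLite table from Python, so no mysql client is needed. The table layout follows the UCSC cytoBand columns (chrom, chromStart, chromEnd, name, gieStain), but the rows are made-up placeholders rather than real coordinates, and `band_extent` is a hypothetical helper name. Unlike the quoted query, it also filters on chrom, since a pattern like 'q36%' on its own returns one row for every chromosome that has such a band.

```python
import sqlite3

# Made-up placeholder rows in UCSC cytoBand column order; NOT real coordinates.
ROWS = [
    ("chr5", 500, 600, "q23.3", "gpos50"),
    ("chr5", 600, 700, "q31.1", "gneg"),
    ("chr5", 700, 800, "q31.3", "gpos100"),
    ("chr5", 800, 900, "q32", "gneg"),
    ("chr7", 900, 1000, "q31.1", "gneg"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cytoBand ("
    "chrom TEXT, chromStart INTEGER, chromEnd INTEGER, name TEXT, gieStain TEXT)"
)
con.executemany("INSERT INTO cytoBand VALUES (?, ?, ?, ?, ?)", ROWS)


def band_extent(con, chrom, band_prefix):
    """Return (chrom, min start, max end) over bands whose name starts with
    band_prefix on the given chromosome, or None if nothing matches."""
    return con.execute(
        "SELECT chrom, MIN(chromStart), MAX(chromEnd) "
        "FROM cytoBand WHERE chrom = ? AND name LIKE ? GROUP BY chrom",
        (chrom, band_prefix + "%"),
    ).fetchone()


if __name__ == "__main__":
    print(band_extent(con, "chr5", "q31"))  # spans q31.1 and q31.3 in the toy table
```

With real data, the table would be filled from the cytoBand file of the same assembly as the analysis (hg38 in this thread) rather than from placeholder rows.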

ADD REPLY • link 3.8 years ago by Ram 43k