Question: R code to match the positions of repetitive elements in the mouse mm10 rmsk file with the mouse genome
4.2 years ago, M K (United States) wrote:

I am looking for R code that matches the positions of repetitive elements in the rmsk file for mouse mm10 with the mouse genome. I downloaded the RepeatMasker file for mouse mm10 from the UCSC website.

rna-seq next-gen R • 2.0k views
written 4.2 years ago by M K

Do you have to do this with R? I ask because I think it would be easier to download the GFF or a BED file that has the positions, and from there it would be easy to extract the regions.

written 4.2 years ago by SES

I would prefer to do this in R. Also, how would I use the GFF file to extract these regions?

written 4.2 years ago by M K

What exactly do you mean by "match positions"? The rmsk file is just a text file, so you can read it into R easily enough after fixing it with awk (just to standardize the lines, since the rmsk file is otherwise poorly formatted for machine processing). GenomicRanges will likely make whatever else you need convenient enough.
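
A minimal sketch of the read-in step, assuming the download is the tab-separated UCSC table dump (for example rmsk.txt.gz) rather than RepeatMasker's whitespace-aligned .out file, which is the one that would need cleanup first. The 17 column names follow the UCSC rmsk table schema as commonly published and should be checked against the schema for your own download. The helper `read_rmsk` and the two demo rows are made up for illustration.

```r
# Column names assumed from the UCSC rmsk table schema; verify before use.
rmsk_cols <- c("bin", "swScore", "milliDiv", "milliDel", "milliIns",
               "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
               "repName", "repClass", "repFamily", "repStart", "repEnd",
               "repLeft", "id")

# Hypothetical helper: read a tab-separated rmsk table into a data frame.
read_rmsk <- function(path) {
  read.table(path, sep = "\t", header = FALSE, col.names = rmsk_cols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Two made-up rows written to a temporary file so the sketch runs on its own.
# For real data, pass the path of the downloaded file instead.
demo_lines <- c(
  "585\t1000\t100\t10\t5\tchr1\t3000000\t3000500\t-192471471\t+\tL1_Mus1\tLINE\tL1\t1\t500\t-6000\t1",
  "585\t800\t120\t0\t0\tchr1\t3001000\t3001150\t-192470821\t-\tB1_Mus1\tSINE\tAlu\t-5\t145\t1\t2"
)
demo_path <- tempfile(fileext = ".txt")
writeLines(demo_lines, demo_path)

rmsk <- read_rmsk(demo_path)
str(rmsk[, c("genoName", "genoStart", "genoEnd", "repClass", "repFamily")])
```

Disabling `quote` and `comment.char` is a defensive choice here, so that stray quote or `#` characters in repeat names cannot truncate a row.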

written 4.2 years ago by Devon Ryan

I have already read the rmsk file into R. What I want is to find the positions in the mouse genome of highly represented repFamily values such as L1 and Alu, and highly represented repClass values such as LINE and SINE, using the coordinates in the rmsk file and matching them to positions in the genome. Also, what do you mean by using awk to standardize the lines of the rmsk file?

written 4.2 years ago by M K

If you already read the file in then you can ignore the mentions of awk. L1 repeats are labeled "LINE/L1" and Alu are "SINE/Alu", so just subset the dataframe accordingly.
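
A minimal sketch of that subsetting, assuming the data frame has separate `repClass` and `repFamily` columns as in the UCSC table dump. The combined "LINE/L1" style label comes from RepeatMasker's own output, so the sketch also builds a combined column (the name `classFamily` is an assumption) to show both routes. The toy rows are made up.

```r
# Toy stand-in for the rmsk data frame; real data would have many more columns.
rmsk <- data.frame(
  genoName  = c("chr1", "chr1", "chr2", "chr3"),
  genoStart = c(3000000L, 3001000L, 5000000L, 7000000L),
  genoEnd   = c(3000500L, 3001150L, 5006000L, 7000300L),
  repName   = c("L1_Mus1", "B1_Mus1", "L1Md_A", "MTC"),
  repClass  = c("LINE", "SINE", "LINE", "LTR"),
  repFamily = c("L1", "Alu", "L1", "ERVL-MaLR"),
  stringsAsFactors = FALSE
)

# Route 1: filter on the two separate columns.
l1  <- rmsk[rmsk$repClass == "LINE" & rmsk$repFamily == "L1", ]
alu <- rmsk[rmsk$repClass == "SINE" & rmsk$repFamily == "Alu", ]

# Route 2: build a combined "class/family" label and filter on that.
rmsk$classFamily <- paste(rmsk$repClass, rmsk$repFamily, sep = "/")
l1_alu <- rmsk[rmsk$classFamily %in% c("LINE/L1", "SINE/Alu"), ]

# Count how many elements fall in each class/family combination.
family_counts <- table(rmsk$classFamily)
print(family_counts)
```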

written 4.2 years ago by Devon Ryan

So how can I use GenomicRanges functions in R to do that?

written 4.2 years ago by M K

If that's all you want to do then you don't need GenomicRanges at all.

written 4.2 years ago by Devon Ryan

I mean, how can I use it to match the positions of those elements with the gene positions in the mouse genome (mm10)? I want to do some statistical analysis using their locations.

written 4.2 years ago by M K

See help(findOverlaps) after loading GenomicRanges.
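
A minimal sketch of what that could look like, assuming the Bioconductor package GenomicRanges is installed and that the repeat table uses UCSC column names with 0-based, half-open coordinates (hence the `+ 1L` on the start). The repeats and genes below are made-up toy ranges; real gene coordinates would have to come from an annotation source of your choice, for example a TxDb package for mm10.

```r
library(GenomicRanges)

# Toy repeat table with UCSC-style column names (assumed).
rmsk <- data.frame(
  genoName  = c("chr1", "chr1", "chr2"),
  genoStart = c(100L, 5000L, 300L),
  genoEnd   = c(400L, 5300L, 900L),
  strand    = c("+", "-", "+"),
  repClass  = c("LINE", "SINE", "LINE"),
  repFamily = c("L1", "Alu", "L1"),
  stringsAsFactors = FALSE
)

# Convert to GRanges; GRanges is 1-based and closed, so shift the start by 1.
repeats <- GRanges(
  seqnames  = rmsk$genoName,
  ranges    = IRanges(start = rmsk$genoStart + 1L, end = rmsk$genoEnd),
  strand    = rmsk$strand,
  repClass  = rmsk$repClass,
  repFamily = rmsk$repFamily
)

# Toy gene ranges standing in for a real mm10 gene annotation.
genes <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(1L, 1000L), end = c(1000L, 2000L)),
  strand   = c("+", "+"),
  gene_id  = c("geneA", "geneB")
)

# Pair up every repeat with every gene it overlaps, ignoring strand.
hits <- findOverlaps(repeats, genes, ignore.strand = TRUE)
overlap_table <- data.frame(
  repFamily = repeats$repFamily[queryHits(hits)],
  gene_id   = genes$gene_id[subjectHits(hits)]
)

# Number of overlapping repeats per gene, usable as input for later statistics.
repeats_per_gene <- countOverlaps(genes, repeats, ignore.strand = TRUE)
```

Whether to ignore strand depends on the question being asked; `ignore.strand = TRUE` is used here only because repeat insertions on either strand are often of interest.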

written 4.2 years ago by Devon Ryan

Thanks a lot Devon.

written 4.2 years ago by M K