Question: Finding out if certain SNP positions fall into certain gene regions
9 months ago by
SGMS60
European Union
SGMS60 wrote:

Hi all,

I currently have a certain list of SNPs (chromosome and position). I need to check whether these SNPs fall into another list of gene regions (chr:start_position-end_position) that I have.

I was previously able to find overlaps between positions using findOverlaps in R and I also saw we can do something similar using bedtools intersect. But it seems like those tools are all for finding chr:start-end overlaps, whereas I want to see whether my SNP positions fall into my gene regions of interest.

Any suggestions would be greatly appreciated.

Thank you!

R unix snps region gene • 440 views
ADD COMMENTlink modified 9 months ago by Alex Reynolds28k • written 9 months ago by SGMS60

It seems there was an earlier discussion about this: A: How To Intersect A Range With Single Positions

ADD REPLYlink written 9 months ago by wjidea50

Thanks, my search didn't turn that one up. It seems I will go with Pierre's suggestion: turn each SNP position into a SNP region with the same start and end coordinate, and then do the overlap.

ADD REPLYlink written 9 months ago by SGMS60

Good description of the data. It would help if you could post example input data and the expected output, SGMS.

ADD REPLYlink written 9 months ago by cpad011211k

"But it seems like those tools are all for finding chr:start-end overlaps, whereas I want to see whether my SNP positions fall into my gene regions of interest."

That's not clear. What is the difference?

ADD REPLYlink written 9 months ago by Pierre Lindenbaum121k

For the SNPs I only have a single position, whereas for the regions I have chr:start-end. Do you think findOverlaps would work in this case too?

ADD REPLYlink written 9 months ago by SGMS60

Why don't you convert your positions into 1-base intervals?

ADD REPLYlink written 9 months ago by Pierre Lindenbaum121k

I thought so. You mean for example:

1:169549811-169549811

right?

ADD REPLYlink written 9 months ago by SGMS60

In BED it would be 1:169549810-169549811, because the BED format is 0-based and half-open: the start is counted from 0 and the end is not included in the interval.
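As a rough sketch of that conversion, assuming the SNPs sit in a tab-separated file of chromosome and 1-based position (the file names `snps.txt` and `snps.bed` and the second SNP are made up for illustration), awk can write the 1-base BED intervals:

```shell
# Hypothetical input: one SNP per line, "chromosome<TAB>1-based position".
printf '1\t169549811\n2\t5000\n' > snps.txt

# BED is 0-based and half-open, so a single base at 1-based position p
# becomes the interval [p-1, p).
awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2 }' snps.txt > snps.bed

cat snps.bed
```

The resulting file can then be intersected with a BED file of gene regions by any tool that expects BED input.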

ADD REPLYlink written 9 months ago by ATpoint19k

Thanks. If I want to do that in R though, the positions will remain the same.

ADD REPLYlink written 9 months ago by SGMS60

Yes, the overlap functions from IRanges/GenomicRanges assume 1-based, closed coordinates, so a SNP stays as 1:169549811-169549811 there.
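The same 1-based, closed logic can also be checked outside R. The following is only a sketch in awk, assuming hypothetical tab-separated files `snps.txt` (chromosome, position) and `genes.txt` (chromosome, start, end, name), both 1-based; it compares every SNP against every region, so it is meant for small lists rather than genome-wide data:

```shell
# Hypothetical inputs, both 1-based: SNPs as "chr<TAB>pos",
# gene regions as "chr<TAB>start<TAB>end<TAB>name".
printf '1\t169549811\n1\t500\n2\t150\n' > snps.txt
printf '1\t169549000\t169550000\tgeneA\n2\t100\t200\tgeneB\n' > genes.txt

# First pass (NR == FNR) stores the regions; second pass prints every SNP
# whose position lies inside a region on the same chromosome, with both
# ends inclusive (start <= pos <= stop).
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { chr[FNR] = $1 ""; start[FNR] = $2 + 0; stop[FNR] = $3 + 0
                 name[FNR] = $4; n = FNR; next }
     { pos = $2 + 0
       for (i = 1; i <= n; i++)
         if ($1 "" == chr[i] && pos >= start[i] && pos <= stop[i])
           print $1, $2, name[i] }' genes.txt snps.txt > hits.txt

cat hits.txt
```

Because both ends are inclusive, no shift of the SNP position is needed here, unlike in the BED case.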

ADD REPLYlink written 9 months ago by ATpoint19k
9 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:
$ vcf2bed < snps.vcf > snps.bed
$ gff2bed < genes.gff > genes.bed
$ bedmap --echo --echo-map-id genes.bed snps.bed > answer.bed

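The convert2bed binary (called here through its vcf2bed and gff2bed wrappers) takes care of the indexing, converting the 1-based VCF and GFF coordinates to 0-based BED coordinates. You can replace gff2bed with gtf2bed if you have GTF-formatted input.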
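With --echo --echo-map-id, each output line should hold a gene followed by a `|` and the `;`-separated IDs of the SNPs that overlap it, and genes without any SNP have nothing after the `|`. Assuming those default delimiters, a sketch for keeping only the genes that contain SNPs could look like this (the sample answer.bed below is invented and cut down to four BED columns; real output will have more):

```shell
# Hypothetical stand-in for answer.bed, simplified to four BED columns;
# real bedmap output depends on your input files.
printf 'chr1\t100\t200\tgeneA|rs1;rs2\nchr1\t300\t400\tgeneB|\nchr2\t50\t80\tgeneC|rs3\n' > answer.bed

# Keep only genes with at least one overlapping SNP, then print the
# gene ID (4th BED column) next to its SNP IDs.
awk -F'|' '$2 != "" { split($1, bed, "\t"); print bed[4] "\t" $2 }' answer.bed > genes_with_snps.txt

cat genes_with_snps.txt
```

Swapping the file order in the bedmap call (snps.bed first, genes.bed second) would instead list each SNP with the genes it falls into.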

ADD COMMENTlink modified 9 months ago • written 9 months ago by Alex Reynolds28k