Question: Finding out if certain SNP positions fall into certain gene regions
22 months ago by
SGMS70
European Union
SGMS70 wrote:

Hi all,

I currently have a certain list of SNPs (chromosome and position). I need to check whether these SNPs fall into another list of gene regions (chr:start_position-end_position) that I have.

I was previously able to find overlaps between positions using findOverlaps in R and I also saw we can do something similar using bedtools intersect. But it seems like those tools are all for finding chr:start-end overlaps, whereas I want to see whether my SNP positions fall into my gene regions of interest.

Any suggestions would be greatly appreciated.

Thank you!

R unix snps region gene • 810 views
ADD COMMENTlink modified 22 months ago by Alex Reynolds30k • written 22 months ago by SGMS70

It seems there was an earlier discussion about this: "How To Intersect A Range With Single Positions".

ADD REPLYlink written 22 months ago by wjidea50

Thanks, my search didn't turn that one up. It seems I will go with Pierre's suggestion: turn each SNP position into a SNP region with the same start and end coordinates, and then do the overlap.

ADD REPLYlink written 22 months ago by SGMS70

Good description of the data. It would help if you could post some input data and the expected output, SGMS.

ADD REPLYlink written 22 months ago by cpad011213k

"But it seems like those tools are all for finding chr:start-end overlaps, [...] I want to see whether my SNP positions fall into my gene regions of interest."

Not clear. What is the difference?

ADD REPLYlink written 22 months ago by Pierre Lindenbaum129k

With SNPs, I only have a certain position whereas for the region I have chr:start-end. Do you think findOverlaps would work in this case too?

ADD REPLYlink written 22 months ago by SGMS70

Why don't you convert your positions into 1-base intervals?
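
To illustrate the idea: a position treated as a 1-base interval overlaps a region exactly when start <= position <= end (with 1-based, inclusive coordinates on both sides). The sketch below does that check with plain awk. The function name `snps_in_regions` and the tab-separated input layouts are assumptions made for illustration, not anything from the original posts, and the nested loop is only reasonable for small lists. For real data, findOverlaps, bedtools intersect or bedmap are the better choice.

```shell
# Hypothetical helper: snps_in_regions REGIONS SNPS
#   REGIONS: chrom<TAB>start<TAB>end   (assumed 1-based, inclusive)
#   SNPS:    chrom<TAB>pos             (assumed 1-based)
# Prints chrom, pos and the matching region for every SNP inside a region.
snps_in_regions() {
    awk 'BEGIN { FS = OFS = "\t" }
         # First file: remember every region (chromosome kept as a string).
         NR == FNR { chrom[NR] = $1 ""; lo[NR] = $2 + 0; hi[NR] = $3 + 0; n = NR; next }
         # Second file: test each SNP against every stored region.
         {
             pos = $2 + 0
             for (i = 1; i <= n; i++)
                 if (($1 "") == chrom[i] && pos >= lo[i] && pos <= hi[i])
                     print $1, $2, chrom[i] ":" lo[i] "-" hi[i]
         }' "$1" "$2"
}

# Usage (file names are placeholders):
#   snps_in_regions regions.tsv snps.tsv
```

SNPs that fall into no region are simply not printed.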

ADD REPLYlink written 22 months ago by Pierre Lindenbaum129k

I thought so. You mean for example:

1:169549811-169549811

right?

ADD REPLYlink written 22 months ago by SGMS70

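It would be 1:169549810-169549811, because the BED format is 0-based and half-open: the start coordinate is 0-based and the end coordinate is not included in the interval.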
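
As a small sketch of that conversion, assuming the SNP list is a tab-separated file of chromosome and 1-based position (the function name `snps_to_bed` is made up for this example), awk can write the 0-based, half-open BED interval by subtracting 1 from the position for the start and keeping the position as the end:

```shell
# Hypothetical helper: read chrom<TAB>pos (1-based) on stdin,
# write chrom<TAB>pos-1<TAB>pos (0-based, half-open BED) on stdout.
snps_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2 }'
}

# Example with the position discussed above:
printf '1\t169549811\n' | snps_to_bed
```

The resulting file can then be given to a BED-based tool such as bedtools intersect together with the gene regions, provided those are in BED coordinates as well.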

ADD REPLYlink written 22 months ago by ATpoint36k

Thanks. If I want to do that in R, though, the start and end positions will remain the same.

ADD REPLYlink written 22 months ago by SGMS70

Yes, the overlap functions from IRanges/GenomicRanges assume 1-based coordinates, so in R the start and the end are both the SNP position.

ADD REPLYlink written 22 months ago by ATpoint36k
22 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:
$ vcf2bed < snps.vcf > snps.bed
$ gff2bed < genes.gff > genes.bed
$ bedmap --echo --echo-map-id genes.bed snps.bed > answer.bed

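The convert2bed binary (called here through vcf2bed and gff2bed) takes care of the indexing, i.e. it converts the 1-based VCF and GFF coordinates into 0-based BED coordinates. You can replace gff2bed with gtf2bed if you have GTF-formatted input.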
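
With these options, bedmap writes one line per gene: the gene's BED record, then a `|` delimiter, then the IDs of the overlapping SNPs separated by `;` (nothing after the `|` if no SNP overlaps). As a sketch, assuming that default delimiter is in use, the genes that contain at least one SNP can be pulled out of answer.bed with awk. The function name `genes_with_snps` is made up for this example.

```shell
# Hypothetical helper: keep only bedmap output lines whose part after
# the last "|" (the list of mapped SNP IDs) is not empty.
genes_with_snps() {
    awk -F'|' '$NF != ""'
}

# Usage (answer.bed as produced by the bedmap command above):
#   genes_with_snps < answer.bed > genes_with_snps.bed
```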

ADD COMMENTlink modified 22 months ago • written 22 months ago by Alex Reynolds30k