I want to retrieve the genes that are present within a series of regions. Say I have a BED file with query positions such as:
1 2665697 4665777 MIR201
1 10391435 12391516 MIR500
1 15106831 17106911 MIR122
1 23436535 25436616 MIR234
1 23436575 25436656 MIR488
I would like to get the genes that fall within those regions.
I have tried biomaRt and bedtools intersect, but the output I get is a single list of genes for all the regions combined. What I want is the genes for each row reported separately, as if I had queried one region at a time. Basically, I want to know which genes fall within each region while still being able to tell which genes belong to which region.
What I am doing is this: starting from the region of a detected miRNA, I expand the genomic region upstream and downstream so that I get the genes neighboring that miRNA. I am using a window of 1 million bases on each side. This works for a single query, but how can I run many queries with biomaRt, or many intersections with bedtools, so that I get something like:
1 2665697 4665777 MIR201 GENEX, GENEY, GENEZ...
1 10391435 12391516 MIR500 GENEA, GENEB, GENEC...
1 15106831 17106911 MIR122
1 23436535 25436616 MIR234
1 23436575 25436656 MIR488
Meaning that GENEX, GENEY and GENEZ fall within 1:2665697-4665777, with MIR201 in the middle, since this region is calculated by subtracting 1 million bp from the start position and adding 1 million bp to the end position.
I am essentially determining the neighboring genes of each miRNA in order to compare them between species, but I do not see how to query multiple regions individually using biomaRt or bedtools.
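For reference, the window calculation described above can be sketched in a few lines of awk. This is only an illustration: the function name expand_windows is made up here, the input is assumed to be a tab-separated BED file with chrom, start, end and name columns, and the example miRNA coordinates are back-calculated from the first region in the question.

```shell
# Hypothetical helper: widen every BED interval by a fixed flank on both sides.
# $1 = flank size in bp; reads tab-separated BED on stdin, writes BED on stdout.
expand_windows() {
  awk -v flank="$1" 'BEGIN { FS = OFS = "\t" }
    {
      start = $2 - flank
      if (start < 0) start = 0   # do not run past the chromosome start
      print $1, start, $3 + flank, $4
    }'
}

# Assumed original miRNA position (region start + 1 Mb, region end - 1 Mb)
printf '1\t3665697\t3665777\tMIR201\n' | expand_windows 1000000
# prints: 1	2665697	4665777	MIR201
```

This sketch does not clamp the end coordinate to the chromosome length. If that matters, bedtools slop (with -b 1000000 and a genome file passed via -g) does the same expansion and clamps at both ends.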
There are various solutions here: How to get all genes for a specific list of regions in R preferably using Biomart
You can even follow my solution to annotate CN segment data (BED Format): A: How to extract the list of genes from TCGA CNV data
Yet another solution is to use
bedtools intersect -a MyRegions -b Annotation.GTF -wao
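With -wao, every output row starts with the original region columns, so the region each gene belongs to is never lost; what remains is collapsing the rows into one line per region. A minimal sketch of that step follows, under several assumptions: MyRegions is a tab-separated 4-column BED file, the GTF is Ensembl/GENCODE-style with a gene_name attribute, and the helper name collapse_genes is made up for this example.

```shell
# Hypothetical helper: collapse `bedtools intersect -wao` output (BED4 + GTF)
# into one line per region followed by a comma-separated gene list.
# Column layout assumed: 1-4 = region, 5-13 = GTF record, 14 = overlap in bp.
collapse_genes() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      key = $1 OFS $2 OFS $3 OFS $4
      # Remember each region once, in input order, even if it has no genes
      if (!(key in seen)) { seen[key] = 1; order[++n] = key; genes[key] = "" }
      # Keep only GTF "gene" records (GTF column 3 = output column 7)
      if ($7 != "gene") next
      # Extract the value of gene_name "..." from the GTF attribute column
      if (match($13, /gene_name "[^"]+"/)) {
        name = substr($13, RSTART + 11, RLENGTH - 12)
        genes[key] = (genes[key] == "" ? name : genes[key] ", " name)
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        if (genes[k] == "") print k
        else print k, genes[k]
      }
    }'
}

# Usage (file names as in the command above):
#   bedtools intersect -a MyRegions -b Annotation.GTF -wao | collapse_genes
```

Regions without any overlapping gene are still printed, just without a gene list. Note that -wao reports any overlap; if only genes lying entirely inside a region should count, adding -F 1.0 (minimum overlap as a fraction of the B feature) to the intersect call is one way to require that.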
Same as at StackOverflow? https://stackoverflow.com/questions/50136262/query-genes-within-regions
It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/50136262/query-genes-within-regions
This is typically not recommended as it runs the risk of annoying people in both communities.