Question: How To Find The Nearest Gene To A Retrotransposon Insert?
Joseph Hughes (Scotland, UK) wrote, 6.8 years ago:

Hi,

I have a BED file with the position of retrotransposons in the mouse genome and I would like to find the nearest gene, the distance to that gene and whether it is on the + or - strand. There are so many different file formats for the mouse genome and many different databases to choose from, I was wondering what the best tool and what the best database to use would be.

Cheers, Joseph

Tags: bedtools, bed, position, mouse

Because many retrotransposon promoters are strong drivers of transcription in both directions, I would suggest collecting both the + and - strand nearest genes.

Comment by Larry_Parnell, 6.8 years ago
Aaronquinlan (United States) wrote, 6.8 years ago:

It is admittedly my own tool, but the closest operation in bedtools will do what you want. The -d option will report the distance between the retrotransposon and the nearest gene. If they in fact overlap one another, the distance will be 0. My answer assumes that the "genes.bed" file includes the gene's strand. If it does, the strand will be reported in the output. Note that GFF is fine as well.

bedtools closest -a retro-inserts.bed -b genes.bed -d

Also, I just remembered that Galaxy has a nice option in its "Operate on Genomic Intervals" section called "Fetch closest non-overlapping feature for every interval". This is an equally good option, though it looks like it doesn't report the distance between intervals. That said, once you have the coordinates, a little awk and the formula I mention in this thread is all you need to get the distance.
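As a rough illustration of the "little awk" step, here is a minimal sketch that appends a distance column to paired coordinates. The input layout (one insert/gene pair per line, six tab-separated columns) and the file names are assumptions made for the example, and the distance convention used here (number of bases between the two intervals, 0 if they overlap or touch) is not necessarily the formula referred to above. Tools differ on whether adjacent intervals count as 0 or 1, so check against a known pair.

```shell
# Hypothetical input: one retrotransposon/gene pair per line, tab-separated:
#   chrom  te_start  te_end  chrom  gene_start  gene_end
# Coordinates are assumed to be BED-style (0-based, half-open).
printf 'chr1\t100\t200\tchr1\t500\t900\n'    >  pairs.tsv
printf 'chr1\t100\t200\tchr1\t150\t300\n'    >> pairs.tsv
printf 'chr1\t1000\t1100\tchr1\t500\t900\n'  >> pairs.tsv

awk 'BEGIN { FS = OFS = "\t" }
{
    lo = ($2 > $5) ? $2 : $5      # larger of the two starts
    hi = ($3 < $6) ? $3 : $6      # smaller of the two ends
    gap = lo - hi                 # bases between the two intervals
    if (gap < 0) gap = 0          # a negative gap means they overlap
    print $0, gap                 # append the distance as a 7th column
}' pairs.tsv > pairs_with_distance.tsv
```

The same expression works whether the gene lies upstream or downstream of the insert, because it only compares the larger start with the smaller end.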

Joseph Hughes (Scotland, UK) wrote, 6.8 years ago:

I did exactly what aaronQuinlan suggested above with "closestBed". I obtained the files I needed from the UCSC Table Browser by selecting the group "Genes and Gene Prediction Tracks", the track "Ensembl Genes" and the output format BED. I also downloaded the ensemblToGeneName table and used a small Perl script to convert the Ensembl transcript names to gene names and keep only the columns I wanted.

Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested.
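The transcript-to-gene-name conversion described above could also be sketched in awk rather than Perl. This is not the script used in the post. It assumes the mapping table is two tab-separated columns (transcript ID, gene name) and that the BED file carries the transcript ID in column 4. The toy file names and IDs below are hypothetical.

```shell
# Toy mapping table (hypothetical IDs): transcript ID <TAB> gene name
printf 'TX1\tGeneA\nTX2\tGeneB\n' > toy_ensemblToGeneName.txt

# Toy 6-column BED with transcript IDs in the name column (column 4)
printf 'chr1\t100\t500\tTX1\t0\t+\n'   >  toy_ensGene.bed
printf 'chr1\t900\t1500\tTX2\t0\t-\n'  >> toy_ensGene.bed
printf 'chr2\t50\t80\tTX3\t0\t+\n'     >> toy_ensGene.bed

awk 'BEGIN { FS = OFS = "\t" }
NR == FNR { gene[$1] = $2; next }            # first file: load the ID -> name map
{
    name = ($4 in gene) ? gene[$4] : $4      # keep the ID when no name is known
    print $1, $2, $3, name, $5, $6           # keep only the first six BED columns
}' toy_ensemblToGeneName.txt toy_ensGene.bed > toy_genes.bed
```

Keeping the strand in column 6 matters here, since the strand-aware options of bedtools closest rely on it.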


"Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested." You could run it twice: once with -s and once with -S.

Comment by Aaronquinlan, 6.8 years ago

-s finds the closest interval on the same strand. -S finds the closest on the opposite strand.

Comment by Aaronquinlan, 6.8 years ago
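For illustration, the two runs might look like the sketch below. The toy files and their coordinates are hypothetical, and both files are assumed to be 6-column BED so that a strand is available on each side. The sort step is included because recent bedtools versions expect position-sorted input for closest. The block skips the bedtools calls if the tool is not installed.

```shell
# Toy inputs (hypothetical coordinates); 6-column BED so both files carry a strand
printf 'chr1\t1000\t1100\tTE1\t0\t+\n' > toy_inserts.bed
printf 'chr1\t100\t500\tGeneA\t0\t+\nchr1\t1300\t1800\tGeneB\t0\t-\n' > toy_genes_stranded.bed

if command -v bedtools >/dev/null 2>&1; then
    # Sort both files by chromosome, then by start position
    sort -k1,1 -k2,2n toy_inserts.bed        > toy_inserts.sorted.bed
    sort -k1,1 -k2,2n toy_genes_stranded.bed > toy_genes.sorted.bed

    # -s: nearest gene on the same strand as the insert
    bedtools closest -a toy_inserts.sorted.bed -b toy_genes.sorted.bed -d -s \
        > nearest_same_strand.bed

    # -S: nearest gene on the opposite strand
    bedtools closest -a toy_inserts.sorted.bed -b toy_genes.sorted.bed -d -S \
        > nearest_opposite_strand.bed
else
    echo "bedtools not found; skipping the closest runs" >&2
fi
```

Each output line should hold the insert's columns, then the matched gene's columns (including its strand), then the distance added by -d.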

Hi Aaron, I can't find anything about capital -S in the manual. What does it do? Cheers, Joseph

Comment by Joseph Hughes, 6.8 years ago
Alex Reynolds (Seattle, WA USA) wrote, 6.6 years ago:

The BEDOPS tool called closest-features will also do this:

$ closest-features --closest transposons.bed genes.bed > answer.bed