How To Find The Nearest Gene To A Retrotransposon Insert?
10.1 years ago
Joseph Hughes ★ 2.9k

Hi,

I have a BED file with the positions of retrotransposons in the mouse genome, and I would like to find the nearest gene, the distance to that gene, and whether it is on the + or - strand. There are so many file formats and databases for the mouse genome that I am not sure where to start. What would be the best tool and the best database to use?

Cheers, Joseph

bedtools bed mouse position • 4.3k views

Because many retrotransposon promoters are strong drivers of transcription in both directions, I would suggest collecting both the + and - strand nearest genes.

10.1 years ago

It is admittedly my own tool, but the closest operation in bedtools will do what you want. The -d option will report the distance between the retrotransposon and the nearest gene. If they in fact overlap one another, the distance will be 0. My answer assumes that the genes.bed file includes the gene's strand. If it does, the strand will be reported in the output. Note that GFF is fine as well.

bedtools closest -a retro-inserts.bed -b genes.bed -d

Also, I just remembered that Galaxy has a nice option in its "Operate on Genomic Intervals" section called "Fetch closest non-overlapping feature for every interval". This is an equally good option, though it does not appear to report the distance between intervals. That said, once you have the coordinates, a little awk and the formula I mention in this thread are all you need to get the distance.
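For anyone taking the Galaxy route, the awk step could look something like the sketch below. The formula referred to above is not reproduced here, so this assumes one common definition: the gap between two half-open BED intervals, with 0 for overlaps. The file name pairs.tsv and its column layout (insert in columns 1-3, closest gene as BED6 in columns 4-9) are hypothetical, so adjust the field numbers to whatever your output actually contains.

```shell
# Hypothetical paired file: insert (cols 1-3) followed by its closest gene (cols 4-9).
printf 'chr1\t100\t200\tchr1\t500\t900\tGeneA\t0\t+\n'    > pairs.tsv
printf 'chr1\t1000\t1100\tchr1\t600\t950\tGeneB\t0\t-\n' >> pairs.tsv
printf 'chr1\t300\t400\tchr1\t350\t800\tGeneC\t0\t+\n'   >> pairs.tsv

# Append the gap between insert and gene as a new last column.
awk 'BEGIN { FS = OFS = "\t" }
{
    if ($5 >= $3)      dist = $5 - $3   # gene starts at or after the insert end
    else if ($2 >= $6) dist = $2 - $6   # gene ends at or before the insert start
    else               dist = 0         # the two intervals overlap
    print $0, dist
}' pairs.tsv > pairs_with_distance.tsv
```

The distance ends up in column 10, next to the gene strand in column 9. Tools can differ by one in how they count directly adjacent intervals, so treat this as an approximation of what a dedicated tool reports rather than an exact match.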

10.1 years ago
Joseph Hughes ★ 2.9k

I did exactly what aaronQuinlan suggested above, using "closestBed". I obtained the files I needed from the UCSC Table Browser by selecting group "Genes and Gene Prediction Tracks", track "Ensembl Genes", and output format BED. I also downloaded the ensemblToGeneName table and used a small Perl script to convert each Ensembl transcript name to a gene name and to keep only the columns I wanted.
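The Perl script itself is not shown, but the same conversion can be sketched in awk. This assumes the ensemblToGeneName table is a two-column, tab-delimited file (transcript ID, then gene name) and that the BED export carries the transcript ID in column 4. The file names and the miniature inputs below are hypothetical placeholders, not real annotations.

```shell
# Hypothetical miniature inputs standing in for the downloaded tables.
printf 'TRANSCRIPT_1\tGeneA\nTRANSCRIPT_2\tGeneB\n' > ensemblToGeneName.txt
printf 'chr1\t100\t900\tTRANSCRIPT_1\t0\t+\textra\n'    > ensGene.bed
printf 'chr2\t500\t1500\tTRANSCRIPT_2\t0\t-\textra\n' >> ensGene.bed

# Replace the transcript ID with the gene name and keep only the BED6 columns.
awk 'BEGIN { FS = OFS = "\t" }
NR == FNR { gene[$1] = $2; next }     # first file: build transcript ID -> gene name map
{
    if ($4 in gene) $4 = gene[$4]     # leave the ID untouched when no name is known
    print $1, $2, $3, $4, $5, $6
}' ensemblToGeneName.txt ensGene.bed > genes.bed
```

Because a gene can have several transcripts, the same gene name may appear on more than one line of genes.bed.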

Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested.


"Now I just need to figure out how to get both the + and - strand nearest genes as Larry_Parnell suggested." You could run it twice: once with -s and once with -S.


-s finds the closest interval on the same strand. -S finds the closest on the opposite strand.
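Put together, the two runs might look like the sketch below. The function name closest_by_strand and the output file names are made up for illustration. Note that -s and -S compare the strand of each insert with the strand of each gene, so this only works if the insert file has its own strand column.

```shell
# Hypothetical helper: run bedtools closest once per strand relationship.
# Usage: closest_by_strand retro-inserts.bed genes.bed output_prefix
closest_by_strand() {
    inserts=$1
    genes=$2
    prefix=$3
    # Nearest gene on the same strand as each insert, with distance.
    bedtools closest -a "$inserts" -b "$genes" -d -s > "${prefix}.same_strand.bed"
    # Nearest gene on the opposite strand from each insert, with distance.
    bedtools closest -a "$inserts" -b "$genes" -d -S > "${prefix}.opposite_strand.bed"
}
```

Calling closest_by_strand retro-inserts.bed genes.bed nearest would write nearest.same_strand.bed and nearest.opposite_strand.bed. Depending on your bedtools version, both inputs may first need to be sorted by chromosome and start position (for example with sort -k1,1 -k2,2n).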


Hi Aaron, I can't find anything about capital -S in the manual. What does it do? Cheers, Joseph

9.9 years ago

The BEDOPS tool called closest-features will also do this:

$ closest-features --closest transposons.bed genes.bed > answer.bed