How To Get Gene names +10Kb and -10Kb
6.8 years ago
Xiaohan Pan ▴ 10

Question: I need to find gene names within +-10Kb of my siggenes.

I got some siggenes (lncRNAs, Homo sapiens) from cuffdiff, and I have a GFF file named 'gencode.v26.long_noncoding_RNAs.gff3'. I need to capture the genes within +/-10 kb of these siggenes, such as ENO1-AS1, FAM41C, GRM5-AS1, HCG20 and HSD52. Somebody told me I can do it with GenomicRanges. I read the manual carefully but still could not work out how to do it.

Can anyone give me some advice?

RNA-Seq gene • 2.5k views
6.8 years ago

Okay, sounds like you have a list of HGNC symbol names in a text file.

Let's say that file is called genes.txt or you work your symbols into a text file called genes.txt or whatevs.

$ more genes.txt
EAF1-AS1
A1BG-AS1
BISPR
CAHM
DUXAP8
ENO1-AS1
FAM41C
GRM5-AS1
HCG20
HSD52
KC6
MIR381HG
OVAAL
PICSAR
RORB-AS1
ZNRF3-AS1

Assuming you're working with hg38, grab refGene entries and grep 'em against the HGNC symbol names to get a sorted BED file:

$ wget -qO- "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz" | gunzip -c - | grep -i -f genes.txt - | awk 'BEGIN{OFS="\t"}{print $3,$5,$6,$13}' - | sort-bed - > genes.bed
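
One caveat: grep -i -f genes.txt matches each symbol as a case-insensitive substring anywhere on the line, so a short symbol like KC6 could also pull in longer names that contain it. A hedged alternative is to compare the symbol column exactly with awk. The sketch below runs on three made-up refGene-style rows (only the first 13 columns, with name2 in column 13) instead of the real download; toy_genes.txt, toy_refgene.txt and the XKC6B row are placeholders for illustration only.

```shell
# Toy inputs (assumptions, not real annotation).
printf 'KC6\nHCG20\n' > toy_genes.txt
# columns: bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2
printf '1\tNR_0001\tchr18\t+\t100\t900\t900\t900\t1\t100,\t900,\t0\tKC6\n'   > toy_refgene.txt
printf '1\tNR_0002\tchr2\t-\t300\t700\t700\t700\t1\t300,\t700,\t0\tXKC6B\n' >> toy_refgene.txt
printf '1\tNR_0003\tchr6\t+\t500\t800\t800\t800\t1\t500,\t800,\t0\tHCG20\n' >> toy_refgene.txt

# Load the wanted symbols from the first file, then keep only rows
# whose column 13 equals one of them exactly (case-sensitive).
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { want[$1] = 1; next }
     ($13 in want) { print $3, $5, $6, $13 }' toy_genes.txt toy_refgene.txt |
  sort -k1,1 -k2,2n > toy_genes.bed

cat toy_genes.bed
```

The XKC6B row is dropped here, whereas the substring grep would have kept it. With real data you would still pipe the result through sort-bed rather than plain sort before handing it to bedmap.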

Convert your LNCs to a sorted BED file:

$ gff2bed < gencode.v26.long_noncoding_RNAs.gff3 > gencode.v26.lncRNAs.bed

Then bedmap 'em all:

$ bedmap --echo --echo-map-id-uniq --range 10000 genes.bed gencode.v26.lncRNAs.bed > answer.bed

Done. BEDOPS. Done.


These genes are lncRNAs, but refGene.txt.gz seems to contain mRNA gene information. I followed your steps and got an empty answer.bed.


I downloaded gencode.v26.annotation.gff3 and the program works! But the results in answer.bed are just a lot of ENSTs, like this: (NR4A1|CDS:ENST00000243050.5;CDS:ENST00000293662.8;CDS:ENST00000360284.7;CDS:ENST00000394824.2;CDS:ENST00000394825.5;CDS:ENST00000545748.5).


You might look at a name translation tool like DAVID to get from Ensembl transcript names to something else.
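
Another option that stays on the command line, assuming the GENCODE GFF3 carries transcript_id and gene_name attributes on its transcript rows: build a transcript-to-gene lookup with awk and rewrite the mapped IDs. This is only a sketch on toy data; the shortened ENST IDs, TOYGENE2, and the file names toy_annot.gff3, toy_bedmap_out.txt and toy_named.txt are made up for illustration.

```shell
# Toy annotation rows and one toy bedmap output line (assumptions).
printf 'chr12\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tID=ENST0001.5;Parent=ENSG0001.3;transcript_id=ENST0001.5;gene_name=NR4A1\n' > toy_annot.gff3
printf 'chr12\tHAVANA\ttranscript\t2000\t2900\t.\t+\t.\tID=ENST0002.8;Parent=ENSG0002.1;transcript_id=ENST0002.8;gene_name=TOYGENE2\n' >> toy_annot.gff3
printf 'chr12\t100\t900\tNR4A1|CDS:ENST0001.5;CDS:ENST0002.8;exon:ENST0001.5:1\n' > toy_bedmap_out.txt

awk -F'\t' '
  NR == FNR {                                  # first file: the GFF3
    if ($3 != "transcript") next
    tx = ""; gene = ""
    n = split($9, a, ";")
    for (i = 1; i <= n; i++) {
      if (a[i] ~ /^transcript_id=/) tx   = substr(a[i], 15)
      if (a[i] ~ /^gene_name=/)     gene = substr(a[i], 11)
    }
    if (tx != "") tx2gene[tx] = gene
    next
  }
  {                                            # second file: bedmap output
    bar = index($0, "|")
    out = ""; split("", seen)                  # reset per line
    m = split(substr($0, bar + 1), ids, ";")
    for (j = 1; j <= m; j++) {
      k = split(ids[j], part, ":")             # e.g. CDS:ENST...
      for (p = 1; p <= k; p++)
        if ((part[p] in tx2gene) && !(tx2gene[part[p]] in seen)) {
          seen[tx2gene[part[p]]] = 1
          out = out (out == "" ? "" : ";") tx2gene[part[p]]
        }
    }
    print substr($0, 1, bar) out               # keep the reference part as is
  }' toy_annot.gff3 toy_bedmap_out.txt > toy_named.txt

cat toy_named.txt
```

Each gene name is listed once per line, in order of first appearance, however many CDS or exon IDs pointed at it.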


Got it, thanks so much!

6.8 years ago

Using BEDOPS:

Write your siggenes file from cuffdiff to BED and sort it:

$ sort-bed siggenes.unsorted.bed > siggenes.bed

Write GFF to sorted BED:

$ gff2bed < gencode.v26.long_noncoding_RNAs.gff3 > gencode.v26.lncRNAs.bed

Map Gencode v26 annotation IDs (gene names) to siggenes, with a range of 10kb:

$ bedmap --echo --echo-map-id-uniq --range 10000 siggenes.bed gencode.v26.lncRNAs.bed > answer.bed
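
To make the --range 10000 step less of a black box, here is a rough awk sketch of the idea as I understand it: pad each reference interval by 10 kb on both sides, then report the IDs of map intervals on the same chromosome that overlap the padded window. It is a naive all-against-all loop on made-up toy intervals, not how bedmap is implemented, and unlike --echo-map-id-uniq it does not deduplicate or sort the IDs.

```shell
# Toy reference and map intervals (coordinates are illustrative only).
printf 'chr1\t50000\t51000\tSIG1\n' > toy_sig.bed
printf 'chr1\t30000\t39000\tFAR_UP\n'     > toy_map.bed   # ends 11 kb upstream: outside
printf 'chr1\t42000\t43000\tNEAR_UP\n'   >> toy_map.bed   # inside the padded window
printf 'chr1\t60500\t62000\tNEAR_DN\n'   >> toy_map.bed   # starts 9.5 kb downstream: inside
printf 'chr2\t50000\t51000\tOTHER_CHR\n' >> toy_map.bed   # different chromosome

awk -v pad=10000 'BEGIN { FS = OFS = "\t" }
     # first file: remember every map interval
     NR == FNR { c[NR] = $1; s[NR] = $2; e[NR] = $3; id[NR] = $4; n = NR; next }
     {
       lo = $2 - pad; if (lo < 0) lo = 0       # clamp at chromosome start
       hi = $3 + pad
       hits = ""
       for (i = 1; i <= n; i++)                # half-open interval overlap test
         if (c[i] == $1 && s[i] < hi && e[i] > lo)
           hits = hits (hits == "" ? "" : ";") id[i]
       print $0 "|" hits
     }' toy_map.bed toy_sig.bed > toy_answer.txt

cat toy_answer.txt
```

Since the siggenes are themselves in the Gencode lncRNA file, expect each siggene's own annotation ID to show up in its mapped list in the real run.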

I'll have a try, thanks!


I'm sorry, but how can I convert my siggenes to BED format? I only have the siggene IDs, with no other information about them.


I couldn't find much documentation about siggenes formats. You need to translate your IDs into chromosome-start-stop positions in order to map them to your GFF, however.
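
If the IDs turn out to be gene symbols, one way to do that translation is to look them up in the GFF itself. A minimal sketch, assuming the symbols appear as gene_name attributes on the gene rows of the GENCODE GFF3; the toy rows and coordinates below are made-up stand-ins for gencode.v26.long_noncoding_RNAs.gff3, and ids.txt and toy.gff3 are placeholder file names.

```shell
# Toy stand-ins; coordinates are illustrative, not real annotation.
printf 'FAM41C\nHCG20\n' > ids.txt
printf 'chr1\tHAVANA\tgene\t1001\t2000\t.\t-\t.\tID=ENSG0001.1;gene_id=ENSG0001.1;gene_type=lincRNA;gene_name=FAM41C;level=2\n' > toy.gff3
printf 'chr1\tHAVANA\ttranscript\t1001\t2000\t.\t-\t.\tID=ENST0001.1;Parent=ENSG0001.1;gene_name=FAM41C\n' >> toy.gff3
printf 'chr6\tHAVANA\tgene\t5001\t7000\t.\t+\t.\tID=ENSG0002.1;gene_id=ENSG0002.1;gene_type=lincRNA;gene_name=HCG20;level=2\n' >> toy.gff3
printf 'chr7\tHAVANA\tgene\t100\t900\t.\t+\t.\tID=ENSG0003.1;gene_id=ENSG0003.1;gene_name=OTHER1;level=2\n' >> toy.gff3

awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { want[$1] = 1; next }          # first file: wanted symbols
     /^#/ || $3 != "gene" { next }             # keep gene rows only
     {
       name = ""
       n = split($9, attr, ";")
       for (i = 1; i <= n; i++)
         if (attr[i] ~ /^gene_name=/) name = substr(attr[i], 11)
       # GFF3 is 1-based inclusive; BED is 0-based half-open
       if (name in want) print $1, $4 - 1, $5, name, ".", $7
     }' ids.txt toy.gff3 | sort -k1,1 -k2,2n > siggenes.unsorted.bed

cat siggenes.unsorted.bed
```

From there, sort-bed on the result gives the siggenes.bed used in the bedmap step. Symbols that are missing from the GFF simply produce no line, so it is worth comparing the line count against the ID list.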


Thanks, I'll try to find a way out.


If you want to post a snippet of what you have, maybe we can figure out the format and help you figure a way out.


Thank you, but I just want to know the genes upstream and downstream of my siggenes. What I have is a plain list of gene names with no particular format, like 'EAF1-AS1, A1BG-AS1, BISPR, CAHM, DUXAP8, ENO1-AS1, FAM41C, GRM5-AS1, HCG20, HSD52, KC6, MIR381HG, OVAAL, PICSAR, RORB-AS1, ZNRF3-AS1', and a reference file, gencode.v26.long_noncoding_RNAs.gff3.

6.8 years ago
EagleEye 7.5k

bedtools closest


Thank you for your help~
