Locating Indels In Gene
11.7 years ago
bioinfo ▴ 830

I have an Excel spreadsheet (from Windows) whose columns 1, 2 and 3 contain the start position, end position and indel sequence respectively (-GGC means a GGC deletion, +AAC means an AAC insertion). I also have the reference genome in both FASTA format and annotated EMBL format. How can I locate my indels in the genome relative to the reference? I want to find exactly where my indels are (e.g. in which gene, or in an intergenic region).

For example, my file:

Start       End      Indel_Description
  12        20        +GCCGCAC
  45        46        -C
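As a minimal sketch of reading a table like this, assuming it has been exported from Excel as tab- or space-delimited text with the header shown above, the Indel_Description column can be split into the indel type and its bases. The names `Indel`, `parse_description` and `read_indels` are hypothetical, chosen for illustration:

```python
from __future__ import annotations

from typing import NamedTuple


class Indel(NamedTuple):
    start: int
    end: int
    kind: str  # "insertion" or "deletion"
    seq: str   # the inserted or deleted bases


def parse_description(description: str) -> tuple[str, str]:
    """Split '+AAC' or '-GGC' into (kind, sequence)."""
    sign, seq = description[:1], description[1:].upper()
    if not seq:
        raise ValueError(f"empty indel sequence: {description!r}")
    if sign == "+":
        return "insertion", seq
    if sign == "-":
        return "deletion", seq
    raise ValueError(f"unrecognised indel description: {description!r}")


def read_indels(text: str) -> list[Indel]:
    """Parse a whitespace-separated table whose first line is a header."""
    rows = [line.split() for line in text.strip().splitlines()]
    indels = []
    for fields in rows[1:]:  # skip the Start/End/Indel_Description header
        if not fields:       # ignore blank lines
            continue
        start, end, description = int(fields[0]), int(fields[1]), fields[2]
        kind, seq = parse_description(description)
        indels.append(Indel(start, end, kind, seq))
    return indels


example = """\
Start       End      Indel_Description
  12        20        +GCCGCAC
  45        46        -C
"""
for indel in read_indels(example):
    print(indel)
```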
Tags: indel, annotation

What genome? Do you have the positions of the genes in the genome (base-pair locations)?


A bacterial genome. I have the positions of all the genes; now I need to compare my indel positions against them to assign each indel to a gene.
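A minimal Python sketch of that comparison, assuming both the indel and the gene coordinates are 1-based and inclusive (worth checking against the actual files). `locate_indels` is a hypothetical helper, and the gene coordinates below are made up for illustration. Each indel is reported against every gene it overlaps, or as intergenic if it overlaps none:

```python
def locate_indels(indels, genes):
    """Assign each indel to the genes it overlaps, or to 'intergenic'.

    indels: iterable of (start, end, description)
    genes:  iterable of (name, start, end)
    All coordinates are assumed to be 1-based and inclusive.
    """
    genes = list(genes)
    results = []
    for start, end, description in indels:
        # Two closed intervals overlap when each one starts before the other ends.
        hits = [name for name, g_start, g_end in genes
                if start <= g_end and end >= g_start]
        results.append((start, end, description, hits or ["intergenic"]))
    return results


# Hypothetical gene coordinates, for illustration only.
genes = [("geneA", 1, 30), ("geneB", 100, 400)]
indels = [(12, 20, "+GCCGCAC"), (45, 46, "-C")]
for row in locate_indels(indels, genes):
    print(*row[:3], ",".join(row[3]), sep="\t")
```

A linear scan over every gene is fine for the few thousand genes of a bacterial genome; for much larger inputs, sorting the genes and searching with `bisect`, or using bedtools, scales better.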


You could use bedtools if you want a command-line solution, or the GenomicRanges package from Bioconductor, to compute genomic overlaps. Any Galaxy server can also do this kind of overlap analysis.


I prefer the command line. I will give your options a try, though I haven't used Bioconductor before. Are there any other solutions using a plain Perl script?


You will probably need to write something in Perl (or another language) to convert your file to BED format if you want to use bedtools. I do not know of a Perl script that does what you are asking for.
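If the spreadsheet positions are 1-based and inclusive (an assumption; the question does not say), converting them to BED only means shifting the start down by one, because BED starts are 0-based and BED ends are exclusive. A sketch, where `to_bed_line` and the sequence name `chromosome` are placeholders. The sequence name must match the one used in the genes file, otherwise bedtools will report no overlaps. How insertions are encoded (which flanking bases Start and End refer to) is also worth checking before converting.

```python
def to_bed_line(chrom: str, start: int, end: int, name: str) -> str:
    """Convert a 1-based, inclusive [start, end] interval to a BED line.

    BED uses 0-based starts and exclusive ends, so only the start shifts.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


# "chromosome" is a placeholder sequence name; it must match the genes file.
rows = [(12, 20, "+GCCGCAC"), (45, 46, "-C")]
for start, end, description in rows:
    print(to_bed_line("chromosome", start, end, description))
```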


The Excel file I mentioned above is actually the indel file in BED format, and I have the annotated EMBL file for my genes. As the Biostars experts suggested, I plan to convert the EMBL file to GFF format and then GFF to BED format, and then use bedtools to check for overlaps between the two BED files. Will it work if my genes.bed file has extra columns compared with the three columns in indels.bed, or does genes.bed need exactly three columns, like indels.bed, for the overlap check?
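For the EMBL-to-BED step, one option is a small stand-alone parser of the feature table. The sketch below is an illustration under simplifying assumptions, not a full EMBL reader: it only handles simple locations such as `123..456` and `complement(123..456)`, takes the name from `/gene` (falling back to `/locus_tag`), and writes BED6. `embl_genes_to_bed` is a hypothetical name. If Biopython is installed, `SeqIO.read(path, "embl")` parses the full format, including `join(...)` locations.

```python
import re

# Simple locations only: "123..456" or "complement(123..456)", optionally with < or >.
# join(...) and multi-line locations are skipped (a simplifying assumption).
LOCATION = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")
QUALIFIER = re.compile(r'^/(gene|locus_tag)="([^"]*)"')


def embl_genes_to_bed(lines, chrom, feature_key="gene"):
    """Return BED6 lines for simple-location features in an EMBL feature table."""
    records = []
    current = None
    for line in lines:
        if not line.startswith("FT   "):
            continue
        key = line[5:21].strip()   # columns 6-21: feature key, space-padded
        value = line[21:].strip()  # column 22 onward: location or qualifier
        if key:  # a new feature begins on this line
            current = None
            match = LOCATION.match(value)
            if key == feature_key and match:
                current = {
                    "start": int(match.group(2)),
                    "end": int(match.group(3)),
                    "strand": "-" if match.group(1) else "+",
                    "name": ".",
                }
                records.append(current)
        elif current is not None:
            qual = QUALIFIER.match(value)
            # Prefer /gene over /locus_tag when both are present.
            if qual and (current["name"] == "." or qual.group(1) == "gene"):
                current["name"] = qual.group(2)
    # EMBL positions are 1-based and inclusive; BED starts are 0-based.
    return [
        f"{chrom}\t{r['start'] - 1}\t{r['end']}\t{r['name']}\t0\t{r['strand']}"
        for r in records
    ]
```

A file could be passed in directly as an iterable of lines, e.g. `embl_genes_to_bed(open("genome.embl"), "chromosome")` with a hypothetical file name. The extra BED columns should not be a problem: `bedtools intersect -a indels.bed -b genes.bed -wa -wb` compares coordinates and prints both records for each overlap, and `bedtools intersect -a indels.bed -b genes.bed -v` lists the indels that overlap no gene.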


See the bedtools manual for details, but bedtools supports GFF as well as several flavors of BED format.
