Question: Locating Indels In Gene
7.6 years ago, bioinfo wrote:

I have an Excel spreadsheet (created on Windows) that contains the start position, end position, and indel sequence in columns 1, 2, and 3 respectively (-GGC means a deletion of GGC; +AAC means an insertion of AAC). I have the reference genome in both FASTA and annotated EMBL format. How can I locate my indels in the genome relative to the reference? I want to find exactly where each indel falls (e.g. in which gene, or in an intergenic region).

e.g. my file

Start       End      Indel_Description
  12        20        +GCCGCAC
  45        46        -C

What genome? Do you have the positions of the genes in the genome (base-pair locations)?

written 7.6 years ago by Sean Davis

It is a bacterial genome, and I have the positions of all genes. Now I need to compare my indel positions against them to assign each indel to a gene.

written 7.6 years ago by bioinfo
7.6 years ago, Sean Davis (National Institutes of Health, Bethesda, MD) wrote:

Take a look at bedtools if you want a command-line solution, or at the GenomicRanges package from Bioconductor, to compute genomic overlaps. Any Galaxy server can also do these kinds of overlaps.
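To make the idea of a "genomic overlap" concrete, here is a minimal Python sketch of what such tools compute. It is not how bedtools or GenomicRanges are implemented; it is a plain linear scan, which should be adequate for a bacterial genome with a few thousand genes. The function name `locate_indel` and the gene coordinates are hypothetical, and all positions are assumed to be 1-based and inclusive.

```python
from typing import List, Tuple

Gene = Tuple[int, int, str]  # (start, end, name), assumed 1-based inclusive


def locate_indel(start: int, end: int, genes: List[Gene]) -> List[str]:
    """Return the names of genes overlapping [start, end], or ['intergenic']."""
    hits = [name for g_start, g_end, name in genes
            if start <= g_end and end >= g_start]
    return hits or ["intergenic"]


# Hypothetical gene coordinates, for illustration only.
genes = [(1, 30, "geneA"), (100, 250, "geneB")]

# Indel rows taken from the example table in the question.
indels = [(12, 20, "+GCCGCAC"), (45, 46, "-C")]

for start, end, desc in indels:
    print(start, end, desc, ",".join(locate_indel(start, end, genes)))
```

With these made-up genes, the first indel is reported inside geneA and the second as intergenic. An indel that spans a gene boundary is reported for every gene it touches.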


I prefer the command line. I will have a go with your options, though I haven't used Bioconductor before. Are there any other solutions using a regular Perl script?

written 7.6 years ago by bioinfo

You will probably need to write something in Perl (or some other language) to convert to BED format if you want to use bedtools. I do not know of a Perl script that does what you are asking for.
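If the indel table is not already in BED format, the conversion is small. The sketch below uses Python rather than Perl and rests on several assumptions: the table has been exported from Excel as whitespace- or tab-separated text, the coordinates are 1-based and inclusive (BED is 0-based and half-open, so 1 is subtracted from the start), and the sequence name `my_genome` is a placeholder that must be replaced by the sequence ID used in the gene annotation. The function name `table_to_bed` is hypothetical.

```python
def table_to_bed(lines, chrom="chr"):
    """Yield BED lines from whitespace-separated Start/End/Indel rows."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (first field is not a number).
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        start, end, desc = int(fields[0]), int(fields[1]), fields[2]
        # Assumption: input is 1-based inclusive; BED is 0-based, half-open.
        yield f"{chrom}\t{start - 1}\t{end}\t{desc}"


# Rows copied from the example table in the question.
rows = [
    "Start       End      Indel_Description",
    "  12        20        +GCCGCAC",
    "  45        46        -C",
]

bed_lines = list(table_to_bed(rows, chrom="my_genome"))
print("\n".join(bed_lines))
```

If the positions in the spreadsheet are already 0-based, the `start - 1` adjustment should be dropped.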

written 7.6 years ago by Sean Davis

The Excel file I mentioned above is actually the indel file in BED format. I have the annotated EMBL file of my genes. I plan to convert the EMBL file to GFF format and then GFF to BED format, as one of the BioStar experts suggested. Then I will use BEDTools to check the overlaps between these two BED files. My genes.bed file will have extra columns, whereas my indels.bed file has just three. Will that work, or should genes.bed have exactly three columns like indels.bed to check the overlaps?
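For the GFF to BED step, a minimal Python sketch follows. It assumes tab-separated GFF3 with the gene name in an `ID=` attribute, keeps only `gene` features, and writes six BED columns (sequence, start, end, name, score, strand). The function name `gff_to_bed` and the example records are hypothetical. GFF coordinates are 1-based and inclusive, so 1 is subtracted from the start to obtain BED's 0-based, half-open interval.

```python
def gff_to_bed(gff_lines, feature_type="gene"):
    """Yield BED6 lines for GFF records of the given feature type."""
    for line in gff_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, strand, attrs = cols[0], cols[6], cols[8]
        start, end = int(cols[3]), int(cols[4])
        # Use the ID attribute as the feature name if present (GFF3 style).
        name = "."
        for attr in attrs.split(";"):
            if attr.startswith("ID="):
                name = attr[3:]
        # GFF is 1-based inclusive; BED is 0-based, half-open.
        yield f"{seqid}\t{start - 1}\t{end}\t{name}\t0\t{strand}"


# Hypothetical GFF3 records, for illustration only.
gff = [
    "##gff-version 3",
    "my_genome\tEMBL\tgene\t1\t30\t.\t+\t.\tID=geneA",
    "my_genome\tEMBL\tCDS\t1\t30\t.\t+\t0\tID=cdsA",
    "my_genome\tEMBL\tgene\t100\t250\t.\t-\t.\tID=geneB;Name=b",
]

bed = list(gff_to_bed(gff))
print("\n".join(bed))
```

On the column question: as far as I know, bedtools matches intervals on the first three columns only, so the two files do not need the same number of columns. A call such as `bedtools intersect -a indels.bed -b genes.bed -wa -wb` should print each indel next to the gene record it overlaps, and adding `-v` instead should list the indels that overlap no gene (the intergenic ones). The sequence name in column 1 must be identical in both files.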

written 7.6 years ago by bioinfo

See the bedtools manual for details, but bedtools supports GFF as well as several flavors of BED format.

written 7.6 years ago by Sean Davis