Question: Can I extract an annotation file from a GenBank file, or modify my current GFF file?
Ripacoco wrote:

I'm fairly new to NGS and bioinformatics. I performed ChIP-seq on a strain in which I inserted some nucleotides into the E. coli background. I had no problem adding the few nucleotides and creating a FASTA file for mapping (I downloaded the GenBank file of the background genome from NCBI, added the nucleotides in SnapGene, and exported the result as a FASTA file). Now I want to visualize the data in IGV and would love to add the annotation as well. I have a GenBank file that already contains all the annotation information; is there a way to create an annotation file from it? I found that I can see the annotation if I load the .gb file directly into IGV, but in that case my MACS2 output doesn't show up at all (the track is empty). If I load only the FASTA file, I can see the peaks, but then I can't see the annotations. Are there any recommended ways to create a modified annotation file (I only have one long insert in the genome)?

igv chip-seq annotation • 216 views
modified 10 days ago by Biostar ♦♦ 20 • written 29 days ago by Ripacoco 10

Hi, you can use awk to extract specific columns from a GFF file (e.g. accession number, start, stop, strand, ...) and add rows based on your annotations. You need to check which GFF columns IGV requires. If you provide some examples, I may be able to help more.

written 29 days ago by Fatima 820
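
For readers who prefer a script, here is a rough Python equivalent of the awk column extraction suggested above. The sample GFF lines, the `chr_demo` sequence name, and the `extract_columns` helper are all made up for illustration; which columns IGV actually needs should still be checked against its documentation.

```python
import io

# Hypothetical GFF3 excerpt; names and coordinates are illustrative only.
GFF_TEXT = (
    "##gff-version 3\n"
    "chr_demo\tdemo\tgene\t100\t200\t.\t+\t.\tID=geneA\n"
    "chr_demo\tdemo\tgene\t300\t450\t.\t-\t.\tID=geneB\n"
)


def extract_columns(handle, columns=(0, 3, 4, 6)):
    """Yield the selected 0-based columns of every 9-column feature line.

    The default picks seqid, start, end and strand, roughly what
    awk -F'\\t' '{print $1, $4, $5, $7}' would print.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip pragmas, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF feature line
        yield [fields[i] for i in columns]


rows = list(extract_columns(io.StringIO(GFF_TEXT)))
for row in rows:
    print("\t".join(row))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` wrapper.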

Thanks, Fatima. The thing is that since I inserted a small piece of sequence into E. coli MG1655, I need to change all the coordinates accordingly, so simply extracting the rows won't work. Instead of doing that manually, I was wondering whether, since I already have the GenBank file with the full annotation, I can somehow output a GFF file from it. I tried, but was only able to output a tab-delimited file that IGV has problems reading.

written 29 days ago by Ripacoco 10
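
Since the edited GenBank file already carries the shifted coordinates, one option is to write the GFF directly from its feature table. Below is a minimal sketch using only the Python standard library. It assumes simple locations (`start..end` and `complement(start..end)`), skips the `source` feature and anything with a compound location such as `join(...)`, and uses a made-up record; `genbank_to_gff3` and `demo_seq` are hypothetical names. For real files an established parser (for example Biopython's `Bio.SeqIO.parse(handle, "genbank")`) is likely the safer route.

```python
import re

# Simple feature lines, e.g. "     CDS             complement(300..450)".
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")
# Qualifier lines, e.g. '                     /gene="abcA"'.
QUALIFIER_RE = re.compile(r'^ {21}/(\w+)="?([^"]*)"?\s*$')


def genbank_to_gff3(lines, seqid):
    """Convert simple GenBank features to GFF3 lines (a sketch, not a full parser)."""
    features, current, in_table = [], None, False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if line.startswith(("ORIGIN", "//")):
            in_table = False
        if not in_table:
            continue
        if re.match(r"^ {5}\S", line):  # a new feature key starts here
            m = FEATURE_RE.match(line)
            if m and m.group(1) != "source":
                current = {"type": m.group(1), "start": m.group(3),
                           "end": m.group(4), "quals": {},
                           "strand": "-" if m.group(2) else "+"}
                features.append(current)
            else:
                current = None  # skipped feature: ignore its qualifiers too
            continue
        q = QUALIFIER_RE.match(line)
        if q and current is not None:
            current["quals"].setdefault(q.group(1), q.group(2))

    out = ["##gff-version 3"]
    for i, f in enumerate(features, start=1):
        name = f["quals"].get("gene") or f["quals"].get("locus_tag") or f["type"]
        phase = "0" if f["type"] == "CDS" else "."  # assumes complete CDS
        attrs = f"ID={f['type']}_{i};Name={name}"
        out.append("\t".join([seqid, "GenBank", f["type"], f["start"],
                              f["end"], ".", f["strand"], phase, attrs]))
    return out


def _feat(key, loc):
    return " " * 5 + key.ljust(16) + loc


def _qual(text):
    return " " * 21 + text


# Hypothetical record, built programmatically to keep the column layout exact.
SAMPLE = [
    "LOCUS       demo",
    "FEATURES             Location/Qualifiers",
    _feat("source", "1..600"), _qual('/organism="hypothetical"'),
    _feat("gene", "100..200"), _qual('/gene="abcA"'),
    _feat("CDS", "complement(300..450)"), _qual('/locus_tag="X_0002"'),
    _feat("CDS", "join(460..470,480..500)"), _qual('/gene="skipped"'),
    "ORIGIN",
    "//",
]

gff_lines = genbank_to_gff3(SAMPLE, seqid="demo_seq")
print("\n".join(gff_lines))
```

The `seqid` passed in should match the sequence name in the FASTA header used for mapping. A name mismatch between the annotation and the peak file is one possible, though unconfirmed, reason for tracks not lining up in IGV.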

You can try FragGeneScan to predict genes in the modified sequence, and then combine the GFF file from FragGeneScan with the GenBank annotation that you have.

modified 28 days ago • written 28 days ago by Fatima 820

Hi, try annotating your genome (the edited FASTA file) with Prokka. I know it would be easier to just correct the coordinates in the GenBank file with a text-editing tool, but I really don't trust that the change would work well. Prokka is very easy to use and can annotate a genome in under 20 minutes on a normal desktop. If you have curated data in your previous annotation, you can feed your GenBank file to Prokka to improve the final result.

modified 29 days ago • written 29 days ago by hugo.avila 160
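
If you would rather correct the coordinates than re-annotate, a script is less error-prone than a text editor. Here is a minimal sketch for a GFF file, assuming a single insertion placed immediately after a known 1-based position in the original genome. The `shift_gff_line` helper, the position 150, the insert length 30, and the sample features are hypothetical, not taken from the thread.

```python
def shift_gff_line(line, insert_pos, insert_len):
    """Shift one GFF line for an insertion placed right after `insert_pos`.

    Coordinates greater than `insert_pos` move downstream by `insert_len`;
    comments, pragmas and malformed lines are returned unchanged.
    """
    if not line.strip() or line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    start, end = int(fields[3]), int(fields[4])
    if start > insert_pos:
        start += insert_len
    if end > insert_pos:
        end += insert_len
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields) + "\n"


# Hypothetical features around an insertion after position 150.
ORIGINAL = [
    "##gff-version 3\n",
    "chr_demo\tdemo\tgene\t100\t140\t.\t+\t.\tID=upstream\n",
    "chr_demo\tdemo\tgene\t120\t200\t.\t+\t.\tID=spanning\n",
    "chr_demo\tdemo\tgene\t160\t300\t.\t-\t.\tID=downstream\n",
]

shifted = [shift_gff_line(l, insert_pos=150, insert_len=30) for l in ORIGINAL]
print("".join(shifted), end="")
```

Note that a feature spanning the insertion site is simply lengthened here; whether that feature is still biologically meaningful has to be judged case by case. Pragmas such as `##sequence-region`, which record the sequence length, are passed through untouched and would need updating separately.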

Thanks for the suggestion, I'll try that!

written 29 days ago by Ripacoco 10