I need to retrieve a GFF file for a specific accession number. In a FASTA file I have the following records:
>gi|345090966|ref|NG_029839.1|:195425-195447 Homo sapiens c-Maf inducing protein (CMIP), RefSeqGene on chromosome 16
TGCAAAAGTAATTGCAGTTTTTG
>gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens FOSMID clone ABC14-947514C10 from chromosome unknown, complete sequence
CAAAAACTGCAATTACTTTTGCA
>gi|340523118|ref|NG_029471.1|:35678-35700 Homo sapiens hemopoietic cell kinase (HCK), RefSeqGene on chromosome 20
CAAAAACTGCAATTACTTTTGCA
I am able to retrieve the GFF for records with gb as the sequence identifier, such as:
>gi|343168829|gb|AC245437.1|:21551-21573 Homo sapiens FOSMID clone ABC14-947514C10 from chromosome unknown, complete sequence
CAAAAACTGCAATTACTTTTGCA
with a BioPerl script, in this way:
./bp_genbank2gff.pl -accession AC245437.1 -stdout > AC245437.1.gff
I am unable to get a GFF for records with a different sequence identifier, such as ref. What am I doing wrong?
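To make the difference between the two kinds of records explicit, here is a minimal Python sketch that splits headers like the ones above into their fields (database tag, accession, coordinate range). The function name parse_header and the regular expression are my own illustration and only assume the gi|...|db|accession|:start-end layout shown in this post; other header layouts would not match.

```python
import re

# Assumed layout (taken from the headers in this post):
#   >gi|<gi number>|<db tag>|<accession.version>|:<start>-<end> <description>
HEADER_RE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<accession>[^|]+)\|"
    r"(?::(?P<start>\d+)-(?P<end>\d+))?"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Coordinates are optional; convert them to integers when present.
    for key in ("start", "end"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    header = (">gi|345090966|ref|NG_029839.1|:195425-195447 "
              "Homo sapiens c-Maf inducing protein (CMIP), RefSeqGene on chromosome 16")
    print(parse_header(header))
```

With this, the records can be grouped by the db field (gb versus ref) to see which accessions the script handles and which it does not.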
Thanks for your effort. I am a little confused: is it possible to extract more features than the ones displayed in this case? I need to extract all of them. I also need to retrieve information from RefSeq; do you know how to do that? Thanks again.
I did some research and found http://gmod.org/wiki/Load_RefSeq_Into_Chado, which explains how to do the conversion. However, as you pointed out, this script cannot download a record by RefSeq accession number.
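One possible workaround, which I have not verified end to end, would be to download the GenBank flat file for the RefSeq accession separately and then convert the local file. The sketch below only builds the request URL for NCBI's E-utilities efetch endpoint; the helper name efetch_genbank_url is hypothetical, and I am assuming that the nuccore database with rettype=gb returns the full record for accessions such as NG_029839.1.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_genbank_url(accession):
    """Build an efetch URL requesting a GenBank flat file for one accession.

    Assumption: the record lives in the nuccore database.
    """
    params = {
        "db": "nuccore",     # nucleotide database (assumed to cover RefSeq NG_ records)
        "id": accession,     # accession.version, e.g. NG_029839.1
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(efetch_genbank_url("NG_029839.1"))
```

The printed URL could be fetched with any HTTP client and the result saved to a file, which a local GenBank-to-GFF converter might then accept as input.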