Question: Incompatibility issue with fasta and GFF (fasta2GFF)
jaqx008 wrote (14 months ago):

Good day to you all. I am about to run a script that requires a FASTA and a GFF file in the command, but the FASTA and GFF files I downloaded from NCBI contain different loci names. For example, the FASTA has the following type of loci naming:

Bf_V2_22 

The GFF has the following type:

NW_003101570.1 

I need both of them to have the same names. What should I do? Would I get the same coordinates in the GFF if I convert FASTA to GFF (fasta2gff), and how do I do that conversion? Or is there a better approach? Thanks.

mapping fasta2gff bedtools • 454 views

Hello jaqx008,

Where exactly did you download the FASTA and GFF files?

You cannot convert a FASTA file to GFF unless the sequence headers include the necessary information.

fin swimmer

written 14 months ago by finswimmer

Hi finswimmer. I downloaded them from NCBI, as mentioned above.

written 14 months ago by jaqx008

NCBI has tons of data, so please be more specific.

written 14 months ago by finswimmer
genomax wrote (14 months ago):

You probably did not get the FASTA and the GFF from the same location. The headers of the RefSeq genomic FASTA use the NW_ identifiers:

$ zgrep ">" GCF_000003815.1_Version_2_genomic.fna.gz
>NW_003101570.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_1, whole genome shotgun sequence
>NW_003101569.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_2, whole genome shotgun sequence
>NW_003101568.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_3, whole genome shotgun sequence
>NW_003101567.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_4, whole genome shotgun sequence
>NW_003101566.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_5, whole genome shotgun sequence
>NW_003101565.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_6, whole genome shotgun sequence
>NW_003101564.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_7, whole genome shotgun sequence
>NW_003101563.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_8, whole genome shotgun sequence
>NW_003101562.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_9, whole genome shotgun sequence

Where did you get this from, please?

written 14 months ago by jaqx008

I provided a link to the file above.

modified 14 months ago • written 14 months ago by genomax
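Before running the script, it can help to confirm whether a given FASTA and GFF actually share sequence names. The following is a minimal Python sketch of such a check, not something posted in this thread: it compares the first word of each FASTA header with column 1 (seqid) of the GFF. The function names and the file names in the usage comment are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(lines):
    """Return the set of seqid values (column 1) found in GFF lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(fasta_lines, gff_lines):
    """Report which IDs are shared and which occur in only one file."""
    fa, gff = fasta_ids(fasta_lines), gff_seqids(gff_lines)
    return {"shared": fa & gff, "fasta_only": fa - gff, "gff_only": gff - fa}


# Usage (hypothetical file names):
# with open_text("genome.fna.gz") as fa, open_text("annotation.gff") as gff:
#     report = compare_ids(fa, gff)
#     print(len(report["shared"]), "IDs shared;",
#           len(report["fasta_only"]), "only in FASTA;",
#           len(report["gff_only"]), "only in GFF")
```

If "shared" comes back empty, as it would for a Bf_V2_-named FASTA against an NW_-named GFF, the two files use different naming schemes and one of them has to be replaced or renamed.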

I just found that the same script also has to call an x.bed file. My BED file has the loci in the other format, which is

Bf_V2_22

This is probably because TE.bed was made from the FASTA file with the Bf_V2-type naming.

modified 14 months ago • written 14 months ago by jaqx008

Then you will need to either remake the BED file or translate those IDs so they match NCBI's identifiers.

modified 14 months ago • written 14 months ago by genomax
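Translating the IDs could look roughly like the Python sketch below. It assumes a two-column, tab-separated mapping table (old ID, then NCBI ID) already exists. Nothing in this thread supplies such a table, so the file names and function names here are hypothetical. Only column 1 of the BED file is rewritten, which is valid only if both naming schemes describe the same assembly with identical coordinates.

```python
def load_mapping(lines):
    """Parse 'old_id<TAB>new_id' lines into a dict, skipping blanks and comments."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        old, new = line.split("\t")[:2]
        mapping[old] = new
    return mapping


def rename_bed(bed_lines, mapping):
    """Yield BED lines with column 1 renamed; fail loudly on an unmapped ID."""
    for number, line in enumerate(bed_lines, 1):
        # Pass header-style lines and blank lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom not in mapping:
            raise KeyError(f"line {number}: no mapping for sequence ID {chrom!r}")
        yield mapping[chrom] + sep + rest


# Usage (hypothetical file names):
# with open("id_map.tsv") as m, open("TE.bed") as src, open("TE.ncbi.bed", "w") as dst:
#     dst.writelines(rename_bed(src, load_mapping(m)))
```

Raising an error on an unmapped ID, rather than passing the line through, is deliberate: a silently untranslated record would reintroduce exactly the name mismatch the renaming is meant to remove.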

Do I have to do that manually? And isn't there a possibility of mismatched IDs?

written 14 months ago by jaqx008

You will need to go back and remake or reprocess the data to get IDs that match.

modified 14 months ago • written 14 months ago by genomax
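On the worry about mismatched IDs: one way to avoid pairing names by hand is to pair them by sequence content. The Python sketch below is an illustration only. It assumes the two FASTA files contain identical sequences under different names, which this thread does not confirm. Each sequence is hashed, an old name is mapped to a new one only when the digests agree, and anything without a unique match is reported rather than guessed. All function names are hypothetical.

```python
import hashlib


def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def digest(seq):
    """Case-insensitive MD5 digest, so soft-masked sequence still matches."""
    return hashlib.md5(seq.upper().encode()).hexdigest()


def match_by_sequence(old_lines, new_lines):
    """Map old IDs to new IDs whose sequences are identical.

    Returns (mapping, unmatched). Old IDs with no identical sequence, or whose
    sequence occurs more than once in the new file, are listed as unmatched.
    """
    by_digest, ambiguous = {}, set()
    for name, seq in read_fasta(new_lines):
        d = digest(seq)
        if d in by_digest:
            ambiguous.add(d)  # duplicate sequence: cannot be assigned safely
        by_digest[d] = name
    mapping, unmatched = {}, []
    for name, seq in read_fasta(old_lines):
        d = digest(seq)
        if d in by_digest and d not in ambiguous:
            mapping[name] = by_digest[d]
        else:
            unmatched.append(name)
    return mapping, unmatched
```

If the unmatched list is not empty, the two files are probably not the same assembly version, and remaking the data against the NCBI FASTA, as suggested above, is the safer route.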