Question: Incompatibility issue with FASTA and GFF sequence names (fasta2GFF)
Posted by jaqx008, 3 months ago:

Good day to you all. I am about to run a script that takes a FASTA file and a GFF file on the command line, but the FASTA and GFF files I downloaded from NCBI use different sequence (locus) names. For example, the FASTA file uses names like

Bf_V2_22 

The GFF file uses names like

NW_003101570.1 

I need both files to use the same names. What should I do? If I convert the FASTA to GFF (fasta2gff), would I get the same coordinates as in the downloaded GFF, and how would I do that conversion? Or is there a better approach? Thanks

Tags: mapping, fasta2gff, bedtools
written 3 months ago by jaqx008 (modified 3 months ago)

Hello jaqx008,

where exactly did you download the FASTA and GFF files?

You cannot convert a FASTA file to GFF unless the sequence headers include the necessary information. A GFF describes features and their coordinates, and a plain FASTA file does not contain them.

fin swimmer

written 3 months ago by finswimmer
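To illustrate finswimmer's point, here is a minimal sketch of the most a plain FASTA file can give you in GFF3 form: each sequence's name and length, written as `##sequence-region` pragmas. No gene or TE coordinates can come out of it. The helper name `fasta_to_region_gff` is hypothetical, and the sketch assumes the sequence ID is the first word of each `>` header.

```shell
# Hypothetical helper: write a GFF3 that only declares each sequence's
# name and length (##sequence-region). A plain FASTA carries no feature
# coordinates, so nothing more can be derived from it.
fasta_to_region_gff() {
  awk '
    BEGIN { print "##gff-version 3" }
    /^>/ {
      # Emit the previous record before starting a new one
      if (id != "") print "##sequence-region " id " 1 " n
      # Sequence ID = first word of the header, without ">"
      split(substr($0, 2), w, "[ \t]")
      id = w[1]
      n = 0
      next
    }
    { n += length($0) }   # sum residues over all sequence lines
    END { if (id != "") print "##sequence-region " id " 1 " n }
  ' "$1"
}
```

Usage (file names hypothetical): `fasta_to_region_gff genome.fa > regions.gff3`. The output only names the sequences; it does not reproduce the annotation in NCBI's GFF.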

Hi finswimmer. I downloaded them from NCBI, as mentioned above.

written 3 months ago by jaqx008

NCBI has tons of data, so please be more specific.

written 3 months ago by finswimmer
Answer by genomax (United States), 3 months ago:

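You probably did not get the sequence and the annotation from the same place. In the NCBI RefSeq assembly GCF_000003815.1 (Branchiostoma floridae), the FASTA headers use the same NW_ accessions as the GFF: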

$ zgrep ">" GCF_000003815.1_Version_2_genomic.fna.gz
>NW_003101570.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_1, whole genome shotgun sequence
>NW_003101569.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_2, whole genome shotgun sequence
>NW_003101568.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_3, whole genome shotgun sequence
>NW_003101567.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_4, whole genome shotgun sequence
>NW_003101566.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_5, whole genome shotgun sequence
>NW_003101565.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_6, whole genome shotgun sequence
>NW_003101564.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_7, whole genome shotgun sequence
>NW_003101563.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_8, whole genome shotgun sequence
>NW_003101562.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_9, whole genome shotgun sequence
written 3 months ago by genomax (modified 3 months ago)
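Before running the script, a quick check can list the GFF sequence names that have no matching FASTA header. This is a sketch under assumptions: the FASTA ID is the first word of each header, the GFF is tab-separated with nine columns, and `gff_ids_missing_from_fasta` is a hypothetical helper name. It reads plain files, so a `.fna.gz` download would need to be decompressed first (e.g. with `gzip -dc`).

```shell
# Hypothetical helper: print the seqids used in a GFF (column 1) that do
# not appear as IDs in a FASTA file (first word of each ">" header).
gff_ids_missing_from_fasta() {
  awk -F'\t' '
    NR == FNR {                      # first file: the FASTA
      if (/^>/) {
        split(substr($0, 2), w, "[ \t]")
        ids[w[1]] = 1
      }
      next
    }
    /^#/ || NF < 9 { next }          # skip GFF comments/pragmas and malformed lines
    !($1 in ids) && !seen[$1]++ { print $1 }   # report each missing seqid once
  ' "$1" "$2"
}
```

Usage (file names hypothetical): `gff_ids_missing_from_fasta genomic.fna genomic.gff`. Empty output means every GFF seqid has a matching FASTA header.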

Where did you get this from, please?

written 3 months ago by jaqx008

I provided a link to the file above.

written 3 months ago by genomax (modified 3 months ago)

I just found that the same script also needs a BED file (x.bed), and my BED file uses the other naming format:

Bf_V2_22 

This is probably because TE.bed was made from the FASTA file that uses the Bf_V2_ naming.

written 3 months ago by jaqx008 (modified 3 months ago)

Then you will either need to remake the BED file or translate those IDs so they match NCBI's identifiers.

written 3 months ago by genomax (modified 3 months ago)
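If the IDs are translated, the renaming itself does not have to be manual once a table of old and new names exists. A minimal sketch, assuming a hypothetical tab-separated file `alias.tsv` with one `old_name<TAB>new_name` pair per line; building a correct table is the hard part and is not shown here. The helper name `rename_bed_seqids` is also hypothetical. Names missing from the table are reported and dropped, and the function exits non-zero, so a mismatch does not pass silently.

```shell
# Hypothetical helper: rewrite column 1 of a BED file using a two-column,
# tab-separated alias table (old_name<TAB>new_name). Names missing from the
# table are reported on stderr and dropped; the exit status is then 1.
rename_bed_seqids() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { alias[$1] = $2; next }          # first file: alias table
    /^(#|track|browser)/ { print; next }        # keep BED header lines as-is
    !($1 in alias) {
      print "unmapped: " $1 > "/dev/stderr"
      bad = 1
      next
    }
    { $1 = alias[$1]; print }                   # swap the name, keep the coordinates
    END { exit (bad ? 1 : 0) }
  ' "$1" "$2"
}
```

Usage (file names hypothetical): `rename_bed_seqids alias.tsv TE.bed > TE.renamed.bed`. Renaming only changes labels; it is valid only if the old and new names refer to the same underlying sequence.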

Do I have to do that manually? And isn't there a possibility of mismatched IDs?

written 3 months ago by jaqx008

You will need to go back and remake/reprocess the data to get IDs that match.

written 3 months ago by genomax (modified 3 months ago)
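On the worry about mismatched IDs: whichever route is taken, one cheap sanity check is that every BED interval refers to a sequence that exists in the target FASTA and ends within that sequence's length. A sketch under the same assumptions as above (FASTA ID = first header word; `check_bed_bounds` is a hypothetical helper). Passing it does not prove the sequences are identical, only that the coordinates are plausible; comparing extracted sequence (for example with `bedtools getfasta`) would be a stronger test.

```shell
# Hypothetical helper: flag BED lines whose sequence is absent from the FASTA
# or whose end coordinate exceeds that sequence's length. BED is 0-based,
# half-open, so end == length is still valid. Exit status 1 if anything is flagged.
check_bed_bounds() {
  awk -F'\t' '
    NR == FNR {                                  # first file: the FASTA
      if (/^>/) { split(substr($0, 2), w, "[ \t]"); id = w[1]; len[id] = 0 }
      else { len[id] += length($0) }             # accumulate sequence length
      next
    }
    /^(#|track|browser)/ { next }                # skip BED header lines
    !($1 in len) { print "unknown sequence: " $1; bad = 1; next }
    $3 + 0 > len[$1] {
      print "out of range: " $1 ":" $2 "-" $3 " (length " len[$1] ")"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1" "$2"
}
```

Usage (file names hypothetical): `check_bed_bounds genomic.fna TE.renamed.bed`. No output and exit status 0 means every interval fits.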