Incompatibility issue with fasta and GFF (fasta2GFF)
5.2 years ago
jaqx008 ▴ 110

Good day to you all. I am about to run a script that takes a FASTA file and a GFF file as arguments, but the FASTA and GFF files I downloaded from NCBI use different locus names. For example, the FASTA uses names like this:

Bf_V2_22 

The GFF uses names like this:

NW_003101570.1 

I need both files to use the same names. What should I do? Would I get the same coordinates as in the GFF if I converted the FASTA to GFF (fasta2gff), and how would I do that conversion? Or is there a better approach? Thanks.

fasta2gff mapping bedtools • 1.4k views

Hello jaqx008,

Where exactly did you download the FASTA and GFF files?

You cannot convert a FASTA file to GFF unless the sequence headers include the necessary information.

fin swimmer


Hi finswimmer. I downloaded them from NCBI, as mentioned above.


NCBI has tons of data, so please be more specific.

5.2 years ago
GenoMax 141k

You probably did not get the sequence from the same location.

$ zgrep ">" GCF_000003815.1_Version_2_genomic.fna.gz
>NW_003101570.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_1, whole genome shotgun sequence
>NW_003101569.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_2, whole genome shotgun sequence
>NW_003101568.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_3, whole genome shotgun sequence
>NW_003101567.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_4, whole genome shotgun sequence
>NW_003101566.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_5, whole genome shotgun sequence
>NW_003101565.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_6, whole genome shotgun sequence
>NW_003101564.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_7, whole genome shotgun sequence
>NW_003101563.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_8, whole genome shotgun sequence
>NW_003101562.1 Branchiostoma floridae genomic scaffold BRAFLscaffold_9, whole genome shotgun sequence
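
A quick way to confirm whether a FASTA and a GFF actually share identifiers is to compare the FASTA header IDs against column 1 of the GFF. The following is only a minimal sketch: the function name `ids_missing_from_fasta` and the file names in the usage line are made up for illustration, and it assumes uncompressed input (decompress `.gz` files first, e.g. with `gzip -dc`).

```shell
# List sequence IDs that appear in a GFF but not in a FASTA file.
# Hypothetical helper; usage: ids_missing_from_fasta genome.fna annotation.gff
ids_missing_from_fasta() {
    fasta_ids=$(mktemp)
    gff_ids=$(mktemp)
    # FASTA IDs: the header text after '>' up to the first whitespace.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u > "$fasta_ids"
    # GFF IDs: column 1 of every tab-separated, non-comment line.
    awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$2" | sort -u > "$gff_ids"
    # comm -13 prints only the lines unique to the second (GFF) list.
    comm -13 "$fasta_ids" "$gff_ids"
    rm -f "$fasta_ids" "$gff_ids"
}
```

Empty output means every sequence name used in the GFF is present in the FASTA. Any name that is printed will not resolve against that FASTA.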

Where did you get this from, please?


I provided a link to the file above.


I just found that the same script also has to read a BED file (x.bed). My BED file has the loci in the other format, which is

Bf_V2_22 

This is probably because TE.bed was made from the FASTA file that uses the Bf_V2-type naming.


Then you will need to either remake the BED file or translate those IDs so they match NCBI's identifiers.
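
If a table of old and new names is available, the translation itself can be scripted rather than done by hand. This is only a sketch under stated assumptions: `id_map.tsv` is a hypothetical two-column, tab-separated file (old ID, then NCBI ID) that you would have to build yourself, since the thread does not say which Bf_V2 name corresponds to which NW_ accession. The function name `rename_bed_ids` is also made up for illustration.

```shell
# Rewrite column 1 of a BED file using a mapping file (old_id <TAB> new_id).
# Lines whose ID is not in the mapping are reported on stderr instead of
# being passed through unchanged, so mismatches cannot go unnoticed.
# Hypothetical helper; usage: rename_bed_ids id_map.tsv TE.bed > TE.renamed.bed
rename_bed_ids() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR   { map[$1] = $2; next }        # first file: load the mapping
        ($1 in map) { $1 = map[$1]; print; next } # known ID: rename and print
                    { print "unmapped ID: " $1 > "/dev/stderr" }
    ' "$1" "$2"
}
```

Only column 1 changes, so the start and end coordinates are carried over as-is. That is only valid if both naming schemes refer to exactly the same sequences. Note that the `NR == FNR` idiom assumes the mapping file is not empty.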


Do I have to do that manually? And isn't there a possibility of mismatched IDs?


You will need to go back and remake or reprocess the data to get IDs that match.
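
On the worry about mismatched IDs: one way to avoid guessing is to pair names by sequence content instead of by name. The sketch below assumes, and this is not confirmed anywhere in the thread, that the two FASTA files hold the same assembly with identical sequences under different names. The function names and file names are hypothetical. It loads each sequence as a single line, so it can be slow and memory-hungry on large genomes, and it assumes Unix line endings.

```shell
# Print "SEQUENCE<TAB>id" for every record of a FASTA file, one per line.
fasta_to_seq_id() {
    awk '
        /^>/ { if (id != "") print toupper(seq) "\t" id
               id = substr($1, 2); seq = ""; next }
             { seq = seq $0 }
        END  { if (id != "") print toupper(seq) "\t" id }
    ' "$1"
}

# Pair IDs from two FASTA files whose sequences are identical (case-insensitive).
# Output: id_in_first<TAB>id_in_second
# Hypothetical helper; usage: map_ids_by_sequence local.fa ncbi.fna > id_map.tsv
map_ids_by_sequence() {
    tab=$(printf '\t')
    a=$(mktemp)
    b=$(mktemp)
    fasta_to_seq_id "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$a"
    fasta_to_seq_id "$2" | LC_ALL=C sort -t "$tab" -k1,1 > "$b"
    # Join on the sequence column and keep only the two ID columns.
    LC_ALL=C join -t "$tab" -o 1.2,2.2 "$a" "$b"
    rm -f "$a" "$b"
}
```

Sequences with no identical partner produce no output line, so comparing the number of output lines with the number of records shows how many names could not be paired. Records with identical sequences will be paired more than once and need to be resolved by hand.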
