How to construct a CDS properly using annotation file coordinates?
Assalam o aliakum everyone,

I have a BAM file of a dog genome and I have generated a consensus FASTA from it. The BAM is aligned against CanFam3.1, so I used the CanFam3.1 annotation file (GFF3) from NCBI to extract the CDS from the consensus FASTA. First, I fetched the coordinates of my gene.

Sample coordinates for a single CDS:

NC_006611.3 Gnomon  CDS 28363101    28363137    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28491275    28491447    .   +   2   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28491806    28491907    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28492441    28492494    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1


I have used the above coordinates and fetched the corresponding sequences from the consensus FASTA.

Sample sequences for a single CDS:

>chr29:28363101-28363137
TGACTGTGTCAGTCCAGGTTCTCTGGGGGACTGAGG
>chr29:28491275-28491447
AGTGGGAATGGCTCTGCGAAAGGTGGGTGCAATGGCCAAACCAGATTGTATCATCTCTTCTGACGGCAAAAACCTCACCATAAAAACTGAGAGCACTTTGAAAACAACACAGTTTTCGTGTAATCTGGGAGAGAAGTTTGAAGAAACTACAGCTGATGGCAGAAAAACTCAG
>chr29:28491806-28491907
CTGTCTGCAACTTCACAGACGGCGCATTGGTTCAACATCAGGAATGGGATGGGAAGGAAAGCACAATAACAAGAAAGTTGGAAGATGGGAAATTGGTGGTG
>chr29:28492441-28492494
AATGCGTCATGAACAATGTCACCTGTACGCGGATCTATGAAAAAGTAGAGTAA


I will concatenate these parts of the CDS later, but as you can see in the example above, the base A of the start codon (ATG) is missing. How can I fix this?
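A quick sanity check on the joined CDS can help spot this kind of problem. The sketch below is only an illustration: the function name `check_cds` and the toy sequences are made up, and it assumes a plus-strand gene whose parts are already in coordinate order.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(parts):
    """Join CDS parts (plus strand, in coordinate order) and run basic checks."""
    cds = "".join(parts).upper()
    return {
        "length": len(cds),
        "starts_with_ATG": cds.startswith("ATG"),
        "multiple_of_3": len(cds) % 3 == 0,
        "ends_with_stop": cds[-3:] in STOP_CODONS,
    }


# Toy parts (not real data): a complete CDS split into three pieces.
good = check_cds(["ATGGC", "TAAAT", "GA"])

# Same toy parts, but with the first base of the first piece lost,
# as happens when a 1-based start is used as a 0-based start.
bad = check_cds(["TGGC", "TAAAT", "GA"])

print(good)
print(bad)
```

If the first check fails and the length is not a multiple of 3, an off-by-one in the start coordinates is a likely suspect.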

Now I have several questions (I can't work out where the problem actually is):

Is this happening because of the 0-based vs. 1-based coordinate systems?

Should I add one base (off-by-one) at the start of each starting coordinate? (I checked this for the first coordinate only: I reduced the start coordinate by 1 and it always gives base A.)

Should I reduce the start coordinate by one for each part of the CDS?

How can I check whether my BAM file is 0-based or 1-based?

I have used the above coordinates and fetched the corresponding sequences from the consensus FASTA.

How? How did you get the FASTA sequences?

BAM file is 0-based or 1-based?

A BAM file is internally 0-based.

A SAM file is always 1-based.
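To make the two conventions concrete, here is a minimal sketch on a toy sequence (the sequence and the function name `one_based_to_zero_based` are hypothetical). A 1-based, fully closed interval [start, end] covers the same bases as the 0-based, half-open interval [start - 1, end), which is also how Python slicing works.

```python
def one_based_to_zero_based(start, end):
    """Convert a 1-based inclusive interval to a 0-based half-open one."""
    return start - 1, end


# Toy sequence (not real data); the feature ATGAAATAG sits at
# 1-based inclusive positions 4..12.
genome = "CCCATGAAATAGCCC"

s, e = one_based_to_zero_based(4, 12)
correct = genome[s:e]   # start shifted by one, end unchanged
wrong = genome[4:12]    # 1-based start used as if it were 0-based

print(correct)
print(wrong)
```

Only the start changes; the end stays the same, and the interval length becomes simply end minus start.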

Sorry for the late reply!

I fetched columns 4 and 5 from the GFF3 (annotation) file and made a BED6 file, then I used bedtools getfasta to get the FASTA sequences.
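This step is the likely source of the missing base: GFF3 coordinates are 1-based and inclusive, while BED starts are 0-based with an exclusive end, so copying columns 4 and 5 unchanged drops the first base of every interval. The first sequence above is 36 bases long, although 28363101 to 28363137 spans 37. Below is a rough sketch of the conversion; the function name `gff3_cds_to_bed6` is made up, the attribute column in the example is shortened, and the `chrom_map` renaming from NC_006611.3 to chr29 is assumed from the FASTA headers shown above.

```python
def gff3_cds_to_bed6(gff_line, chrom_map=None):
    """Turn one tab-separated GFF3 CDS line into a BED6 line.

    Returns None for lines that are not 9-column CDS records.
    """
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "CDS":
        return None
    seqid, _source, _type, start, end, _score, strand, _phase, attrs = fields

    # Optionally rename the sequence to match the FASTA headers.
    chrom = (chrom_map or {}).get(seqid, seqid)

    # Use the ID attribute as the BED name, if present.
    name = "."
    for item in attrs.split(";"):
        if item.startswith("ID="):
            name = item[3:]
            break

    bed_start = int(start) - 1   # GFF3 1-based start -> BED 0-based start
    bed_end = int(end)           # end is unchanged (inclusive -> exclusive)
    return "\t".join([chrom, str(bed_start), str(bed_end), name, "0", strand])


# First CDS part from the post, with a shortened attribute column.
example = "\t".join([
    "NC_006611.3", "Gnomon", "CDS", "28363101", "28363137",
    ".", "+", "0", "ID=cds39781;Parent=rna47080;gene=FABP5",
])
bed_line = gff3_cds_to_bed6(example, chrom_map={"NC_006611.3": "chr29"})
print(bed_line)
```

The same shift applies to every CDS part, not just the first one. For genes on the minus strand, the extracted sequence would additionally need to be reverse-complemented, and the parts joined in transcript order.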

I downloaded the BAM file and then generated a consensus FASTA from it using samtools. What is the format of my FASTA file now?