Question: How to construct a CDS properly using annotation file coordinates?
adeena_hassan wrote (2.2 years ago):

Peace be upon you all (Assalam o alaikum),

I have a BAM file for a dog genome, and I generated a consensus FASTA from it. The BAM is aligned against CanFam3.1, so I used the CanFam3.1 annotation file (GFF3) from NCBI to extract the CDS from the consensus FASTA. First, I fetched the coordinates of my gene.
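A minimal Python sketch of that first step, assuming a tab-separated GFF3 file like the NCBI one. It filters on `protein_id` rather than `gene`, because a gene such as FABP5 can have several transcript models, and mixing their CDS rows would give a meaningless sequence. The function names are hypothetical, not from any library.

```python
# Sketch: collect the CDS rows of one protein from a GFF3 file.
# Assumptions: 9 tab-separated columns and key=value pairs in column 9,
# as in the NCBI CanFam3.1 records quoted in this post.
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn 'ID=cds1;gene=FABP5;...' into a dict, decoding escapes such as %2C."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def cds_records(lines, key, value):
    """Yield (seqid, start, end, strand, phase) for CDS rows where key == value.

    start and end are returned exactly as in the file: 1-based and inclusive.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        if parse_attributes(fields[8]).get(key) == value:
            yield fields[0], int(fields[3]), int(fields[4]), fields[6], fields[7]
```

For example, `list(cds_records(open("annotation.gff3"), "protein_id", "XP_013964800.1"))`, where `annotation.gff3` is a placeholder file name.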

Sample coordinates for a single CDS:

NC_006611.3 Gnomon  CDS 28363101    28363137    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28491275    28491447    .   +   2   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28491806    28491907    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1

NC_006611.3 Gnomon  CDS 28492441    28492494    .   +   0   ID=cds39781;Parent=rna47080;Dbxref=GeneID:477923,Genbank:XP_013964800.1;Name=XP_013964800.1;gbkey=CDS;gene=FABP5;product=fatty acid-binding protein%2C epidermal;protein_id=XP_013964800.1
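Because GFF3 coordinates are 1-based and inclusive, each segment is end - start + 1 bases long. For the four rows above that gives 37, 173, 102 and 54 bp, 366 bp in total (122 codons), and the phase column (column 8) agrees with those lengths. A short Python sketch of that check, assuming a + strand gene whose segments are listed in transcript order:

```python
# Sketch: sanity-check GFF3 CDS rows (1-based, inclusive coordinates).
# Assumes a + strand gene with segments listed in transcript order.


def segment_length(start, end):
    """Length of a 1-based, fully closed interval [start, end]."""
    return end - start + 1


def expected_phases(segments):
    """Phase each segment should have, given the lengths of the segments before it."""
    phases, done = [], 0
    for start, end in segments:
        phases.append((3 - done % 3) % 3)
        done += segment_length(start, end)
    return phases


# The four FABP5 CDS rows (XP_013964800.1) from the GFF3 above: (start, end).
fabp5 = [(28363101, 28363137), (28491275, 28491447),
         (28491806, 28491907), (28492441, 28492494)]
lengths = [segment_length(s, e) for s, e in fabp5]
print(lengths, sum(lengths), sum(lengths) % 3)  # [37, 173, 102, 54] 366 0
print(expected_phases(fabp5))                   # [0, 2, 0, 0], matches column 8
```

If the lengths are computed as end - start instead, the total drops to 362, which is no longer a multiple of 3.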

I used the coordinates above to fetch the corresponding sequences from the consensus FASTA.

Sample sequences for the same CDS:

>chr29:28363101-28363137
TGACTGTGTCAGTCCAGGTTCTCTGGGGGACTGAGG
>chr29:28491275-28491447
AGTGGGAATGGCTCTGCGAAAGGTGGGTGCAATGGCCAAACCAGATTGTATCATCTCTTCTGACGGCAAAAACCTCACCATAAAAACTGAGAGCACTTTGAAAACAACACAGTTTTCGTGTAATCTGGGAGAGAAGTTTGAAGAAACTACAGCTGATGGCAGAAAAACTCAG
>chr29:28491806-28491907
CTGTCTGCAACTTCACAGACGGCGCATTGGTTCAACATCAGGAATGGGATGGGAAGGAAAGCACAATAACAAGAAAGTTGGAAGATGGGAAATTGGTGGTG
>chr29:28492441-28492494
AATGCGTCATGAACAATGTCACCTGTACGCGGATCTATGAAAAAGTAGAGTAA
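Comparing each fetched sequence with the length implied by its GFF3 row shows the pattern: the first segment (28363101-28363137, 37 bp inclusive) came back as 36 bases and the last one (54 bp) as 53. The two middle segments are also one base short (172 of 173 and 101 of 102), so every segment, not just the first, is missing its first base. A minimal sketch of the check, using the two short sequences from this post:

```python
# Sketch: compare fetched sequence lengths with GFF3 (1-based, inclusive) lengths.
# Keys are GFF3 (start, end) pairs; values are the sequences fetched in this post.
fetched = {
    (28363101, 28363137): "TGACTGTGTCAGTCCAGGTTCTCTGGGGGACTGAGG",
    (28492441, 28492494): "AATGCGTCATGAACAATGTCACCTGTACGCGGATCTATGAAAAAGTAGAGTAA",
}


def length_report(fetched):
    """Return (start, end, expected, observed) for each fetched segment."""
    return [(s, e, e - s + 1, len(seq)) for (s, e), seq in fetched.items()]


for start, end, expected, observed in length_report(fetched):
    print(f"{start}-{end}: expected {expected}, got {observed}")
# 28363101-28363137: expected 37, got 36
# 28492441-28492494: expected 54, got 53
```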

I will concatenate these CDS parts later, but as you can see in the example above, the A of the start codon (ATG) is missing. How can I fix this?
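After concatenation, a few quick checks catch this kind of problem before translation: a complete CDS should start with ATG, end with TAA, TAG or TGA, and have a length divisible by 3. A minimal Python sketch, assuming the segments hold forward-strand genomic sequence in ascending coordinate order (as fetched without any strand option); for a minus-strand gene the joined sequence is reverse-complemented. The function names are hypothetical.

```python
# Sketch: join CDS segments and run basic sanity checks.
# Assumption: `segments` hold forward-strand genomic sequence, sorted by
# ascending start coordinate (e.g. fetched without any strand option).

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def join_cds(segments, strand):
    """Concatenate segments; reverse-complement the result for '-' strand genes."""
    cds = "".join(segments).upper()  # upper() also handles soft-masked bases
    if strand == "-":
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds


def cds_problems(cds):
    """List simple reasons why `cds` does not look like a complete coding sequence."""
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if len(cds) % 3 != 0:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    return problems
```

With the one-base-short segments from this post, the joined sequence starts with TGA and is 362 bases long, so the start-codon and length checks would both fail.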

Now I have several questions (I don't understand where the problem actually is):

Is this happening because of the 0-based vs. 1-based coordinate systems?

Should I add one base (an off-by-one correction) at the start of each segment? (I only checked this for the first coordinate: when I reduced its start by 1, I always got the base A.)

Should I reduce the start coordinate by one for every part of the CDS?

How can I check whether my BAM file is 0-based or 1-based?
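On the coordinate questions above: this is most likely a 0-based vs. 1-based mix-up, and the fix applies to every segment, not just the first. GFF3 columns 4 and 5 are 1-based and inclusive; BED intervals have a 0-based start and an exclusive end. Converting GFF3 to BED therefore means subtracting 1 from the start and leaving the end unchanged. Python string slicing follows the same convention as BED, so a toy sequence (made up for illustration) shows what happens when the GFF3 start is used unchanged:

```python
# Sketch of the GFF3 <-> BED coordinate conversion.
# GFF3: 1-based, both ends inclusive.  BED: 0-based start, end exclusive.


def gff_to_bed(start, end):
    """Convert a 1-based inclusive interval to a 0-based half-open one."""
    return start - 1, end


def bed_to_gff(start, end):
    """Convert a 0-based half-open interval back to 1-based inclusive."""
    return start + 1, end


# Toy chromosome; Python slices are 0-based and half-open, like BED.
chrom = "CCATGACTG"
gff_start, gff_end = 3, 8          # 1-based inclusive: bases 3..8 = "ATGACT"
bed_start, bed_end = gff_to_bed(gff_start, gff_end)
print(chrom[bed_start:bed_end])    # ATGACT  (correct)
print(chrom[gff_start:gff_end])    # TGACT   (GFF3 start used as-is: first base lost)
```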


> I have used above coordinates and fetched corresponding sequence from consensus FASTA.

How? How did you get the FASTA sequences?

> bam file is 0-based or 1-based ???

A BAM file is internally 0-based.

A SAM file is always 1-based.

Reply by Pierre Lindenbaum, written 2.2 years ago
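In practice, the 0-based/1-based question is about how a position is written out, not about the file you downloaded. `samtools view` prints BAM records as SAM text, where POS (column 4) is 1-based, while programming libraries often return the 0-based value (pysam's `AlignedSegment.reference_start`, for example). A small pure-Python sketch that turns a SAM line into a 0-based, end-exclusive interval; `sam_to_bed_interval` is a hypothetical helper and the records in the test are made up.

```python
# Sketch: convert a SAM text record (1-based POS) into a BED-style interval.
import re

REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_to_bed_interval(sam_line):
    """Return (rname, start0, end) for an aligned SAM record.

    SAM column 4 (POS) is 1-based, so the 0-based start is POS - 1. The end is
    exclusive, BED-style, and is derived from the CIGAR string (column 6).
    """
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos1, cigar = fields[2], int(fields[3]), fields[5]
    ref_span = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                   if op in REF_CONSUMING)
    start0 = pos1 - 1
    return rname, start0, start0 + ref_span
```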

Sorry for the late reply!

I took columns 4 and 5 from the GFF3 (annotation) file and made a BED6 file, then used bedtools getfasta to get the FASTA sequences.
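This step is very likely where the missing base comes from: columns 4 and 5 of a GFF3 file are 1-based and inclusive, while bedtools getfasta reads BED, whose start is 0-based. Copying column 4 unchanged makes every interval start one base too late. A minimal sketch of a GFF3-row-to-BED6 conversion with the start shifted; the function name and the optional `chrom_map` renaming (GFF3 uses `NC_006611.3`, the FASTA headers in this thread use `chr29`) are assumptions for illustration.

```python
# Sketch: turn one GFF3 row into a BED6 line for bedtools getfasta.
# Key step: BED start = GFF3 start - 1 (0-based); BED end = GFF3 end (unchanged).


def gff3_row_to_bed6(gff_line, name_key="ID", chrom_map=None):
    """Convert a tab-separated GFF3 row into a tab-separated BED6 line.

    chrom_map optionally renames sequence IDs so they match the FASTA headers
    (e.g. {"NC_006611.3": "chr29"}, an assumption based on this thread).
    """
    f = gff_line.rstrip("\n").split("\t")
    seqid, start, end, strand = f[0], int(f[3]), int(f[4]), f[6]
    if chrom_map:
        seqid = chrom_map.get(seqid, seqid)
    attrs = dict(item.split("=", 1) for item in f[8].split(";") if "=" in item)
    name = attrs.get(name_key, ".")
    score = "0"  # placeholder; the GFF3 score column is "." for these rows
    return "\t".join([seqid, str(start - 1), str(end), name, score, strand])
```

Writing one such line per CDS row to a file (say `fabp5_cds.bed`, a placeholder) and running `bedtools getfasta -fi consensus.fa -bed fabp5_cds.bed -s` should then return 37 bases for the first segment, beginning with the A of ATG.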

I downloaded the BAM file and then generated a consensus FASTA from it using samtools. Which coordinate system (0-based or 1-based) does my FASTA file use now?

Reply by adeena_hassan, written 2.2 years ago
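On the last question: a FASTA file has no coordinate system of its own. It only stores sequence, and the convention is set by whatever tool reads the coordinates. bedtools takes BED intervals (0-based start, exclusive end), whereas `samtools faidx` takes regions written as `chrom:beg-end` (1-based, both ends inclusive). One hedged caveat: if the consensus contains insertions or deletions relative to CanFam3.1, its positions no longer match the reference one-to-one, and the GFF3 coordinates will drift after the first indel. The sketch below converts between the two notations; the helper names are hypothetical.

```python
# Sketch: one region, written in the two notations that matter here.
# A FASTA file itself stores only sequence; coordinates come from the tool.


def bed_to_faidx_region(chrom, bed_start, bed_end):
    """BED (0-based, end-exclusive) -> samtools-style region (1-based, inclusive)."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


def faidx_region_to_bed(region):
    """samtools-style 'chrom:beg-end' (1-based, inclusive) -> BED triple."""
    chrom, span = region.rsplit(":", 1)
    beg, end = (int(x) for x in span.split("-"))
    return chrom, beg - 1, end
```

For example, the BED interval `chr29 28363100 28363137` and the region `chr29:28363101-28363137` (as in `samtools faidx consensus.fa chr29:28363101-28363137`, with `consensus.fa` a placeholder) describe the same 37 bases.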
Powered by Biostar version 2.3.0