Can a CDS file without gene IDs be annotated?
4.7 years ago
m986 ▴ 10

What is the correct way to annotate a CDS file that has no gene IDs? The CDS file is from Capsicum annuum and looks like this:

>Id16
ATGCATCATCCCATCTTTCATGCTTCTGGTTCTGTGGAAGGGCATTGGATTAGGATTCCCCCACCTCATAAAACATCATTTTATGCTTCTGACATA
TATGATATGAAAGAAGATGAGTCTTTATTCGCCTCATCAGGCATAGTTTCTTTTCAAGAAAGAGACAGAGGATATGAGCTTGACACCGCAGCTAGG
CATGGTTCCGCAGACTGTATACGTGAGCATCTTAGACAAGATCAAATTGAGGATTTGTCATCGTCCCCTCCAGCTGTTGGCTCCATACAGATTGGT
AGAAGCAATGGCTTTGGCCATAACATAGAGTTCATGTCTCAAGCTTACCTCAGAAACAGAAGCTCAGATATTAATATAGAGGTGAAGATTAGCCAA
GCTTCCTCCAACAATCCTGTCAAGGAAGTTGCATCAAAGGTAGCTTCCCAGTTTGAGCATGACAATTACAAGCTGATACTTAAGGTTCGAACAAGG
AAGGGTGAAATTCTTGCCTTAATGGGGCCTTCTGGCAGTGGGAAAACAACCTTGTTAAAGATATTGGGAGAAAGATTGCAAGAAAATGTCAGAG
CATCCCATATAATACAGCTATCAATAAGAGACATCCAAGCAAGATGAGTCAACGTCAGAAGTATGAAAGAGCTGAAGTGCATATTAAAGAATTAGG
CCTGGAAAGATGTCGTCACACGAGAATAGGTGGAGGACTTATTAAAGGCATATTTGGGGGAGAGAGGGAAAAAACTAGCATAGGGTATGAAATCCT
TGTTGATCCTTTTCTCCTCTTGCTCGACGAACCAACTTCAGGCCTTGATTCGTCCTCTGCAAGGAAGGAACGAGTCGGTTCCCCTTTCCGTCTTTC
GGTAGCATAA
>Id17
ATGCCAGTTTCCAGCTATCCGGTTCAAGTCTTTCGTTTTGCCAGCAAGCTGGTGCTTGCAGCCTATGGGCTTTCAGCTGGTGCATGCGATCGAAGA
CTTTATCTAAGAGGTGGATTTCCCTCGATAGTTGGGCATATGATATATGATGGATACAAGTGGGCCAGGAGTCGTAGAGCAATGTCTTTATTGGCA
GTTGCGCAACCTTCGATTGAAGCTACTTCTACAGATTGCGATAGCACCTGTCCTTGGATCAAGGCTCTCTCTCGCTCAAGACGTCGATGTGCCACC
GGGTTGACCCTTTTCTTACCAGCATGGGGAGTTGCGATGGATGCCAAGATGAAGACTCCTCAGCGCCAATTAGGGGGTGCAAGAGATAGTTGGATC
AAGCCTGGGGATAAAGTGATGAGCCCGAGATGTTATAAAGATTTGGGTTTGACTTTTTTGTCTGCTTTGTATGAGTCGACGTATGGAATGCGCCAA
GACATGACTTTGTATGCCATGGCACTGAGAGAAAGACACAGGAGAATTCCTCTTTTAGGAAGACCTAGTAGCTCAGGATCTCTGACGTTTCATGTC
TGTGCCTTTGATCACATACTTTGTCCGCTCGAAGGCTCATGCTCGATCCTCTTTCATTGGATACAGAGATTTCGTTCACTTAAGGCTTGTTTGCAA
TGGAACTGGGAAAAGAGAAAGAAATGGAGTGAAGAGCTTCGTTGA
>Id18
ATGCCTTCTTGGTCGAAGAGCCCCTTTTATACTAGTAAGGACGTAGGAAGCAAAGAAACTTATGCGAAGGACGTTTTCTTCTCTGCCCTCTCCTCT
CCAAAGGCCAAGGGAGAGACTGCATCCCTTTCCTTCGGTAGCTCTTTTGGTTTCCCAAGGATAGCGGTAGCTGGAGCAAAGCCCGCTTTCTTCTCT
CCGCAAATGAAAGAGAAAGTTAGAGGAAAAAACACATTCTCTCTTTGCGAGATCCAAAAGTGGAGAACGCATAGCATTCTATGGGTACATAGGATC
AAACATAAAGCAGCGCTCTCTTGGCAGAGTTTTAGGTGGCAAGAGACTTTAGGTCTTGTTGGAGCTTCTGAGCGTAACGAATCAAAGTCGAAGATG
GATCAAGGTAGCTTACCTACCAAGCCGATAGGCAAAGTGCTGAAGGATGAAATGTGCAAAGTAGATCGTGCACCTGTCGTGTGA

The IDs run from Id_1 to Id_35884. I ran **BlastX** against the protein database of C. annuum var. Zunla-1 because I want to use this CDS file as the reference for my fastq files in **kallisto**. However, when I built the index in **kallisto**, I got an error because some IDs are repeated and no longer unique (only the names are duplicated, not the sequences): several Id_xxxxx entries matched the same gene, and I kept the best **BlastX** match for each. So, what is the correct way to "annotate" this CDS file?

This is how it looks with the IDs from **BlastX**:

>YP_009049799.1
atggcttcaaacaagcgagaaagtccctttctatcgtcattagtcaagcgcgctagctgc
aataaaaaaagagcgctaacgagcaagaaaagggatgtgctaagaagcaagggctttcgc
gcagctgctgcgcccttgattcttgctttcgacctggagcttgatggggttggtgcttgc
aaaaatatcaagtcgacggggtcaggtaccagtagtgacaatagcaaagaggggttggac
actagttgtgtgagtggaatggcccaactggacctagtcagcccgaactattttgcggtt
ctagaggaacctgaagaagaagaggtaaagatgccagatctggacactgctgaaccgaaa
gagattgctcaggatgagtgtttgggtaacaaagccgaggagggtctattcaaggagaga
actcccaaggagagtgatttggctcatagaagcgaggatctagaagaaagggtcaactat
ggaagtgactga
>YP_006666039.1
atgggcagtcttggtcctattgaaaataccagtgaagatccaaatcaaaaagtgaaaaac
attcccagttgtagtaatgttgattatttattcgacgttaaagacattcagaatttcatc
tctgatgacacttttgtagttagtgataggaatggagacagttattccatctattttgat
attgaaaatcagatttttgagattgacaacgatcattcttttctgagtgaactagaaagt
tctttttatagttatcgaaactcgagttatctgaataatggatttaggggcgaagatccc
tactataattcttacatgtataatactcaatatagtttgaataatcacattaatagttgt
attgataataacttcagtctcaaatctgtatag
>YP_009049789.1
atgatactttccgttttgtcgagccctgctttggtctctggtttcatggttgtacgtgca
aaaaatctaatacattccattttgtttctcatcccagtctttcgcaacacttcaggttta
cttcttttgttaggtctcgacttctttgctatgatcttcccagtagtttatataggagct
atagccatttcatttctattcattgttatgattttccatattcaaatagcggagattcac
aaagaagtattgcgctatttactagtgagtggcattattagacttatcttttggttggag
atattctttattttagataatgaaagcattccattactaccaacccaaagaaatacgacc
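
One way to avoid the duplicate-name error is to keep the original, already unique `Id…` name in each header and append the BlastX hit to it, instead of replacing the name with the hit. Below is a minimal Python sketch of that idea, not a definitive solution. The function name `relabel_fasta`, the `|` separator, the `NA` placeholder for sequences without a hit, and the ID-to-accession pairing in the example are all assumptions made for illustration; the real pairing would come from the BlastX results.

```python
def relabel_fasta(fasta_text, best_hit):
    """Append the best BlastX hit to each FASTA header while keeping
    the original sequence ID, so every header stays unique.

    fasta_text: FASTA content as a single string.
    best_hit:   dict mapping original ID -> best-hit accession
                (hypothetical input; build it from your BlastX table).
    Returns the relabelled FASTA text and a list of (ID, hit) pairs.
    """
    out_lines = []
    mapping = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The original ID is the first whitespace-delimited token.
            seq_id = line[1:].split()[0]
            hit = best_hit.get(seq_id, "NA")  # "NA" = no BlastX hit (assumed placeholder)
            out_lines.append(f">{seq_id}|{hit}")
            mapping.append((seq_id, hit))
        else:
            out_lines.append(line)
    return "\n".join(out_lines) + "\n", mapping


# Toy example: the sequences and the ID-to-accession pairing are made up.
example = ">Id16\nATGCAT\n>Id17\nATGCCA\n>Id18\nATGCCT\n"
hits = {"Id16": "YP_009049799.1", "Id17": "YP_009049799.1"}

new_fasta, mapping = relabel_fasta(example, hits)
print(new_fasta)
```

Here `Id16` and `Id17` share the same hit, yet their headers remain distinct because the original ID is kept. The returned `mapping` list can be saved as a two-column table if the per-sequence estimates are later summed per matched protein.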
RNA-Seq rna-seq cds genome • 778 views
4.7 years ago
joe555 • 0

Hi, would you be so kind as to update us briefly if you make progress? I just started working on a Capsicum annuum transcriptome project. (At first, I downloaded the reference genome and annotation files from NCBI /Zunla-1 v.1.0/. Now I have downloaded v.2.0 from BGI.)
