I have an alignment of genome sequencing reads mapped to a reference genome. I created a VCF file from this alignment and filtered it by PHRED quality score (>20). Now I want to extract every CDS sequence annotated on the reference genome, but carrying the variants found in my individual.
I have a GFF3 file with the annotations, the FASTA file of the reference genome, and a VCF file of my individual's variants.
I have seen similar questions where people used bedtools getfasta to extract sequences, but it only returns sequences exon by exon and does not concatenate them into a full CDS sequence (the tool seems fine for extracting transcript sequences, but not CDS).
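To make the expected output concrete, here is a minimal Python sketch of the concatenation I am after. It is only an illustration, not an existing tool: the function names (`parse_cds_segments`, `build_cds`) are hypothetical, the genome is assumed to be already loaded as a dict of sequence name to string, and each CDS feature is assumed to have a single `Parent` attribute.

```python
from collections import defaultdict

# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_cds_segments(gff3_lines):
    """Group CDS features by their Parent attribute (hypothetical helper).

    Returns {parent_id: [(seqid, start, end, strand), ...]} with 1-based,
    inclusive coordinates as in GFF3.
    """
    segments = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        segments[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return segments


def build_cds(segments, genome):
    """Concatenate the CDS segments of each parent into one sequence."""
    cds = {}
    for parent, parts in segments.items():
        # Join segments in genomic order, then flip minus-strand features.
        parts = sorted(parts, key=lambda p: p[1])
        seq = "".join(genome[seqid][start - 1:end] for seqid, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        cds[parent] = seq
    return cds


# Toy data, invented for illustration only.
genome = {"chr1": "AAATGGCCTTTAAGGG"}
gff3 = [
    "chr1\t.\tCDS\t3\t5\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\t.\tCDS\t9\t11\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\t.\tCDS\t12\t14\t.\t-\t0\tID=cds2;Parent=tx2",
]
cds = build_cds(parse_cds_segments(gff3), genome)
print(cds)
```

So for each transcript I would like one record containing all its CDS segments joined in order, reverse-complemented when the feature is on the minus strand.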
Does anyone have an idea how to do this? Should I first create a whole-genome consensus sequence from the alignment and then use a tool that extracts CDS sequences using this consensus as the reference? (And which tool can do this properly?)
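By "consensus sequence" I mean something like the following rough Python sketch. It is deliberately simplified and rests on assumptions of mine: it handles only single-base substitutions, takes the first ALT allele, ignores genotypes, and uses the VCF QUAL column as the quality threshold. The function name `apply_snvs` is hypothetical. Indels are skipped because they shift the coordinates relative to the GFF3 annotation, which is exactly why I am looking for a proper tool.

```python
def apply_snvs(genome, vcf_lines, min_qual=20.0):
    """Return a copy of `genome` with single-base substitutions applied.

    Simplifying assumptions: only SNVs, first ALT allele only, genotype
    ignored, and QUAL (column 6) must be strictly greater than `min_qual`.
    """
    consensus = {name: list(seq) for name, seq in genome.items()}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref = cols[0], int(cols[1]), cols[3]
        alt, qual = cols[4].split(",")[0], cols[5]
        # Skip indels and symbolic alleles: they would shift coordinates.
        if len(ref) != 1 or alt.upper() not in {"A", "C", "G", "T"}:
            continue
        if qual == "." or float(qual) <= min_qual:
            continue
        if chrom not in consensus:
            continue
        # VCF positions are 1-based; check that REF matches the reference.
        if consensus[chrom][pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        consensus[chrom][pos - 1] = alt
    return {name: "".join(bases) for name, bases in consensus.items()}


# Toy data, invented for illustration only.
reference = {"chr1": "AAATGGCC"}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t4\t.\tT\tC\t50\tPASS\t.",   # applied
    "chr1\t5\t.\tG\tA\t10\tPASS\t.",   # skipped: QUAL too low
    "chr1\t6\t.\tG\tGTT\t60\tPASS\t.", # skipped: insertion
]
individual = apply_snvs(reference, vcf)
print(individual)
```

The CDS sequences would then be extracted from this individual-specific sequence instead of the original reference.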
Thanks a lot,