Hi,
I recently used bedtools flank together with bedtools getfasta to extract the sequences flanking a set of structural variants, using a VCF file and a genome file.
Link to the VCF file: ftp://ftp.solgenomics.net/genomes/tomato100/March_02_2020_sv_landscape/variants/LYC1969.ont.v1.0.s.vcf.gz
Link to the SL4.0 genome FASTA: ftp://ftp.solgenomics.net/genomes/Solanumlycopersicum/assembly/build4.00/
For the first structural variant (ID 261_0_1), I extracted 20 bp flanking sequences:
Info from the VCF file:
POS ID REF ALT
19623 261_0_1 ATATATATATATATATATATATATATATATATATA A
Output from bedtools flank:
SL4.0ch01 19602 19622
SL4.0ch01 19658 19678
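The flank arithmetic itself is simple, so here is a minimal sketch of it in plain Python. It assumes the variant interval is exactly the REF allele (1-based POS through POS + len(REF) - 1) and returns BED-style (0-based, half-open) intervals like the ones above. The function name flank_intervals is hypothetical, and unlike bedtools flank with a genome file it does not clip the right flank at the chromosome end. Comparing its result with the bedtools intervals is a quick way to spot off-by-one differences in how the variant end was defined.

```python
def flank_intervals(pos, ref, size=20):
    """Return BED-style (0-based, half-open) left and right flank intervals
    for a VCF record with 1-based POS and REF allele (hypothetical helper)."""
    start = pos - 1              # 0-based start of the REF allele
    end = start + len(ref)       # 0-based exclusive end of the REF allele
    left = (max(0, start - size), start)  # clipped at the chromosome start
    right = (end, end + size)    # NOT clipped at the chromosome end here
    return left, right


if __name__ == "__main__":
    # Example: a record at POS 19623 with a 35 bp REF allele.
    print(flank_intervals(19623, "A" * 35))
```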
Output from bedtools getfasta:
SL4.0ch01:19602-19622 GAATGTATTCATATATATAT
SL4.0ch01:19658-19678 TAAAATTCTAACTTGAGAAA
Is there a way, ideally an existing tool, to combine the extracted flanking sequences with the REF and ALT alleles from the VCF into output like this:
261_0_1 GAATGTATTCATATATATAT[ATATATATATATATATATATATATATATATATATA/A]TAAAATTCTAACTTGAGAAA
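If no dedicated tool turns up, the combining step can be scripted. Below is a minimal sketch, assuming the getfasta output was written with -tab (one name<TAB>sequence line per interval, which is what the output above looks like), that bedtools flank wrote exactly two intervals per variant (left, then right, as shown), and that the BED passed to flank was built from the same VCF records in the same order. Variants are matched to flanks purely by order, and the script stops with an error if the counts disagree. All function and file names are hypothetical, and multiple comma-separated ALT alleles are rendered as [REF/ALT1/ALT2], which is my own choice of format.

```python
import gzip
import sys


def vcf_records(lines):
    """Yield (ID, REF, ALT) for each data line of a VCF, skipping headers."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        fields = line.rstrip("\n").split("\t")
        yield fields[2], fields[3], fields[4]


def flank_pairs(lines):
    """Group getfasta -tab lines (name<TAB>sequence) into (left, right)
    pairs, assuming flank wrote left then right for every variant."""
    seqs = [line.rstrip("\n").split("\t")[1] for line in lines if line.strip()]
    if len(seqs) % 2:
        raise ValueError("odd number of flank sequences")
    return list(zip(seqs[0::2], seqs[1::2]))


def format_variant(var_id, left, ref, alt, right):
    """Render ID<TAB>left[REF/ALT]right; multiple ALTs become [REF/ALT1/ALT2]."""
    alleles = "/".join([ref] + alt.split(","))
    return f"{var_id}\t{left}[{alleles}]{right}"


def combine(vcf_lines, flank_lines):
    """Pair VCF records with flank sequences by their order in the two files."""
    records = list(vcf_records(vcf_lines))
    pairs = flank_pairs(flank_lines)
    if len(records) != len(pairs):
        raise ValueError(f"{len(records)} variants but {len(pairs)} flank pairs")
    return [
        format_variant(var_id, left, ref, alt, right)
        for (var_id, ref, alt), (left, right) in zip(records, pairs)
    ]


if __name__ == "__main__":
    # Hypothetical usage: python combine_flanks.py variants.vcf.gz flanks.tab
    vcf_path, flanks_path = sys.argv[1], sys.argv[2]
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(flanks_path) as flanks:
        for out in combine(vcf, flanks):
            print(out)
```

Matching by order keeps the script independent of the exact coordinates in the getfasta names, at the cost of requiring that nothing was filtered or reordered between the VCF and the BED.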
Of course, for a single structural variant I could do this by hand. However, I want to do it for several thousand structural variants across multiple VCF files.
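For thousands of variants across several VCFs, one option is to skip the bedtools step entirely and cut the flanks straight from the genome FASTA. Here is a minimal sketch using only the Python standard library. It assumes sequence-resolved REF/ALT alleles (not symbolic alleles such as <DEL> with an END tag), that FASTA sequence names match the VCF CHROM column, and that the whole genome fits in memory (roughly one byte per base). The script name, function names and the 20 bp default are hypothetical choices, and a warning goes to stderr when the REF allele does not match the genome at that position.

```python
import gzip
import sys


def read_fasta(lines):
    """Load a FASTA into {name: sequence}; the name is the first word of
    the header line. Whole chromosomes are kept in memory."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def annotate(vcf_lines, genome, size=20):
    """Yield ID<TAB>left[REF/ALT]right for each VCF record, taking the
    flanks directly from the genome on either side of the REF allele."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
        start = int(pos) - 1               # 0-based start of REF
        end = start + len(ref)             # 0-based exclusive end of REF
        chrom_seq = genome[chrom]
        if chrom_seq[start:end].upper() != ref.upper():
            print(f"warning: REF mismatch for {var_id}", file=sys.stderr)
        left = chrom_seq[max(0, start - size):start]
        right = chrom_seq[end:end + size]  # slicing stops at the chromosome end
        alleles = "/".join([ref] + alt.split(","))
        yield f"{var_id}\t{left}[{alleles}]{right}"


def open_text(path):
    """Open plain or gzip-compressed text files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Hypothetical usage: python sv_flanks.py genome.fa a.vcf.gz [b.vcf.gz ...]
    with open_text(sys.argv[1]) as fa:
        genome = read_fasta(fa)
    for vcf_path in sys.argv[2:]:
        with open_text(vcf_path) as vcf:
            for out in annotate(vcf, genome):
                print(out)
```

Loading the genome once and streaming every VCF past it avoids writing intermediate BED and FASTA files for each VCF.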
Many thanks!