Question: Modify reference fasta using bed file
Rubal (Germany) wrote 6 weeks ago:

Hello Everyone,

I would like to modify a reference genome FASTA file using a list of positions in bed file format. The bed file contains positions like this:

chr1    17716   C       G
chr1    17925   A       G
chr1    18115   C       T

For example, I would like to change position 17716 on chr1 from C to G in the FASTA file. I have seen some Python packages and a GATK tool that modify a FASTA file using a list of VCF positions, but is there a tool that does this with a bed file as input instead? Or would it be better to convert the bed file to VCF format by adding a VCF header? I'm not sure whether that would work.

Thanks in advance for your help.
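
On the conversion route: adding a header alone is not quite enough, because a VCF record has eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). A minimal Python sketch of the conversion, assuming the four columns are chromosome, 1-based position, reference base, and alternate base; the function names `read_table` and `table_to_vcf` are made up for illustration:

```python
import io


def read_table(handle):
    """Yield (chrom, pos, ref, alt) from a whitespace-delimited 4-column file."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, pos, ref, alt = line.split()[:4]
        yield chrom, int(pos), ref, alt


def table_to_vcf(rows, out):
    """Write a minimal VCF: the fileformat line, the column header, one record per row."""
    out.write("##fileformat=VCFv4.2\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for chrom, pos, ref, alt in rows:
        # ID, QUAL, FILTER and INFO are unknown here, so they are written as "."
        out.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")


if __name__ == "__main__":
    table = io.StringIO("chr1\t17716\tC\tG\nchr1\t17925\tA\tG\nchr1\t18115\tC\tT\n")
    vcf = io.StringIO()
    table_to_vcf(read_table(table), vcf)
    print(vcf.getvalue(), end="")
```

Tools that consume VCF often also expect the file to be sorted, compressed, and indexed, and some want contig lines in the header; this sketch does none of that.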

SMK wrote 6 weeks ago:

If the file contains only SNPs (positions are treated as 1-based):

$ cat example.fa
>chr1
CCCCCC
>chr2
AAAAAA
$ cat example.bed
chr1    2   C   G
chr1    4   C   G
chr1    6   C   G
chr2    1   A   T
chr2    3   A   T
chr2    5   A   T

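$ seqkit fx2tab example.fa > example.tab
$ awk '
    # First file: store each sequence under its ID.
    FILENAME == "example.tab" { fa[$1] = $2; next }
    # Second file: splice the new base ($4) in at the 1-based position ($2).
    { fa[$1] = substr(fa[$1], 1, $2 - 1) $4 substr(fa[$1], $2 + 1) }
    # Print the edited sequences; awk does not guarantee the order of a for-in loop.
    END { for (id in fa) print ">" id "\n" fa[id] }
  ' example.tab example.bed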
>chr1
CGCGCG
>chr2
TATATA
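
The awk command overwrites the base at each position without checking that the reference base in column 3 is actually there, so an off-by-one in the coordinates would go unnoticed. A rough Python sketch of the same substitution with that check added, assuming 1-based positions and single-base changes; `read_fasta`, `apply_snps`, and `write_fasta` are illustrative names, not part of any library:

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: bytearray}, keeping the input order."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the header up to the first space
            seqs[name] = bytearray()
        elif line and name is not None:
            seqs[name] += line.encode("ascii")
    return seqs


def apply_snps(seqs, snps):
    """Apply (chrom, pos, ref, alt) substitutions in place; pos is 1-based."""
    for chrom, pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"{chrom}:{pos}: only single-base changes are handled")
        seq = seqs[chrom]
        if pos < 1 or pos > len(seq):
            raise ValueError(f"{chrom}:{pos}: position outside the sequence")
        found = chr(seq[pos - 1])
        # Refuse to edit if the FASTA does not carry the expected reference base.
        if found.upper() != ref.upper():
            raise ValueError(f"{chrom}:{pos}: expected {ref}, found {found}")
        seq[pos - 1] = ord(alt)


def write_fasta(seqs, width=60):
    """Format the sequences as FASTA text, wrapped at `width` bases per line."""
    out = []
    for name, seq in seqs.items():
        text = seq.decode("ascii")
        out.append(f">{name}")
        out.extend(text[i:i + width] for i in range(0, len(text), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    seqs = read_fasta([">chr1", "CCCCCC", ">chr2", "AAAAAA"])
    snps = [("chr1", 2, "C", "G"), ("chr1", 4, "C", "G"), ("chr1", 6, "C", "G"),
            ("chr2", 1, "A", "T"), ("chr2", 3, "A", "T"), ("chr2", 5, "A", "T")]
    apply_snps(seqs, snps)
    print(write_fasta(seqs), end="")
```

On the example files this prints the same two records as above. The whole genome is held in memory, which should be workable for a single reference but is a design shortcut rather than a requirement.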