Question

single base pair substitution at a specific position on a reference genome fasta file

21 months ago

kng ▴ 30

I have a specific location in the human genome where I would like to substitute a single base and save the result as a new file. How can I do this? For example, I would like to change the base at position 31774114 of chromosome X from T to C in the GRCh38 genome FASTA file.

singlebasesubstitution UNIX awk iranges fasta • 715 views

score 2 · Accepted Answer · 2022-07-27

21 months ago

Pierre Lindenbaum 161k

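# Prints only the edited chrX record to stdout; redirect it to a file to save it.
# The trailing echo ends the last line, because tr strips every newline.
echo ">chrX" && (
  samtools faidx GRCh38.fasta "chrX:1-31774113" | tail -n +2 &&
  echo "C" &&
  samtools faidx GRCh38.fasta "chrX:31774115-156040895" | tail -n +2
) | tr -d '\n' | fold -w 60 && echo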
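The one-liner above works by cutting the chromosome into the bases before the target, the replacement base, and the bases after it, then removing the line breaks and rewrapping the result at a fixed width. A minimal sketch of just that splice-and-rewrap step on a toy sequence, with no samtools involved, might look like this. The function name splice_base, its arguments, and the toy sequence are made up for illustration.

```shell
# Hypothetical helper: splice_base LEFT ALT RIGHT [WIDTH]
# Joins the three pieces into one sequence and rewraps it at WIDTH columns
# (default 60, the width used in the one-liner).
splice_base() {
  # printf puts each piece on its own line, tr removes the line breaks,
  # fold rewraps, and echo terminates the final line.
  printf '%s\n' "$1" "$2" "$3" | tr -d '\n' | fold -w "${4:-60}"
  echo
}

# Toy record: replace the 13th base, wrapping at 10 columns.
echo ">toy"
splice_base "AAAAAAAAAAAT" "C" "GGGGGGG" 10
```

In the real command the left and right pieces come from the two samtools faidx calls, with tail -n +2 dropping the header line that samtools prints for each region.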

score 2 · Accepted Answer · 2022-07-27

seqkit mutate:

seqkit mutate -s X -p 31774114:C GRCh38.fasta.gz -o edit.fasta.gz

Check the chromosome ID first to confirm whether it is X or chrX:

$ seqkit seq -n Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa.gz | grep -i X
X dna_rm:chromosome chromosome:GRCh38:X:1:156040895:1 REF

Example:

$ echo -ne ">chr1\naaaa\n>chrX\naaaa\n"
>chr1
aaaa
>chrX
aaaa

$ echo -ne ">chr1\naaaa\n>chrX\naaaa\n" | seqkit mutate -s chrX -p 3:C
[INFO] edit seq: chrX
>chr1
aaaa
>chrX
aaCa
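Since the question is also tagged awk, here is a rough awk-based sketch of the same edit. Unlike the commands above, which as written take only the new base, it also takes the expected reference base and stops if the file has something else at that position, which can catch an off-by-one position or the wrong assembly. The function name substitute_base and its argument order are made up for illustration, and the input is assumed to be an uncompressed FASTA file.

```shell
# Hypothetical helper: substitute_base FASTA SEQ_ID POS REF ALT
# Streams the FASTA file, replaces the base at 1-based POS of record SEQ_ID
# with ALT, and fails if the base found there is not REF (case-insensitive).
# Example (file names are placeholders):
#   substitute_base GRCh38.fasta chrX 31774114 T C > edit.fasta
substitute_base() {
  awk -v id="$2" -v pos="$3" -v ref="$4" -v alt="$5" '
    /^>/ {
      name = substr($1, 2)   # record ID: header text up to the first whitespace
      offset = 0             # bases already seen in this record
      print
      next
    }
    {
      len = length($0)
      if (name == id && pos > offset && pos <= offset + len) {
        i = pos - offset     # column of the target base on this line
        found = substr($0, i, 1)
        if (toupper(found) != toupper(ref)) {
          printf "expected %s at %s:%d but found %s\n", ref, id, pos, found > "/dev/stderr"
          status = 1
          exit
        }
        $0 = substr($0, 1, i - 1) alt substr($0, i + 1)
        done = 1
      }
      offset += len
      print
    }
    END {
      if (status == 0 && !done) {
        print "position not found in " id > "/dev/stderr"
        status = 2
      }
      exit status
    }
  ' "$1"
}
```

The original line wrapping is kept, because only the single line containing the target base is rewritten. On a reference mismatch the output is cut off at that point, so check the exit status before using the file.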