single base pair substitution at a specific position on a reference genome fasta file
21 months ago
kng ▴ 30

I have a specific location in the human genome where I would like to substitute a single base pair and save the result as a new file. How can I do this? For example, I would like to change the base at position 31774114 of chromosome X from T to C in the GRCh38 genome FASTA file.

singlebasesubstitution UNIX awk iranges fasta • 715 views
21 months ago
echo ">chrX" && ( samtools faidx GRCh38.fasta "chrX:1-31774113" | tail -n +2 && echo "C" && samtools faidx GRCh38.fasta "chrX:31774115-156040895" | tail -n +2 ) | tr -d '\n' | fold -w 60
21 months ago

seqkit mutate:

seqkit mutate -s X -p 31774114:C GRCh38.fasta.gz -o edit.fasta.gz

You may check the chromosome ID first, to make sure it's X or chrX:

$ seqkit seq -n Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa.gz | grep -i X
X dna_rm:chromosome chromosome:GRCh38:X:1:156040895:1 REF

Example:

$ echo -ne ">chr1\naaaa\n>chrX\naaaa\n"
>chr1
aaaa
>chrX
aaaa

$ echo -ne ">chr1\naaaa\n>chrX\naaaa\n" | seqkit mutate -s chrX -p 3:C
[INFO] edit seq: chrX
>chr1
aaaa
>chrX
aaCa
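
If neither samtools nor seqkit is at hand, the same point substitution can be sketched with plain awk. This is only a minimal sketch, not part of either answer above: the function name `substitute_base` and its argument order are made up for illustration, positions are assumed to be 1-based, the input is assumed to be an uncompressed FASTA, and the sequence ID is taken to be the first word of the header line. An optional fifth argument gives the base you expect to find, so the edit is refused if the reference does not match.

```shell
# Hypothetical helper (name and arguments are illustrative only):
#   substitute_base FASTA SEQ_ID POS NEW_BASE [EXPECTED_REF]
# Prints the edited FASTA to stdout; line wrapping is left unchanged.
substitute_base() {
    awk -v id="$2" -v pos="$3" -v alt="$4" -v ref="$5" '
        /^>/ {
            # Header line: the ID is the text between ">" and the first space.
            name = substr($1, 2)
            offset = 0              # bases seen so far in this record
            print
            next
        }
        name == id && pos > offset && pos <= offset + length($0) {
            i = pos - offset        # column of the target base on this line
            old = substr($0, i, 1)
            if (ref != "" && toupper(old) != toupper(ref)) {
                printf "expected %s at %s:%d but found %s\n", ref, id, pos, old > "/dev/stderr"
                bad = 1
                exit 1
            }
            $0 = substr($0, 1, i - 1) alt substr($0, i + 1)
            done = 1
        }
        { offset += length($0); print }
        END {
            if (bad) exit 1
            if (!done) { print "position not found" > "/dev/stderr"; exit 1 }
        }
    ' "$1"
}
```

For the question above, something like `substitute_base GRCh38.fasta chrX 31774114 C T > edit.fasta` should do the job, provided the header ID really is chrX (it would be X for the Ensembl file shown earlier). A non-zero exit status means the expected base did not match or the position was not found, in which case the output should be discarded.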