Question: Replace the bases with N for the sites with evidence of a SNP
0
gravatar for Prakki Rama
8 months ago by
Prakki Rama2.4k
Singapore
Prakki Rama2.4k wrote:

Hi all,

I have a alignment fasta file which looks like this:

$ cat alignment_file.fasta
>Ref1
ATCG
>S1
AT-C
>S2
CTCG
>S3
ATCG
>S4
-TCG

Specifically, what I want is:

1) If there exists a SNP (for eg: "C" in the S2 sequence in the position 1), then I want the gap ("-", S4 sequence in the position 1) to be replaced with "N"

2) If there is a gap in the column but no SNPs (like in the S1 sequence, position 3), then I want to replace the gap with reference base in the position 3 (which is "C" in the reference).

I want to change the above output to this:

>Ref1
ATCG
>S1
ATCC
>S2
CTCG
>S3
ATCG
>S4
NTCG

Is there any easy way or any tool with which I can do this? Thanks so much!

snp alignment • 260 views
ADD COMMENTlink modified 8 months ago by chantelmarshal3210 • written 8 months ago by Prakki Rama2.4k

I think you can use bedtools maskfasta, it will do the job.

ADD REPLYlink written 8 months ago by abedkurdi1020

Hi @abedkurdi10. Thank you. I understand bedtools maskfasta can mask the base to N's but I am looking for some tool or script which can replace N only when there is a SNP in the whole column as mentioned in my question.

ADD REPLYlink modified 8 months ago • written 8 months ago by Prakki Rama2.4k

I think 'pattern matching' in any programming language can also solve this issue.

ADD REPLYlink written 8 months ago by Pramod0

Hi @Pramod. Thank you. But as mentioned in the question, I am looking to replace Ns if only a SNP is present in the whole column and it is slightly beyond just matching a pattern and replacing it. I might end up writing my own script for this. Just did not want to reinvent the wheel. Anyways, thanks for response.

ADD REPLYlink modified 8 months ago • written 8 months ago by Prakki Rama2.4k
2
gravatar for Prakki Rama
8 months ago by
Prakki Rama2.4k
Singapore
Prakki Rama2.4k wrote:

Answering the question myself.

I have written a perl script which can actually do this. It can be found in github for reference.

ADD COMMENTlink written 8 months ago by Prakki Rama2.4k

bedtools maskfasta is already available, why writing a script from scratch?

ADD REPLYlink written 8 months ago by abedkurdi1020

Hi. As I have stated in my question, there are if conditions which I need to fulfill before I replace the gaps with either N or Reference base. Simply using bedtools maskfasta might not accomplish what I need. I would be glad to accept and vote your answer if you can actually post your steps to solve the above problem using bedtools. Thanks much!

ADD REPLYlink modified 8 months ago • written 8 months ago by Prakki Rama2.4k

bedtools maskfasta requires FASTA as input file. To mask regions in that FASTA file, bedtools maskfasta accepts BED, GFF and VCF files as regions to be masked. So SNPs in a VCF file will be masked in the FASTA file.

Also the tool is easy to use: bedtools maskfasta -fi input.fasta -fo output.fasta -bed regions.vcf

ADD REPLYlink written 8 months ago by abedkurdi1020
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1845 users visited in the last hour