Question: Replace the bases with N for the sites with evidence of a SNP
0
gravatar for Prakki Rama
16 months ago by
Prakki Rama2.4k
Singapore
Prakki Rama2.4k wrote:

Hi all,

I have a alignment fasta file which looks like this:

$ cat alignment_file.fasta
>Ref1
ATCG
>S1
AT-C
>S2
CTCG
>S3
ATCG
>S4
-TCG

Specifically, what I want is:

1) If there exists a SNP (for eg: "C" in the S2 sequence in the position 1), then I want the gap ("-", S4 sequence in the position 1) to be replaced with "N"

2) If there is a gap in the column but no SNPs (like in the S1 sequence, position 3), then I want to replace the gap with reference base in the position 3 (which is "C" in the reference).

I want to change the above output to this:

>Ref1
ATCG
>S1
ATCC
>S2
CTCG
>S3
ATCG
>S4
NTCG

Is there any easy way or any tool with which I can do this? Thanks so much!

snp alignment • 442 views
ADD COMMENTlink modified 16 months ago by chantelmarshal3210 • written 16 months ago by Prakki Rama2.4k

I think you can use bedtools maskfasta, it will do the job.

ADD REPLYlink written 16 months ago by abedkurdi1030

Hi @abedkurdi10. Thank you. I understand bedtools maskfasta can mask the base to N's but I am looking for some tool or script which can replace N only when there is a SNP in the whole column as mentioned in my question.

ADD REPLYlink modified 16 months ago • written 16 months ago by Prakki Rama2.4k

I think 'pattern matching' in any programming language can also solve this issue.

ADD REPLYlink written 16 months ago by Pramod Shinde0

Hi @Pramod. Thank you. But as mentioned in the question, I am looking to replace Ns if only a SNP is present in the whole column and it is slightly beyond just matching a pattern and replacing it. I might end up writing my own script for this. Just did not want to reinvent the wheel. Anyways, thanks for response.

ADD REPLYlink modified 16 months ago • written 16 months ago by Prakki Rama2.4k
2
gravatar for Prakki Rama
16 months ago by
Prakki Rama2.4k
Singapore
Prakki Rama2.4k wrote:

Answering the question myself.

I have written a perl script which can actually do this. It can be found in github for reference.

ADD COMMENTlink written 16 months ago by Prakki Rama2.4k

bedtools maskfasta is already available, why writing a script from scratch?

ADD REPLYlink written 16 months ago by abedkurdi1030

Hi. As I have stated in my question, there are if conditions which I need to fulfill before I replace the gaps with either N or Reference base. Simply using bedtools maskfasta might not accomplish what I need. I would be glad to accept and vote your answer if you can actually post your steps to solve the above problem using bedtools. Thanks much!

ADD REPLYlink modified 16 months ago • written 16 months ago by Prakki Rama2.4k

bedtools maskfasta requires FASTA as input file. To mask regions in that FASTA file, bedtools maskfasta accepts BED, GFF and VCF files as regions to be masked. So SNPs in a VCF file will be masked in the FASTA file.

Also the tool is easy to use: bedtools maskfasta -fi input.fasta -fo output.fasta -bed regions.vcf

ADD REPLYlink written 16 months ago by abedkurdi1030

How about replacing replaceAmbigousBases to Reference? replaceAmbigousBases2Reference.pl

ADD REPLYlink written 6 months ago by Shicheng Guo8.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1433 users visited in the last hour