Question

How to extract SNPs from a multiple-alignment FASTA file?

0

5.5 years ago

pltbiotech_tkarthi ▴ 180

I am using the following script to read FASTA files:

library(Biostrings)
dna <- readDNAStringSet("<<PATH TO FASTA FILE>>")

I would now like to extract the SNPs from this alignment file, but I don't know how.

Does anyone know?

snp SNP alignment sequencing • 4.9k views

ADD COMMENT • link 5.5 years ago by pltbiotech_tkarthi ▴ 180

1

With the adegenet package in R: fasta2DNAbin("text.fasta", snpOnly = TRUE)

ADD REPLY • link 5.5 years ago by Myo Naung ▴ 10

0

Thanks, I will try it.

ADD REPLY • link 5.5 years ago by pltbiotech_tkarthi ▴ 180

0

Hello Naung.M,

I used the function you suggested from the adegenet package:

fasta2DNAbin("Path to fasta", snpOnly = TRUE)

I have 109 sequences, each approximately 1,169 nucleotides long, and I got the following output:

Converting FASTA alignment into a DNAbin object...

Finding the size of a single genome...

genome size is: 1,169 nucleotides

( 60 lines per genome )

Importing sequences...

..................................................

Forming final object...

Extracting SNPs...

...done.

109 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1058

Labels: Seq1 Seq2 Seq3 ...

Base composition:

    a     c     g     t
0.284 0.202 0.253 0.261

(Total: 115.32 kb)

This gives only the summary above, but I would like to extract the SNPs themselves.

Then I tried:

myPath <- system.file("path to fasta", package="adegenet")
myPath
[1] " "

and read the file as below:

obj <- fasta2DNAbin(myPath, chunk=109)
Error in if (!ext %in% c("FASTA", "FA", "FAS")) warning("wrong file extension - '.fasta', '.fa' or '.fas' expected") :
  argument is of length zero

which fails with the error shown.

Could anyone please suggest how to resolve this, so that I can extract SNPs from an aligned multi-FASTA file?

ADD REPLY • link 5.5 years ago by pltbiotech_tkarthi ▴ 180
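A note on the error above: system.file() only looks for files that ship inside an installed package, so it returns an empty string for an arbitrary path on disk. That is presumably why the extension check inside fasta2DNAbin() ends up with a zero-length argument. The sketch below uses base R only; the file name and toy sequences are made up for illustration, and the commented-out fasta2DNAbin() call assumes adegenet is installed.

```r
# Hypothetical example file, written to a temporary directory for illustration.
fasta_path <- file.path(tempdir(), "example_alignment.fasta")
writeLines(c(">Seq1", "ACGT", ">Seq2", "ACGA"), fasta_path)

# system.file() searches inside the named package, not the file system,
# so an arbitrary path is not found and an empty string comes back.
pkg_lookup <- system.file(fasta_path, package = "stats")
print(pkg_lookup)

# Check the real path and pass it to the reader directly instead.
path_ok <- file.exists(fasta_path)
print(path_ok)
# obj <- adegenet::fasta2DNAbin(fasta_path, snpOnly = TRUE)  # needs adegenet
```

In other words, skip system.file() for your own data and give fasta2DNAbin() the path to the FASTA file itself.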

0

You can write the DNAbin object to a CSV file with the following call: write.csv(DNAbin, "filename.csv"). I guess the alleles will be coded as numbers at each SNP position.

ADD REPLY • link 5.5 years ago by Myo Naung ▴ 10
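If readable base letters are preferred in the CSV, one option is to work with a character matrix (rows are sequences, columns are alignment sites) and keep only the columns that vary. The following is a minimal base-R sketch on a made-up three-sequence alignment; the object names and the output file name are hypothetical. With ape installed, as.character() on a DNAbin matrix should give a matrix of the same layout.

```r
# Toy alignment as a character matrix (rows = sequences, columns = sites).
aln <- rbind(
  Seq1 = c("a", "c", "g", "t", "a"),
  Seq2 = c("a", "c", "a", "t", "a"),
  Seq3 = c("a", "t", "g", "t", "a")
)
colnames(aln) <- seq_len(ncol(aln))   # label columns by alignment position

# A site is variable if it shows more than one distinct base.
is_variable <- apply(aln, 2, function(site) length(unique(site)) > 1)
snp_matrix <- aln[, is_variable, drop = FALSE]

# Write the variable sites, with their positions as column headers.
out_file <- file.path(tempdir(), "snps.csv")   # hypothetical output name
write.csv(snp_matrix, out_file)
print(snp_matrix)
```

Here the column names of the written table are the alignment positions of the variable sites (2 and 3 in the toy data), which is easier to read than raw numeric codes.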

score 2 · Answer 1 · 2018-10-09

2

5.5 years ago

Pierre Lindenbaum 161k

See https://github.com/sanger-pathogens/snp_sites

ADD COMMENT • link 5.5 years ago by Pierre Lindenbaum 161k

0

Thanks, I can try it.

ADD REPLY • link 5.5 years ago by pltbiotech_tkarthi ▴ 180

0

Hi Pierre,

Thanks. I installed SNP-sites on Ubuntu Linux and extracted the SNPs, but I could not find the SNP positions relative to the first sequence, which is the reference sequence. I used the command:

vagrant@ubuntu-xenial:/vagrant$ snp-sites test10.fasta

Could you please let me know if there is a specific script or command for retrieving the SNP positions relative to the reference sequence? Also, how can I install Jvarkit (https://omictools.com/jvarkit-tool) on Ubuntu Linux? Is there a specific script for the installation?
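For positions relative to the first sequence, the snp-sites README lists a -v option that writes a VCF file with a position column (check snp-sites -h on your install). As an alternative in R, here is a minimal base-R sketch that treats the first record as the reference and reports every mismatch. The function names, file name and toy sequences are hypothetical. It assumes the sequences are already aligned, counts gaps and N as mismatches, and reports alignment columns, which equal reference coordinates only if the reference contains no gaps.

```r
# Minimal FASTA reader (base R only); assumes an aligned multi-FASTA file.
read_alignment <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # record index for every line
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  toupper(seqs)
}

# Compare every sequence with the first one, which is used as the reference.
snps_vs_reference <- function(seqs) {
  stopifnot(length(seqs) >= 2, length(unique(nchar(seqs))) == 1)
  mat <- do.call(rbind, strsplit(seqs, ""))
  ref <- mat[1, ]
  rows <- lapply(seq_len(nrow(mat))[-1], function(i) {
    pos <- which(mat[i, ] != ref)        # alignment columns that differ
    data.frame(sample = rep(rownames(mat)[i], length(pos)),
               position = pos, ref = ref[pos], alt = mat[i, pos],
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}

# Toy alignment written to a temporary file for illustration.
fasta_path <- file.path(tempdir(), "toy_alignment.fasta")
writeLines(c(">Ref", "ACGTAC", "GT",
             ">Seq2", "ACGTAC", "GA",
             ">Seq3", "ATGTAC", "GT"), fasta_path)

snps <- snps_vs_reference(read_alignment(fasta_path))
print(snps)
```

On the toy data this reports one difference for Seq2 (position 8, T to A) and one for Seq3 (position 2, C to T). For a real file, replace fasta_path with the path to your own alignment.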