Question: Problem with FASTA filtering

Hi

I previously used the following command to extract sequences from a FASTA file using a list of IDs:

cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta

but now I get this error:

xargs: samtools: No such file or directory

I tried to use seqkit, but seqkit does not work on my Unix system.

I used following commands:

perl -e 'open(F,"File1.txt");while(<F>){/(\S+)/; $k{$1}++}; while(<>){if(/>\s*(\S+?)(\.| )/){if($k{$1}){$k=1}else{$k=0}; } print if $k==1;}' File2.fa
awk -F'[ .]' 'NR==FNR{a[$0]; next}/^>/{p=$2 in a}p' file1 file2
grep -x -F -A 1 -f 'File 2' 'File 1'
cat IDs.txt | awk '{gsub("_","\\_",$0);$0="(?s)^>"$0".*?(?=\\n(\\z|>))"}1' | pcregrep -oM -f - f1.fasta
alias FASTAgrep="awk '{gsub(\"_\",\"\\\_\",\$0);\$0=\"(?s)^>\"\$0\".*?(?=\\\n(\\\z|>))\"}1' | pcregrep -oM -f -"
cat IDs.txt | FASTAgrep f1.fasta

and so on.

But in all of these cases, the output file was either empty or contained the full input file. I am confused, so please help me.

modified 15 months ago by RamRS • written 15 months ago by Janey

samtools: No such file or directory

Install samtools and/or set your PATH: https://stackoverflow.com/questions/14637979/

written 15 months ago by Pierre Lindenbaum

Are you suggesting that I download and install samtools again?

written 15 months ago by Janey

If samtools is installed, then

samtools: No such file or directory

means that it is not in your PATH. You will need to update your $PATH variable.

written 15 months ago by Pierre Lindenbaum
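
To make the PATH point above concrete, here is a minimal sketch of how one might check whether the shell can see samtools at all. The helper name `find_tool` and the directory in the commented `export` line are hypothetical placeholders, not anything taken from the poster's system.

```shell
# Report where the shell finds a program, or say that it is not on PATH.
# find_tool is a hypothetical helper name used only for this sketch.
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

find_tool samtools || echo "samtools is not visible to this shell"

# If samtools is installed in a directory that PATH does not list,
# prepend that directory. The path below is an assumed example only.
# export PATH="$HOME/tools/samtools/bin:$PATH"
```

If the check prints a path, xargs should be able to run samtools from that shell; if not, either the install is gone or its directory is missing from PATH.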

samtools was installed successfully, and this command "cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta" worked before; my unis system was changed.

written 15 months ago by Janey

my unis system was changed.

Assuming you mean here "the computer system of my university was changed" then indeed it looks like you need to reinstall samtools. We don't know what has changed on your system though since you don't provide a lot of information.

written 15 months ago by WouterDeCoster

I used following commands:

What is this supposed to do?

written 15 months ago by Pierre Lindenbaum

I wanted to try other methods (commands) to extract sequences from a FASTA file using a list of IDs. I should add that I am a biologist, so I am not familiar with programming. Thank you for any further details about these commands.

written 15 months ago by Janey

@OP: I think you tried too many things at once. Please post a few entries from the ID list and the matching records from the FASTA file. There are easy, established tools for what you are trying to do. It is as simple as seqtk subseq <input.fa> <ids.list>.

output:

$ seqtk subseq test.fa ids.txt 
>a
atgc
>c
agtc

input:

$ cat test.fa 
>a
atgc
>b
cggat
>c
agtc

$ cat ids.txt 
a
c
modified 15 months ago • written 15 months ago by cpad0112
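
If seqtk cannot be installed either, the same selection can be sketched with plain awk, which is normally already present on a Unix system. This is only an illustration under some assumptions: the function name `extract_by_id` is made up for this example, IDs are matched against the header text between ">" and the first space, and the ID file may carry a leading ">" or Windows line endings (two things that could plausibly lead to empty output, though the thread does not confirm either).

```shell
# extract_by_id IDS FASTA: print the FASTA records whose ID is listed in IDS.
# Hypothetical helper for illustration; it matches whole IDs only.
extract_by_id() {
    awk '
        { sub(/\r$/, "") }          # drop a Windows carriage return, if any
        NR == FNR {                 # first file: the ID list
            sub(/^>/, "", $1)       # tolerate a leading ">" on an ID
            if ($1 != "") ids[$1]   # remember the ID, skip blank lines
            next
        }
        /^>/ {                      # header line: decide keep or skip
            name = substr($1, 2)    # text after ">" up to the first space
            keep = (name in ids)
        }
        keep                        # print header and sequence lines if kept
    ' "$1" "$2"
}
```

With the test.fa and ids.txt shown above, `extract_by_id ids.txt test.fa > out.fa` should keep records a and c, the same records the seqtk command selects. Sequences wrapped over several lines are handled because every line after a kept header is printed until the next header.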