Question: Problem with FASTA filtering
Janey wrote:

Hi

I previously used the following command to extract sequences from a FASTA file using a list of IDs:

cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta

But now I get this error:

xargs: samtools: No such file or directory

I tried to use seqkit, but seqkit does not work on my Unix system.

I used following commands:

perl -e 'open(F,"File1.txt");while(<F>){/(\S+)/; $k{$1}++}; while(<>){if(/>\s*(\S+?)(\.| )/){if($k{$1}){$k=1}else{$k=0}; } print if $k==1;}' File2.fa
awk -F'[ .]' 'NR==FNR{a[$0]; next}/^>/{p=$2 in a}p' file1 file2
grep -x -F -A 1 -f 'File 2' 'File 1'
cat IDs.txt | awk '{gsub("_","\\_",$0);$0="(?s)^>"$0".*?(?=\\n(\\z|>))"}1' | pcregrep -oM -f - f1.fasta
alias FASTAgrep="awk '{gsub(\"_\",\"\\\_\",\$0);\$0=\"(?s)^>\"\$0\".*?(?=\\\n(\\\z|>))\"}1' | pcregrep -oM -f -"
cat IDs.txt | FASTAgrep f1.fasta

and ........

But in all of these cases, the output file was either empty or contained the full input file. I am confused, so please help me.

rna-seq • 421 views
modified 10 months ago by RamRS • written 10 months ago by Janey

samtools: No such file or directory

Install samtools and/or set your PATH: https://stackoverflow.com/questions/14637979/

written 10 months ago by Pierre Lindenbaum

Do you suggest downloading and installing samtools again?

written 10 months ago by Janey

If samtools is installed, then

samtools: No such file or directory

means that it's not in your PATH. You'll need to update your $PATH variable.
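
A minimal sketch of how one might check this, assuming a POSIX shell. The helper name `check_tool` and the directory `$HOME/tools/samtools/bin` are made up for illustration; the real install location depends on your system.

```shell
# Report whether a program can be found through $PATH.
# check_tool is a hypothetical helper name, not part of samtools.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 is not on PATH" >&2
        return 1
    fi
}

# Does the shell find samtools at all?
check_tool samtools || true

# If samtools is installed in a directory that is not listed in $PATH,
# prepend that directory. The path below is only a placeholder:
# export PATH="$HOME/tools/samtools/bin:$PATH"
```

Adding the `export` line to `~/.bashrc` (or the startup file of whichever shell you use) would make the change persist across logins.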

written 10 months ago by Pierre Lindenbaum

samtools was installed successfully, and this command "cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta" worked before. my unis system was changed.

written 10 months ago by Janey

my unis system was changed.

Assuming you mean "the computer system of my university was changed", it does look like you need to reinstall samtools. We don't know what has changed on your system, though, since you haven't provided much information.

written 10 months ago by WouterDeCoster

I used following commands:

What is this supposed to do?

written 10 months ago by Pierre Lindenbaum

I wanted to use other methods (commands) to extract sequences from a FASTA file using a list of IDs. I should also say that I am a biologist, so I'm not familiar with programming. Thanks for helping me with more details about the commands.

written 10 months ago by Janey

@OP: I think you tried too many. Please post a few entries from the list and the matching records from the FASTA file. There are easy, established tools for what you are trying to do. It is as simple as seqtk subseq <input.fa> <ids.list>.

output:

$ seqtk subseq test.fa ids.txt 
>a
atgc
>c
agtc

input:

$ cat test.fa 
>a
atgc
>b
cggat
>c
agtc

$ cat ids.txt 
a
c
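
If seqtk is not available either, roughly the same selection can be sketched with plain awk. This is only an illustration, not what seqtk does internally. It assumes the ID is the first word directly after ">" in each header and that the list holds one ID per line without the ">". The function name `fasta_subset` is made up for this example.

```shell
# Recreate the small example files from this post.
printf '>a\natgc\n>b\ncggat\n>c\nagtc\n' > test.fa
printf 'a\nc\n' > ids.txt

# fasta_subset IDS FASTA: print only the records whose ID is listed in IDS.
# Hypothetical helper; works for multi-line sequences as well.
fasta_subset() {
    awk '
        { sub(/\r$/, "") }                    # tolerate Windows line endings
        NR == FNR { if (NF) keep[$1]; next }  # first file: remember each ID
        /^>/ { id = substr($1, 2); printing = (id in keep) }
        printing                              # print lines of selected records
    ' "$1" "$2"
}

# Prints the same four lines as the seqtk output shown above.
fasta_subset ids.txt test.fa
```

An empty result from a filter like this often points to IDs that do not match the header text exactly (a leading ">", extra text after the ID, or stray carriage returns), so it is worth comparing a few lines of both files by eye.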
modified 10 months ago • written 10 months ago by cpad0112