Problem with FASTA filtering
5.9 years ago
Janey ▴ 30

Hi,

I previously used the following command to extract sequences from a FASTA file by a list of IDs:

cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta

But now I get this error:

xargs: samtools: No such file or directory

I tried to use seqkit, but seqkit does not work on my Unix system.

I also tried the following commands:

perl -e 'open(F,"File1.txt");while(<F>){/(\S+)/; $k{$1}++}; while(<>){if(/>\s*(\S+?)(\.| )/){if($k{$1}){$k=1}else{$k=0}; } print if $k==1;}' File2.fa
awk -F'[ .]' 'NR==FNR{a[$0]; next}/^>/{p=$2 in a}p' file1 file2
grep -x -F -A 1 -f 'File 2' 'File 1'
cat IDs.txt | awk '{gsub("_","\\_",$0);$0="(?s)^>"$0".*?(?=\\n(\\z|>))"}1' | pcregrep -oM -f - f1.fasta
alias FASTAgrep="awk '{gsub(\"_\",\"\\\_\",\$0);\$0=\"(?s)^>\"\$0\".*?(?=\\\n(\\\z|>))\"}1' | pcregrep -oM -f -"
cat IDs.txt | FASTAgrep f1.fasta

and several others.

But in all of these cases, the output file was either empty or contained the full input file. I am confused, so please help me.

RNA-Seq • 1.6k views

samtools: No such file or directory

Install samtools and/or set your PATH: https://stackoverflow.com/questions/14637979/


Do you suggest downloading and installing samtools again?


If samtools is installed, then

samtools: No such file or directory

means that it's not in your PATH. You'll need to update your $PATH variable so that it includes the directory containing the samtools executable.
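To make the PATH advice concrete, here is a minimal sketch in POSIX shell. The helper names `check_tool` and `add_to_path` are made up for illustration, and the directory in the usage line is a placeholder, not a location samtools is known to be installed in on your system.

```shell
# Report whether a program can be found on the current PATH.
# Prints its location on success; returns non-zero if it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 is not on PATH" >&2
        return 1
    fi
}

# Prepend a directory to PATH for the current shell session only.
# (Hypothetical helper; to make the change permanent you would put the
# equivalent export line in your shell start-up file.)
add_to_path() {
    PATH="$1:$PATH"
    export PATH
}
```

You could then run something like `check_tool samtools || add_to_path "$HOME/tools/samtools/bin"`, replacing the example directory with wherever samtools actually lives on your machine.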


samtools was installed successfully, and this command "cut -c 2- ID.text | xargs -n 1 samtools faidx in.fasta > out.fasta" worked before. my unis system was changed.


my unis system was changed.

Assuming you mean "the computer system of my university was changed", then it does look like you need to reinstall samtools. We don't know what has changed on your system, though, since you haven't provided much information.


I used following commands:

What is this supposed to do?


I wanted to use other methods (commands) to extract sequences from a FASTA file by a list of IDs. I should also say that I am a biologist, so I'm not familiar with programming. Thanks for helping me with more details about the commands.


@OP: I think you tried too many commands at once. Please post a few entries from your ID list and the matching records from the FASTA file. There are simple, established tools for what you are trying to do. It is as simple as seqtk subseq <input.fa> <ids.list>.

output:

$ seqtk subseq test.fa ids.txt 
>a
atgc
>c
agtc

input:

$ cat test.fa 
>a
atgc
>b
cggat
>c
agtc

$ cat ids.txt 
a
c
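
If seqtk is not available either, the same selection can be sketched with plain awk. This is only a minimal sketch: the function name `fasta_subset` is made up here, and it assumes one ID per line, matched exactly against the first word of each header (the text after ">" up to the first space or tab). A leading ">" in the ID list is tolerated.

```shell
# Print the FASTA records whose header ID appears in an ID list.
# Usage: fasta_subset ids.txt input.fa
fasta_subset() {
    awk '
        # First file (ID list): remember each ID, dropping a leading ">".
        NR == FNR {
            id = $1
            sub(/^>/, "", id)
            if (id != "") ids[id] = 1
            next
        }
        # Header line of the FASTA file: decide whether to keep this record.
        /^>/ {
            name = substr($1, 2)
            keep = (name in ids)
        }
        # Print header and sequence lines while the current record is kept.
        keep
    ' "$1" "$2"
}
```

With the test.fa and ids.txt shown above, `fasta_subset ids.txt test.fa` prints the records for a and c. Because the match is exact, an empty result usually means the IDs in the list do not match the headers character for character.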
