How to retrieve sequences for a whole list of genes at once from a FASTA file
3.6 years ago
evafinegan • 0

Hi,

I have a protein FASTA file and a gene list file. I want to retrieve the sequences of all genes in the gene list file in one go to save time. I am using this command:

for i in `cat  gene_list.txt` ; do grep -A1 "$i"  protein.fasta ; done

However, it only gives the sequence for the last gene in the gene_list.txt file. I want to retrieve the sequences of all the genes. Thank you for the help!

next-gen • 1.5k views

This can't be a genome file since you would not be able to get individual gene sequences from it.

The best option is to use the faSomeRecords utility. See: C: How do I extract Fasta Sequences based on a list of IDs?


By genome I meant protein. Sorry for the confusion. I have edited it in the question.

3.6 years ago
for i in `cat  gene_list.txt` ; do grep -A1 "$i"  protein.fasta ; done

Hmm, better:

grep -A1 -F -w -f gene_list.txt protein.fasta
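
Note that grep -A1 only returns the full sequence when every record keeps its sequence on a single line. If the FASTA were wrapped over several lines, an awk pass is one possible alternative. The following is a minimal sketch, not the method used in this thread: the sample IDs and sequences are made up, and it assumes the gene ID is the first word after ">" in each header.

```shell
# Scratch directory with tiny made-up sample files (hypothetical IDs).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '>geneA desc\nMKV\nLLT\n>geneAB\nMLL\n>geneC\nMAA\n' > protein.fasta
printf 'geneA\ngeneC\n' > gene_list.txt

# First file: remember each wanted ID (a trailing carriage return is dropped).
# Second file: on a header line, take the first word after ">" as the ID and
# decide whether to print this record; sequence lines follow that decision.
result=$(awk '
    NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
    /^>/      { id = substr($1, 2); keep = (id in want) }
    keep
' gene_list.txt protein.fasta)
printf '%s\n' "$result"
```

Because the ID is compared as a whole string, geneA does not pull in geneAB, and the wrapped sequence lines of geneA are printed together with its header.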

However, it only gives sequence for the last gene in the gene_list.txt file

Are you sure these are plain text files and not Windows-formatted ones? What is the output of file gene_lists.txt protein.fasta?


Hi, thank you! This command also gives the sequence only for the last gene ID in the list. The output of file gene_lists.txt protein.fasta is:

gene_lists.txt:        ASCII text, with CRLF line terminators
protein.fasta: ASCII text, with very long lines

Convert the CRLF line terminators to Unix ones (LF). The simplest way is with dos2unix, or with a text editor (not recommended for huge text files).
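
If dos2unix is not installed, deleting the carriage returns with tr should have the same effect for this purpose. The sketch below uses made-up sample files (hypothetical IDs and sequences) and assumes the last line of the list has no line terminator, which would explain why only the last ID matched.

```shell
# Scratch directory with tiny made-up sample files (hypothetical IDs).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '>geneA\nMKV\n>geneB\nMLL\n>geneC\nMAA\n' > protein.fasta

# Gene list with Windows (CRLF) line endings; the last line is unterminated.
printf 'geneA\r\ngeneC' > gene_list.txt

# Before conversion: "geneA" carries a trailing carriage return and never
# matches, so only the last ID is found.
before=$(grep -A1 -F -w -f gene_list.txt protein.fasta)

# Delete every carriage return, leaving plain LF line endings.
tr -d '\r' < gene_list.txt > gene_list.unix.txt

# After conversion: both IDs match. The second grep drops the "--" separator
# that grep prints between non-adjacent context groups.
after=$(grep -A1 -F -w -f gene_list.unix.txt protein.fasta | grep -v '^--$')
printf '%s\n' "$after"
```

Writing to a new file (gene_list.unix.txt here) keeps the original list untouched; dos2unix converts in place by default.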


with CRLF line terminators

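This is your problem: each ID read from the list carries a trailing carriage return, so grep searches for the ID plus that invisible character. See https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats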


It worked even with my previous code after converting using dos2unix. Thank you very much.
