Match a column to another file (grep/awk)
2.1 years ago
Nathan ▴ 10

Hello, everyone. I would like some help with a simple issue that I have not been able to solve.

I want to take each nucleotide sequence in the second column of a text file and match it against a fasta file to find the headers that correspond to these sequences. I would also like to modify each header according to the first column of the text file and generate a new fasta file, as demonstrated below.

Text file:

1        AACTGA
1        AACTGC
2        CCAGAT
3        GGATCA
3        GGATCC

Original fasta file:

>Sample 1
AACTGA
>Sample 2
CCAGAT
>Sample 3
AACTGA
>Sample 4
CCAGAT
>Sample 5
GGATCA
>Sample 6
GGATCC
>Sample 7
GGATCA
>Sample 8
GGATCC
>Sample 9
AACTGC
>Sample 10
AACTGC

Expected output:

>1|Sample 1
AACTGA
>1|Sample 3
AACTGA
>1|Sample 9
AACTGC
>1|Sample 10
AACTGC
>2|Sample 4
CCAGAT
>2|Sample 2
CCAGAT
>3|Sample 5
GGATCA
>3|Sample 7
GGATCA
>3|Sample 6
GGATCC
>3|Sample 8
GGATCC

I am still a beginner in bioinformatics and simple things are still a challenge for me. Thank you for the help!

grep awk • 520 views
2.1 years ago

I suppose this is a class exercise?

In that case, the first step is always to decide on a particular strategy and how to approach this task (derive the algorithm). All of this happens before you actually write the first line of code. Break down your task into single steps - this is what you need to practice, not writing code.

If you have done that and also show this effort in your question, the people on here will be happy to help you with whatever issue you might encounter while implementing it.

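Assuming that your files are named textfile.txt and fasta.fa, that textfile.txt is tab-separated, and that every fasta record takes exactly two lines (header, then sequence), this will work. Why it works is something you will still need to figure out!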

paste - - < fasta.fa > temp
awk -F "\t" 'FNR==NR{a[$2]=$1;next}($2 in a){print ">"a[$2]"|"substr($1,2)"\n"$2}' textfile.txt temp > output.fa
rm temp

Can you tell me the algorithm that I used?
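
For anyone who wants to check their answer, here is a commented sketch of the same idea, written as a single pipeline without the temp file. It is only one way to do it, under the same assumptions as above (tab-separated text file, two-line fasta records). The function name annotate_fasta is made up for illustration, and the extra sort step is my own addition to group the records by ID as in the expected output; it relies on sort -s (stable sort), which GNU and BSD sort provide but which is not part of POSIX.

```shell
# annotate_fasta TABLE FASTA
# Hypothetical helper: prints FASTA records whose sequence appears in TABLE,
# with the header rewritten as ">ID|original header".
# Assumes TABLE is tab-separated (ID <TAB> sequence) and every FASTA record
# is exactly two lines (header line, then one sequence line).
annotate_fasta() {
    af_table=$1
    af_fasta=$2
    af_tab=$(printf '\t')

    # Step 1: join each header/sequence pair into one line: ">header<TAB>seq".
    paste - - < "$af_fasta" |
    # Step 2: while reading the table (FNR == NR), store sequence -> ID.
    # Then, for each linearized record from stdin ("-"), keep it only if its
    # sequence is in the table and print: ID <TAB> seq <TAB> header without ">".
    awk -F '\t' '
        FNR == NR  { id[$2] = $1; next }
        ($2 in id) { print id[$2] "\t" $2 "\t" substr($1, 2) }
    ' "$af_table" - |
    # Step 3: group by ID (numeric), then by sequence; the stable sort keeps
    # the original FASTA order among records with the same ID and sequence.
    sort -s -t "$af_tab" -k1,1n -k2,2 |
    # Step 4: turn each line back into a two-line FASTA record.
    awk -F '\t' '{ print ">" $1 "|" $3 "\n" $2 }'
}
```

You would call it as: annotate_fasta textfile.txt fasta.fa > output.fa. Records that share an ID and a sequence stay in fasta order, so the output will not necessarily reproduce the exact order of the expected output in the question (for example, Sample 4 listed before Sample 2).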
