I have a long list of different miRNAs, like the ones below:
hsa-miR-7641 bta-miR-2904 hhi-miR-7641 hsa-miR-4454 bta-miR-2478 efu-let-7c mmu-miR-6240 hsa-miR-7704
Is there any way to get all of their mature sequences at once, instead of searching miRBase one by one and copy-pasting each sequence?
grep -f <youridentifiers.txt> <mirbase.fasta> > result.txt
You need to redirect the output to a file, e.g.
grep -f youridentifiers.txt mirbase.fasta > mysequences.fasta
Without that redirection, grep just writes to stdout, which is the terminal.
You write "as a txt file"; what exactly do you have in mind as output? With standard Linux tools (grep, sed, cut, awk) you can probably do all the processing you would like.
Oh good, I see you already found out how to get it in a file ;)
Yes, occasionally the grey cells in my brain do work :)
Sorry, I inspected the result file and it contains all of the mature miRNAs, while my identifiers.txt contains only about 80 miRNAs.
What should I do then?
Is there by chance an empty line at the bottom of your identifiers.txt file? Additionally, it might be better to go for
grep -w -f identifiers.txt mirbase.fasta
to only get full 'word' matches and not partial 'pattern' matches.
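A small sketch of what -w changes, on made-up data (the file names and the three toy headers are assumptions for this demo, not real miRBase content). One thing to keep in mind: for grep, '-' is not a word character, so an identifier such as hsa-miR-1 still matches hsa-miR-1-3p even with -w.

```shell
# Toy data: file names and identifiers are made up for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' '>hsa-miR-1' '>hsa-miR-10' '>hsa-miR-1-3p' > "$tmpdir/headers.txt"
printf '%s\n' 'hsa-miR-1' > "$tmpdir/ids.txt"

# Without -w the identifier matches as a substring: all three headers are hit.
partial=$(grep -c -F -f "$tmpdir/ids.txt" "$tmpdir/headers.txt")

# With -w, hsa-miR-10 is rejected, but hsa-miR-1-3p still matches,
# because '-' is not a word character and therefore ends the "word".
whole=$(grep -c -F -w -f "$tmpdir/ids.txt" "$tmpdir/headers.txt")

echo "without -w: $partial matching lines, with -w: $whole matching lines"
rm -r "$tmpdir"
```

So -w protects against miR-1 vs miR-10 style mix-ups, but it is worth checking the result for extra -3p/-5p entries.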
Exactly, the problem was some empty lines at the end of my ID file.
Yeah, and then every line matches the empty pattern :p (I run into this problem about 95% of the time when I do something like this)
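To illustrate the empty-line problem, here is a minimal sketch on toy data (the file names and the placeholder sequences are made up, they are not real miRBase entries). An empty line in the pattern file is an empty pattern, and an empty pattern matches every line; stripping blank lines from the ID file first avoids that.

```shell
# Toy FASTA with placeholder sequences (not real miRBase data).
tmpdir=$(mktemp -d)
printf '%s\n' '>hsa-miR-7641' 'ACGUACGU' '>hsa-miR-4454' 'UGCAUGCA' > "$tmpdir/toy.fasta"

# ID list with one identifier followed by an empty line.
printf '%s\n' 'hsa-miR-7641' '' > "$tmpdir/ids.txt"

# The empty pattern matches everything, so all 4 lines of the FASTA are counted.
with_blank=$(grep -c -f "$tmpdir/ids.txt" "$tmpdir/toy.fasta")

# Drop empty (or whitespace-only) lines from the ID list, then search again.
grep -v '^[[:space:]]*$' "$tmpdir/ids.txt" > "$tmpdir/ids.clean.txt"
cleaned=$(grep -c -f "$tmpdir/ids.clean.txt" "$tmpdir/toy.fasta")

echo "with empty line: $with_blank matching lines, cleaned: $cleaned matching line"
rm -r "$tmpdir"
```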
I tried these commands; they all give me the whole FASTA file:
grep -A1 -w -f id.txt seqFile.fasta > output.fasta
grep -Fwf miRNA.txt -A 1 mature.fasta | grep -v '^--$' > out.fasta
grep -Fwf <(sort -u miRNA.txt) -A 1 mature.fasta | grep -v '^--$'
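For reference, a sketch of the whole pipeline on toy data, assuming each sequence sits on a single line directly under its header (file contents below are made up for the demo). The ID list is cleaned first, and the '--' lines that grep -A prints between non-adjacent hits are filtered out afterwards.

```shell
tmpdir=$(mktemp -d)

# Toy FASTA: three records with placeholder sequences, one line per sequence.
printf '%s\n' \
  '>hsa-miR-7641 toy record' 'AAAA' \
  '>hsa-miR-9999 toy record' 'CCCC' \
  '>hsa-miR-4454 toy record' 'GGGG' > "$tmpdir/mature.fasta"

# ID list containing an empty line and a duplicate.
printf '%s\n' 'hsa-miR-7641' '' 'hsa-miR-4454' 'hsa-miR-7641' > "$tmpdir/miRNA.txt"

# 1) Remove blank lines and duplicates from the ID list.
grep -v '^[[:space:]]*$' "$tmpdir/miRNA.txt" | sort -u > "$tmpdir/ids.clean.txt"

# 2) Print each matching header plus the line after it (-A 1),
#    then drop the '--' separators grep inserts between groups.
grep -A 1 -F -w -f "$tmpdir/ids.clean.txt" "$tmpdir/mature.fasta" \
  | grep -v '^--$' > "$tmpdir/out.fasta"

n_records=$(grep -c '^>' "$tmpdir/out.fasta")
n_lines=$(grep -c . "$tmpdir/out.fasta")
cat "$tmpdir/out.fasta"
rm -r "$tmpdir"
```

If the sequences in your file are wrapped over several lines, -A 1 only returns the first line of each sequence.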
Could you try the following?
grep -A 1 -Fwf <(grep -v '^$' miRNA.txt) mature.fasta > result.fasta
I see you do grep -v at the end to clean up the output, but the empty lines should be removed from the identifier file instead (although I have my doubt whether
grep -A 1 -Fwf <(grep -v '^$' miRNA.txt) mature.fasta > result.fasta
It says "Missing name for redirect".
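That message most likely means the command was run in csh/tcsh, which does not understand the <(...) syntax (it is a bash/zsh/ksh feature). One way around it is to skip process substitution altogether. The sketch below swaps in awk instead of grep; it is an alternative technique, not the command suggested above. It assumes the identifier is the first word of the header, compares it exactly, ignores empty lines in the ID file, and also keeps sequences that are wrapped over several lines. File contents are toy data.

```shell
tmpdir=$(mktemp -d)

# Toy FASTA; the first record has its placeholder sequence wrapped over two lines.
printf '%s\n' \
  '>hsa-miR-7641 toy record' 'AAAA' 'CCCC' \
  '>hsa-miR-9999 toy record' 'GGGG' \
  '>hsa-miR-4454 toy record' 'UUUU' > "$tmpdir/mature.fasta"

# ID list with an empty line in the middle.
printf '%s\n' 'hsa-miR-7641' '' 'hsa-miR-4454' > "$tmpdir/miRNA.txt"

# Note: the NR == FNR idiom assumes the ID file is not empty.
awk '
    NR == FNR { if ($1 != "") wanted[$1] = 1; next }  # first file: ID list, blank lines skipped
    /^>/      { keep = (substr($1, 2) in wanted) }     # header: compare first word without ">"
    keep                                                # print header and all of its sequence lines
' "$tmpdir/miRNA.txt" "$tmpdir/mature.fasta" > "$tmpdir/result.fasta"

n_records=$(grep -c '^>' "$tmpdir/result.fasta")
n_lines=$(grep -c . "$tmpdir/result.fasta")
cat "$tmpdir/result.fasta"
rm -r "$tmpdir"
```

Because the match is exact, hsa-miR-7641 in the ID list would not pick up a header named hsa-miR-7641-3p; whether that is what you want depends on how your identifiers are written.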
Suppose I have a file containing:
miRNA : hsa-miR-26b-3p
mfe: -18.2 kcal/mol
target 5' A UACCU AAA A A 3'
miRNA 3' C UUCAUU 5'
miRNA : hsa-miR-26b-3p
mfe: -23.2 kcal/mol
target 5' G CA GGA AAAGA U 3'
miRNA 3' A C 5'
How can I extract only the unique targets, for example EAL01688?
Which output do you want? I'm not sure I understand.
Actually, I was searching for targets of my miRNAs with RNAhybrid; I pasted part of the results above. Now I want to extract only the targets from this output, and each target only once. For example, if EAL01691 is repeated several times, I only need it once.
If I understand it correctly, something like this would work:
grep "target:" yourinput.txt | sed 's/target: //' | sort -u
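A quick way to check this on a small example. The toy input below is an assumption about the layout: it supposes that each hit has a line starting with "target:" followed by the target name, as the command above expects, and the names and values are only placeholders. Anchoring the pattern with ^ keeps the alignment lines that start with "target 5'" out of the result.

```shell
tmpdir=$(mktemp -d)

# Toy input in the assumed layout; EAL01688 appears twice on purpose.
printf '%s\n' \
  'target: EAL01688' 'miRNA : hsa-miR-26b-3p' 'mfe: -18.2 kcal/mol' "target 5' A UACCU 3'" \
  'target: EAL01691' 'miRNA : hsa-miR-26b-3p' 'mfe: -23.2 kcal/mol' \
  'target: EAL01688' 'miRNA : hsa-miR-26b-3p' 'mfe: -20.0 kcal/mol' > "$tmpdir/hybrid.txt"

# Keep only the "target:" lines, strip the label, and list each name once.
targets=$(grep '^target:' "$tmpdir/hybrid.txt" | sed 's/^target:[[:space:]]*//' | sort -u)

echo "$targets"
rm -r "$tmpdir"
```

Replacing sort -u with sort | uniq -c would additionally show how many times each target occurs.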