Hi,
I have a long list of different miRNAs, like the ones below:
hsa-miR-7641
bta-miR-2904
hhi-miR-7641
hsa-miR-4454
bta-miR-2478
efu-let-7c
mmu-miR-6240
hsa-miR-7704
Is there any way to get all of their mature sequences at once, instead of searching miRBase one by one and copy-pasting each sequence?
Thank you.
grep -f youridentifiers.txt mirbase.fasta > result.txt
You need to redirect the output to a file, e.g.
grep -f youridentifiers.txt mirbase.fasta > mysequences.fasta
Without that redirection, grep just writes to stdout, which is the terminal. You write "as a txt file"; what exactly do you have in mind as output? With standard Linux tools (grep, sed, cut, awk) you can probably do all the processing you would like.
Oh good, I see you already found out how to get it in a file ;)
Yes, occasionally the grey cells in my brain do work :)
Sorry, I inspected the result file and it contains all of the mature miRNAs, while my identifiers.txt contains only about 80 miRNAs.
:(
What should I do then?
Is there by chance a newline (empty line) at the bottom of your identifiers.txt file? Additionally, might be better to go for
grep -w -f identifiers.txt mirbase.fasta
to only get full 'word' matches and not partial 'pattern' matches.
Thank you!
The problem was exactly that: some empty lines at the end of my ID file.
Yeah, an empty line in the ID file is an empty pattern, and an empty pattern matches every line :p (I have this problem about 95% of the time when I do something like this)
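For anyone hitting the same thing, here is a minimal sketch of the cleanup step. The file names (mature.fasta, ids.txt, ids.clean.txt) and the toy sequences are made up for illustration; the sequences are placeholders, not real miRNA sequences. It assumes each FASTA record is one header line followed by one sequence line, and it relies on the -A option, which GNU and BSD grep both provide.

```shell
# Toy stand-ins for the real files (hypothetical names, placeholder sequences).
workdir=$(mktemp -d)
cat > "$workdir/mature.fasta" <<'EOF'
>hsa-miR-7641 placeholder description
AAAAAAAA
>bta-miR-2904 placeholder description
CCCCCCCC
>hsa-miR-4454 placeholder description
GGGGGGGG
EOF
# Identifier list with an empty line at the end, as in the problem above.
printf 'hsa-miR-7641\nhsa-miR-4454\n\n' > "$workdir/ids.txt"

# The empty line acts as an empty pattern, so every line of the FASTA file matches.
unfiltered=$(grep -c -F -f "$workdir/ids.txt" "$workdir/mature.fasta")

# Drop empty or whitespace-only lines from the identifier list first.
grep -v '^[[:space:]]*$' "$workdir/ids.txt" > "$workdir/ids.clean.txt"

# -F: fixed strings, -w: whole words, -A 1: also print the line after each match.
# The second grep removes the "--" separators printed between groups of matches.
grep -A 1 -F -w -f "$workdir/ids.clean.txt" "$workdir/mature.fasta" \
  | grep -v '^--$' > "$workdir/result.fasta"

filtered=$(grep -c '' "$workdir/result.fasta")
echo "lines matched before cleaning: $unfiltered, after cleaning: $filtered"
```

One caveat: -w treats "-" as a word boundary, so an identifier such as hsa-miR-26b would also match the header of hsa-miR-26b-3p. If your list mixes such names, check the result for extra records.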
I tried these commands; all of them give me the whole FASTA file:
grep -A1 -w -f id.txt seqFile.fasta > output.fasta
grep -Fwf miRNA.txt -A 1 mature.fasta | grep -v '^--$' > out.fasta
grep -Fwf <(sort -u miRNA.txt) -A 1 mature.fasta | grep -v '^--$'
Could you try the following?
grep -A 1 -Fwf <(grep -v '^$' miRNA.txt) mature.fasta > result.fasta
The grep -v '^--$' at the end of your commands only removes the "--" separator lines that -A prints between matches; the empty lines have to be removed from the identifier file itself.
Thank you.
grep -A 1 -Fwf <(grep -v '^$' miRNA.txt) mature.fasta > result.fasta
gave the error "Missing name for redirect".
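That "Missing name for redirect" message typically comes from csh/tcsh, which do not understand the bash-style <( ... ) process substitution. A sketch that sidesteps it by writing the cleaned identifiers to an ordinary file first; the toy inputs and the ids.clean.txt name are assumptions for illustration, and the sequences are placeholders.

```shell
# Toy inputs (hypothetical names; sequences are placeholders, not real data).
workdir=$(mktemp -d)
printf '>hsa-miR-7641 placeholder\nAAAA\n>mmu-miR-6240 placeholder\nCCCC\n' > "$workdir/mature.fasta"
printf 'mmu-miR-6240\n\n\n' > "$workdir/miRNA.txt"

# Step 1: write the de-duplicated identifiers, minus blank lines, to a regular file.
# This replaces the <( ... ) process substitution.
sort -u "$workdir/miRNA.txt" | sed '/^[[:space:]]*$/d' > "$workdir/ids.clean.txt"

# Step 2: plain grep with a regular pattern file and a regular output redirect.
grep -A 1 -F -w -f "$workdir/ids.clean.txt" "$workdir/mature.fasta" \
  | grep -v '^--$' > "$workdir/result.fasta"

cat "$workdir/result.fasta"
```

The setup lines above are sh/bash syntax, but steps 1 and 2 use only pipes and plain redirects, so with your real file names they should also work when typed into csh/tcsh. Alternatively, starting bash first and running the original command there avoids the error as well.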
Sorry,
suppose I have a file containing:
dataset: 1
target: EAL01688
length: 507
miRNA : hsa-miR-26b-3p
length: 22
mfe: -18.2 kcal/mol
p-value: 1.000000e+00
position 190
target 5' A UACCU AAA A A 3'
miRNA 3' C UUCAUU 5'
dataset: 1
target: EAL01691
length: 2484
miRNA : hsa-miR-26b-3p
length: 22
mfe: -23.2 kcal/mol
p-value: 1.000000e+00
position 913
target 5' G CA GGA AAAGA U 3'
miRNA 3' A C 5
How can I extract only the unique targets, for example EAL01688?
Which output do you want? I'm not sure I understand.
Thank you.
I was searching for targets of my miRNAs with RNAhybrid, and part of the results is shown above. Now I want to extract only the target names from this output, each one only once. For example, if EAL01691 appears several times, I only need it once.
If I understand it correctly, something like this would work:
grep "target:" yourinput.txt | sed 's/target: //' | sort -u
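The same idea can be sketched with awk, which picks the second field directly. This assumes the target lines look exactly like "target: EAL01688", as in the excerpt above; the file names rnahybrid.txt and targets.txt are made up for illustration, and the toy input is a shortened version of that excerpt with one target repeated.

```shell
# Toy input modelled on the excerpt above (hypothetical file name).
workdir=$(mktemp -d)
cat > "$workdir/rnahybrid.txt" <<'EOF'
target: EAL01688
length: 507
miRNA : hsa-miR-26b-3p
target: EAL01691
length: 2484
miRNA : hsa-miR-26b-3p
target: EAL01691
target 5' G CA GGA AAAGA U 3'
EOF

# Keep only lines whose first field is exactly "target:" and print the name.
# The alignment lines start with "target" (no colon), so they are skipped.
# sort -u then collapses repeated names to a single entry.
awk '$1 == "target:" { print $2 }' "$workdir/rnahybrid.txt" | sort -u > "$workdir/targets.txt"

cat "$workdir/targets.txt"
```

If you also want to know how often each target occurs, replacing sort -u with sort | uniq -c would print a count in front of each name.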