Selecting the first 100 nt from sequences
3.7 years ago
far.zi ▴ 10

Hi,

I have a FASTA file containing about 500 intron loci. I don't know how to keep just the first 100 bases of each sequence using an awk command line. I have the following command, which I used to pick the last 100 nt of the sequences. I thought it might help: sed -Ee 's/^.*(.{100})$/\1/' file.fasta

Thanks, Farid

rna-seq • 1.1k views

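Did you try sed -E '/>/! s/^(.{100}).*/\1/' file.fasta? Or you can use seqkit (seqkit subseq -r 1:100 file.fasta). With awk: awk -v OFS="\n" '{getline seq} {print $0, substr(seq,1,100)}' file.fasta. Note that the sed and awk versions assume each sequence sits on a single line.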
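If the sequences in the file are wrapped over several lines, the one-liners above would trim each line rather than each record. A minimal sketch of an awk-based alternative follows, assuming a plain FASTA file with headers starting with ">". The function name first_n_bases and the helper emit are hypothetical names chosen for illustration, not part of any tool.

```shell
# first_n_bases N FILE
# Hypothetical helper: prints every FASTA record with its sequence
# cut down to the first N bases. Wrapped (multi-line) sequences are
# joined into one string before trimming.
first_n_bases() {
    awk -v n="$1" '
        # Print the record collected so far, if there is one.
        function emit() {
            if (header != "") {
                print header
                print substr(seq, 1, n)
            }
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Any other line is sequence data: append it.
        { seq = seq $0 }
        # Do not forget the last record in the file.
        END { emit() }
    ' "$2"
}
```

Something like first_n_bases 100 file.fasta > first100.fasta should then write the trimmed records to a new file. Sequences shorter than 100 nt are printed unchanged.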

3.7 years ago
KH ▴ 80

I found this post that should be of use to you:

https://ro-che.info/articles/2016-08-23-fasta-first-n-sequences


Thanks for your help. But the file it produced is empty :(


That link had nothing at all to do with your problem. Are you really unwilling to even look at what people are giving you?


I don't understand why you are mad at me. I have this question and asked people if they could help me. So far, from my side, I see that only 2 people have replied. I don't understand what you mean by "even look at what people are giving you?".


swbarnes2 is not angry with you; they are pointing out that you have not tried the answer you were already given.

3.7 years ago

I thought it might help: sed -Ee 's/^.*(.{100})$/\1/' file.fasta

Anyone who understands the command above can figure this out for themselves.

So start there. Find a sed tutorial and learn how that command works. Then you can make your own. It's likely faster than waiting for generous strangers to spoonfeed you what you want to know.
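For readers following along, here is a small sketch of how that sed command works and how the same pattern can be mirrored. It uses a toy sequence and 3 characters in place of 100 so the result is easy to check by eye; the variable names are placeholders for illustration only.

```shell
# Toy single-line sequence; 3 stands in for 100.
seq_line='ACGTACGTTT'

# Last 3: the greedy ^.* consumes everything except the final 3
# characters, which the group (.{3}) captures just before $.
last3=$(printf '%s\n' "$seq_line" | sed -E 's/^.*(.{3})$/\1/')

# First 3: move the group to the start of the line and let .*
# consume the rest.
first3=$(printf '%s\n' "$seq_line" | sed -E 's/^(.{3}).*$/\1/')

echo "$last3 $first3"
```

On a real FASTA file the header lines should be left alone, which is what an address such as /^>/! in front of the substitution does.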


Thanks for your suggestion.
