Get the middle part of a URL
4 months ago
blackadder • 0

Hello there,

I have a file with ftp links that look like this:

> ftp.sra.ebi.ac.uk/vol1/sequence/ERZ914/ERZ914930/contig.fa.gz
> ftp.sra.ebi.ac.uk/vol1/sequence/ERZ928/ERZ928990/contig.fa.gz
> ....

I am reading them one by one with a while loop in bash. For each link I want to drop everything before and after the second ERZ identifier, so for the first link I only want to keep ERZ914930.

I have tried the following:

basename=${line##*/} - This returns contig.fa.gz
base=${line%%/ERZ*} - This returns ftp.sra.ebi.ac.uk/vol1/sequence

with line being the iteration variable
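
For reference, the two expansions above can be chained to reach the directory that holds the file: strip the file name first, then strip everything up to the last remaining slash. This is only a minimal sketch; the function name accession_from_path is made up for illustration, and it assumes the wanted part is always the second-to-last path component.

```shell
# Hypothetical helper (name chosen for illustration only):
# print the second-to-last component of the path given as $1.
accession_from_path() {
    local dir=${1%/*}            # drop the shortest trailing "/..." (the file name)
    printf '%s\n' "${dir##*/}"   # drop everything up to the last remaining "/"
}

accession_from_path "ftp.sra.ebi.ac.uk/vol1/sequence/ERZ914/ERZ914930/contig.fa.gz"
# prints: ERZ914930
```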

Thanks!

Unix bash • 409 views
4 months ago
base=$(echo "${line}" | cut -d '/' -f 5)
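
To show how the cut approach might sit inside the while loop from the question, here is a rough sketch. The function name extract_with_cut and the here-document input are assumptions for illustration; in practice the loop would read from the file of links. Field 5 is only correct because the links carry no "ftp://" prefix.

```shell
# Splitting on "/" gives:
# 1=ftp.sra.ebi.ac.uk 2=vol1 3=sequence 4=ERZ914 5=ERZ914930 6=contig.fa.gz
# Hypothetical helper: read links on stdin, print field 5 of each.
extract_with_cut() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue               # skip blank lines
        printf '%s\n' "$line" | cut -d '/' -f 5  # keep only the fifth field
    done
}

extract_with_cut <<'EOF'
ftp.sra.ebi.ac.uk/vol1/sequence/ERZ914/ERZ914930/contig.fa.gz
ftp.sra.ebi.ac.uk/vol1/sequence/ERZ928/ERZ928990/contig.fa.gz
EOF
# prints:
# ERZ914930
# ERZ928990
```

If nothing else happens per line, cut can also read the file directly without a loop. If the links did start with "ftp://", the extra slashes would shift the wanted field to 7.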

Hello there!

Your suggestion returns ERZ914

Thank you


Yeah, it should be -f 5...


Oh yeah, it works now!

Thank you!

4 months ago
Joe 20k

Another solution using regex:

$ [[ "ftp.sra.ebi.ac.uk/vol1/sequence/ERZ928/ERZ928990/contig.fa.gz" =~ ([[:alpha:]]{3}[[:digit:]]{6}) ]] && echo "${BASH_REMATCH[1]}"
ERZ928990

You can adapt this approach to extract any other pattern, for example to capture more information or to do more complex filtering.
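
As one possible way to reuse the regex per line, the match can be wrapped in a small function. This is a sketch only: the name accession_from_regex is hypothetical, and the pattern is a slightly stricter variant that requires the three letters plus six digits to fill a whole path segment, which is an assumption about the link layout rather than something stated in the question. It needs bash, since it relies on [[ =~ ]] and BASH_REMATCH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first path segment that is exactly
# three letters followed by six digits; return 1 if there is none.
accession_from_regex() {
    # Group 2 holds the identifier; groups 1 and 3 anchor it to "/" or the string ends.
    local re='(^|/)([[:alpha:]]{3}[[:digit:]]{6})(/|$)'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

accession_from_regex "ftp.sra.ebi.ac.uk/vol1/sequence/ERZ928/ERZ928990/contig.fa.gz"
# prints: ERZ928990
```

Because the function returns a non-zero status when nothing matches, a calling loop can skip or report malformed lines instead of silently emitting an empty value.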
