Removing white space from the beginning of the second field (sequence) in a fasta file
4.6 years ago
Angie11 • 0

Hello,

Does anyone know of a Linux command-line tool, such as sed, that can remove whitespace from a specific field only? In my case, I have 2 tab-separated fields in the format shown below, and I would like to remove the whitespace from the beginning of the second field (the beginning of the sequence) without removing whitespace from the first field. The file is in fasta format, but I can convert it into a tab-delimited text file if needed.

>10_GL0000024 root|cellular organisms|Bacteria|Firmicutes locus=scaffold18562_3:3421:4365:- [Complete]
 MELTFQTATPAERLYTTGQSMQIEGQMGYIGCLQTGMSEDGKGAFPKWSSGREGLNTEEFQQELAGVMDALIHDEQYGGFLKDSDAMRDFCQTHPESGFNNGFAFGFRADTAQYSYLIRLNPCKGEENLSICCYRRDWLDSHMKHAEKGIRFITPHYKEKFRIADGDKVRIRRFDGQVFDRVCRYIDDCHVEIGSELYHICQFAEIMERNGNSVIPLRSSLPFVCYGKVPEKRAIVMFERGFDGYRSASFATKGRTSQKLVDELNGELGVTKAQAAAMQGGATQGWASPAADPKNYDEQGQPIKPRHRDRGDAR

Thank you! Angie

sequence protein fasta linux • 2.1k views
4.6 years ago
bari.ballew ▴ 460

Take a look at regex anchors, which tie your pattern to the beginning or end of a line. With sed, "^" anchors the pattern to the beginning of the line, so only leading whitespace is removed. The header lines start with ">", so they are left unchanged:

sed 's/^[\t ]*//' file.fa > secondFile.fa
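A slightly more defensive variant, as a sketch: skip the header lines explicitly and use the POSIX class [[:space:]] instead of \t, which not every sed understands inside brackets. The file name demo.fa and its contents are made up for illustration.

```shell
# Hypothetical demo file: one header line and one sequence line
# that starts with a space and a tab.
printf '>seq1 root|cellular organisms [Complete]\n \tMELTFQ\n' > demo.fa

# On every line that does NOT start with ">", delete leading whitespace.
# Header lines are skipped, so the spaces inside them are never touched.
cleaned=$(sed '/^>/!s/^[[:space:]]*//' demo.fa)
printf '%s\n' "$cleaned"
```

Redirect the output to a new file (as in the command above) once the result looks right.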
4.6 years ago

if this is a fasta file, then you'll always want to remove leading whitespace, so something like:

sed 's/^\s*//' myfile.fa

oughta work fine
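If the file has been converted to the tab-delimited form mentioned in the question (header, tab, sequence on one line), the "^" anchor no longer points at the sequence. A rough sketch with awk, which can edit the second field only; demo.tsv and its contents are hypothetical:

```shell
# Hypothetical two-column file: header <TAB> sequence,
# with stray spaces in front of the sequence.
printf '>seq1 root|cellular organisms [Complete]\t  MELTFQ\n' > demo.tsv

# Split and rejoin on tabs only, so spaces inside the header stay in field 1,
# then strip leading spaces/tabs from field 2.
fixed=$(awk 'BEGIN { FS = OFS = "\t" } { sub(/^[ \t]+/, "", $2); print }' demo.tsv)
printf '%s\n' "$fixed"
```

Setting FS to a tab is what keeps the spaces in the first field from being treated as separators.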
