Question: Parse file to remove two characters from the ends of a lot of lines
michaela_boell wrote (2.5 years ago):

Dear all,

I have a txt file with data of this form:

@HWI-ST999:188:C49E6ACXX:8:1101:11404:1998/1
NCGAGGGATGGGAGACCTGGTTGGAAATCCGTGGCTGTTTGGTTGGGGGAT
+
#4=DDFFDHDHHGJIIJIJIFHIIIJJIJJJDHIGIEDGHGI=CGHJJH9>
@HWI-ST999:188:C49E6ACXX:8:1101:1754:2212/1
TCGAATGCATGATAACAATAACCCTGGAACAGGCAACCGTTGTCCCTGACC
+
CCCFFFFFHHHGHJJJJJJJJJJJJJJJJJJJJIJJJJJIIJJJJJJJJJJ

I would like to remove the /1 from the end of every 4th line, starting with the first (lines 1, 5, 9 and so on, i.e. the header line of each record). Is this possible with a one-liner in bash, maybe with sed (OS X)?

Context: I extracted reads from a BAM file into FASTQ format with Bam2Fastq, but the subsequent processing step cannot cope with the /1 and /2 suffixes in my two files of paired-end reads.


It would be more accurate to call this a fastq file; it's not just a txt file...

Written 2.5 years ago by WouterDeCoster

But someone who doesn't know the fastq format might be put off by it. And basically it can be called a txt file.

Written 2.5 years ago by michaela_boell

That doesn't make sense. You don't want answers from people who don't know what a fastq file is. This is Biostars. We read fastq files at breakfast like normal people read the newspaper.

Written 2.5 years ago by WouterDeCoster

You don't need to know anything about biology to answer this question, and I want to keep it as easy to understand as possible. Calling it a fastq file would be more accurate, I guess, but it isn't helpful here and in some cases might even be distracting. Besides, I'd guess 99.9% of people here recognize it as a fastq file anyway, so for them I don't need to spell it out.

Written 2.5 years ago by michaela_boell
george.ry (United Kingdom) wrote (2.5 years ago):

New file: sed '1~4s/\/1$//' myfile.fq > mynewfile.fq

In place: sed -i '1~4s/\/1$//' myfile.fq

Written 2.5 years ago by george.ry
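A minimal sketch of what the GNU sed command above does, run on the two records from the question. The address 1~4 is GNU "first~step" addressing: it selects lines 1, 5, 9 and so on, which in a well-formed 4-line FASTQ file are exactly the header lines. Restricting the substitution to those lines matters: '/' and '1' are both valid Phred+33 quality characters, so a quality line could itself end in "/1", and a leading '@' can't be used to find headers either, because '@' is also a valid quality character. The temporary directory and file names are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: GNU sed's '1~4s/\/1$//' applied to the two records from the question.
# '1~4' (first~step addressing) is a GNU extension; stock OS X (BSD) sed rejects it.
workdir=$(mktemp -d)
cat > "$workdir/myfile.fq" <<'EOF'
@HWI-ST999:188:C49E6ACXX:8:1101:11404:1998/1
NCGAGGGATGGGAGACCTGGTTGGAAATCCGTGGCTGTTTGGTTGGGGGAT
+
#4=DDFFDHDHHGJIIJIJIFHIIIJJIJJJDHIGIEDGHGI=CGHJJH9>
@HWI-ST999:188:C49E6ACXX:8:1101:1754:2212/1
TCGAATGCATGATAACAATAACCCTGGAACAGGCAACCGTTGTCCCTGACC
+
CCCFFFFFHHHGHJJJJJJJJJJJJJJJJJJJJIJJJJJIIJJJJJJJJJJ
EOF

# 1~4 selects lines 1, 5, 9, ...: the '@' header of every 4-line record.
# s/\/1$// deletes a literal "/1" only when it ends the line ('\/' escapes the slash).
sed '1~4s/\/1$//' "$workdir/myfile.fq" > "$workdir/mynewfile.fq"

# The headers lose their "/1"; sequence, '+' and quality lines are untouched.
cat "$workdir/mynewfile.fq"
```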

returns: sed: 1: "1~4s/\/1$//": invalid command code ~

Written 2.5 years ago by michaela_boell

Am I right in guessing that you're on a Mac, then?

EDIT: Answering my own question: you are. OS X doesn't ship GNU sed (its sed is the BSD version, which doesn't support the 1~4 step address), so you'll need to install GNU sed, e.g. with Homebrew.

Written 2.5 years ago by george.ry
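If installing GNU sed isn't an option, a sketch like the following should work with the stock OS X (BSD) sed as well as GNU sed, under the assumption that the file is well-formed FASTQ with exactly 4 lines per record and no wrapped sequences. Instead of a step address it uses the n command: each cycle starts on a header, strips a trailing "/1", then n three times prints the current line and reads the next one, so the following cycle starts on the next header again. Output goes to a new file, which sidesteps the different -i syntax of the two sed flavours; the file names are illustrative.

```shell
#!/bin/sh
# Sketch of a BSD-sed-compatible variant (no GNU '1~4' address needed).
# Assumption: well-formed FASTQ, exactly 4 lines per record.
workdir=$(mktemp -d)
printf '%s\n' \
  '@HWI-ST999:188:C49E6ACXX:8:1101:11404:1998/1' \
  'NCGAGGGATGGGAGACCTGGTTGGAAATCCGTGGCTGTTTGGTTGGGGGAT' \
  '+' \
  '#4=DDFFDHDHHGJIIJIJIFHIIIJJIJJJDHIGIEDGHGI=CGHJJH9>' \
  '@HWI-ST999:188:C49E6ACXX:8:1101:1754:2212/1' \
  'TCGAATGCATGATAACAATAACCCTGGAACAGGCAACCGTTGTCCCTGACC' \
  '+' \
  'CCCFFFFFHHHGHJJJJJJJJJJJJJJJJJJJJIJJJJJIIJJJJJJJJJJ' > "$workdir/myfile.fq"

# Cycle = one record: substitute on the header, then each 'n' prints the
# current line and pulls in the next (sequence, '+', quality). The quality
# line is auto-printed at the end of the cycle.
sed -e 's/\/1$//' -e 'n' -e 'n' -e 'n' "$workdir/myfile.fq" > "$workdir/mynewfile.fq"

cat "$workdir/mynewfile.fq"
```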
$ brew list
boost       hdf5        libxml2     sratoolkit  tophat
bowtie2     htslib      openssl     szip        wget
gnu-sed     libmagic    samtools    tbb

I installed gnu-sed with Homebrew. Now it returns: sed: 1: "s_188_1_seq.txt": bad flag in substitute command: 's'

EDIT: Homebrew put it in an unusual location, so I now call it like this:

/usr/local/Cellar/gnu-sed/4.2.2/bin/gsed -i '1~4s/\/1$//' s_188_1_seq.txt

And it worked! Thank you. <3

Written 2.5 years ago by michaela_boell
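The "bad flag in substitute command: 's'" error above most likely came from plain sed still being the BSD one. Homebrew's gnu-sed is typically installed as gsed, and BSD sed's -i takes the next argument as a backup-file suffix. So '1~4s/\/1$//' became the suffix and the file name s_188_1_seq.txt was parsed as the script, an s command with _ as its delimiter and seq.txt as its flags. Since the question mentions two paired-end files with /1 and /2, here is a sketch that strips either suffix from the headers of both files. The SED variable, the script name strip_mates.sh and the placeholder reads are hypothetical.

```shell
#!/bin/sh
# Sketch: strip the mate suffix (/1 or /2) from the header lines of both files
# of a pair. Needs GNU sed for '1~4'; with Homebrew's gnu-sed on OS X, run e.g.
#   SED=gsed sh strip_mates.sh   (hypothetical script name)
# File names and reads below are hypothetical placeholders.
SED=${SED:-sed}
workdir=$(mktemp -d)
printf '%s\n' '@example_read/1' 'ACGT' '+' 'IIII' > "$workdir/reads_1.fq"
printf '%s\n' '@example_read/2' 'ACGT' '+' 'IIII' > "$workdir/reads_2.fq"

for f in "$workdir/reads_1.fq" "$workdir/reads_2.fq"; do
  # [12] matches either mate number; $ anchors it to the end of the line.
  # Writing to a temp file and renaming avoids the -i differences between
  # GNU sed (-i) and BSD sed (-i '').
  "$SED" '1~4s/\/[12]$//' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

head -n 1 "$workdir/reads_1.fq" "$workdir/reads_2.fq"
```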