Question

Hunting invisible characters?

0

4.2 years ago

geneticatt ▴ 140

Hi all,

I have a set of adapters that a collaborator gave me in a plain text file (i5R.txt). I moved the file onto my institution's Linux HPC and tried to use it to pull sequences from a FASTQ file with grep -f, like so:

grep -f i5R.txt myseqs.fastq

This returned nothing, which was surprising: I know the adapters are there because I can match them in vim. Suspecting some pesky invisible characters, I retyped the sequences in vim into a new text file called i5R.seqs. This fixed the pattern matching issue with grep.

Here is the diff of the two files, to show that they appear identical.

[geneticatt]$ diff i5R.txt i5R.seqs
1,8c1,8
< CCTGATAC
< TTAAGTTG
< CGGACAGT
< GCACTACA
< TGGTGCCT
< TCCACGGC
< ATGTCGTG
< CCACGACA
---
> CCTGATAC
> TTAAGTTG
> CGGACAGT
> CGACTACA
> TGGTGCCT
> TCCACGGC
> ATGTCGTG
> CCACGACA

What type of character could be the culprit? I searched for \r because I've had problems with that one before, but this is another invisible character. How does one go about hunting down and removing the invisible characters that plague their workflow? Further, what preventative measures can I take to make sure I don't get hung up on something like this again?

adaptor adapter grep • 1.6k views

ADD COMMENT • link updated 4.2 years ago by seidel 11k • written 4.2 years ago by geneticatt ▴ 140

1

You could have looked at the file using cat -vet, which would have shown all characters in the file, printable and non-printable.
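
As a rough illustration of what that looks like, here is a minimal sketch that builds a small demo file with Windows-style (CRLF) line endings and views it with cat -vet. The temporary file and its two sequences are assumptions made up for the example, not the poster's actual i5R.txt.

```shell
# Hypothetical demo file with CRLF line endings (stand-in for i5R.txt).
demo=$(mktemp)
printf 'CCTGATAC\r\nTTAAGTTG\r\n' > "$demo"

# -v makes control characters visible, -e marks each line end with $,
# and -t shows tabs as ^I.
visible=$(cat -vet "$demo")
printf '%s\n' "$visible"

# A carriage return shows up as ^M immediately before the $ marker.
rm -f "$demo"
```

A file with plain Unix line endings would show only the $ at the end of each line, so any ^M (or other ^X sequence) stands out right away.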

ADD REPLY • link 4.2 years ago by GenoMax 152k

1

Another way to see hidden characters is to pipe the file through octal dump: cat infile | od -c. This will print out hidden characters, newlines, etc.
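
A small sketch of the same idea, assuming a made-up one-line demo file rather than the real adapter file. Passing the file name to od directly is equivalent to piping it through cat.

```shell
# Hypothetical one-line demo file ending in CRLF.
demo=$(mktemp)
printf 'CCTGATAC\r\n' > "$demo"

# od -c prints the file byte by byte; printable bytes appear as themselves
# and common control bytes as backslash escapes such as \r and \n.
dump=$(od -c "$demo")
printf '%s\n' "$dump"

rm -f "$demo"
```

In the dump, a \r sitting right before each \n is the signature of Windows line endings.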

ADD REPLY • link 4.2 years ago by seidel 11k

score 2 · Answer 1 · 2021-04-15

2

4.2 years ago

Mensur Dlakic ★ 29k

You may want to read this. I think you may be able to fix your adapter file by typing:

dos2unix i5R.txt

If you get an error saying that the command doesn't exist, this should work instead:

sed -i 's/\r//' i5R.txt
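
To see why the carriage returns break grep -f and to check that stripping them helps, here is a minimal sketch. The pattern and read files are hypothetical stand-ins for i5R.txt and myseqs.fastq, and it uses tr -d '\r' rather than sed -i, only because tr behaves the same on most systems.

```shell
# Hypothetical stand-ins for i5R.txt and myseqs.fastq; contents are made up.
work=$(mktemp -d)
printf 'CCTGATAC\r\nTTAAGTTG\r\n' > "$work/patterns_crlf.txt"
printf '@read1\nAACCTGATACGG\n+\nIIIIIIIIIIII\n' > "$work/reads.fastq"

# With CRLF endings each pattern is really "CCTGATAC<CR>", and no read
# contains a carriage return, so nothing can match.
before=$(grep -c -f "$work/patterns_crlf.txt" "$work/reads.fastq" || true)

# Delete every carriage return and write a cleaned copy of the patterns.
tr -d '\r' < "$work/patterns_crlf.txt" > "$work/patterns_lf.txt"
after=$(grep -c -f "$work/patterns_lf.txt" "$work/reads.fastq" || true)

echo "matching lines before: $before, after: $after"
rm -rf "$work"
```

On a typical Linux system the count should go from zero to one here. Running a check like this on any file that arrives from a collaborator, before using it with grep -f, is a cheap way to avoid the problem.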

ADD COMMENT • link 4.2 years ago by Mensur Dlakic ★ 29k

0

Thank you, using dos2unix worked perfectly!

ADD REPLY • link 4.2 years ago by geneticatt ▴ 140