Finding non-ATCGN nucleotides in a FASTA file?
4.2 years ago
dpearton • 0

Hello,

I have downloaded a genome assembly from GenBank (RefSeq), and it apparently contains some nucleotides other than A, C, T, G, or N (according to the error file from the radinitio program).

I would like to try and find out what these are prior to fixing the file. I've tried various combinations of grep...

grep -i -v [ACTGN]+ sequence.fas

etc., but they either find everything in the file, or nothing.

I would like to do a "simple" grep that finds lines that contain any characters IN ADDITION to [ACTGN] (either case). I can get rid of the FASTA headers by piping through grep -v '>'

Thanks!

sequence genome grep fasta IUPAC • 1.8k views
4.2 years ago
grep -i -E '[^ACTGN]+'
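For anyone landing here later, a minimal sketch of how this might fit into a full pipeline. The file toy.fas and its contents are made up for illustration and stand in for sequence.fas; the -o flag is assumed to be available (it is in GNU and BSD grep, but it is not part of POSIX). Without -E the + in the pattern is a literal plus sign, which is probably why the earlier attempts matched nothing; dropping the + avoids the issue, since one offending character is enough to flag a line.

```shell
# Hypothetical toy file standing in for sequence.fas
printf '>seq1\nACGTNacgtn\nACGRYTTK\n>seq2\nNNNNACGT\n' > toy.fas

# Drop header lines, then print sequence lines that contain
# any character other than A, C, G, T or N (case-insensitive)
grep -v '^>' toy.fas | grep -i '[^ACGTN]'

# Tally which unexpected characters occur and how often
# (-o prints each matching character on its own line)
grep -v '^>' toy.fas | grep -o -i '[^ACGTN]' | sort | uniq -c
```

The tally is usually the more useful of the two on a whole genome, since it shows at a glance whether the extra characters are IUPAC ambiguity codes (R, Y, K, M, ...) or something else entirely.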
Thank you very much for this. I had tried the caret negation but I did not use the -E so it didn't work.
