uniq
report or omit repeated lines
see also: comm, join, sort

Synopsis
uniq [OPTION]... [INPUT [OUTPUT]]
examples
The example 'sort ~/.bash_history | uniq | grep $1' will fail if $1 contains whitespace. Variable references should be quoted:
sort ~/.bash_history | uniq | grep "$1"
(example added by Chris F.A. Johnson)
sort *_movies.txt | uniq > all.txt
sort tags.csv | uniq > uniq_tags.csv
Remove duplicates in each line of a file
Since Ruby comes with any Linux distribution I know of:
ruby -e 'STDIN.readlines.each { |l| l.split(" ").uniq.each { |e| print "#{e} " }; print "\n" }' < test
Here, test is the file that contains the elements.
To explain what this command does, although Ruby can almost be read from left to right:
- Read the input (which comes from < test through your shell)
- Go through each line of the input
- Split the line on spaces into an array (split(" "))
- Get the unique elements from this array, in order (uniq)
- For each unique element, print it followed by a space (print "#{e} ")
- Print a newline once we're done with the unique elements
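A Ruby-free alternative: the same per-line de-duplication can be sketched in plain awk (test is the same input file as above):

```shell
# Print each line with duplicate space-separated words removed,
# keeping the first occurrence of each word, in order.
awk '{
  split("", seen)                # reset the seen-words table per line (POSIX awk)
  sep = ""
  for (i = 1; i <= NF; i++)
    if (!seen[$i]++) { printf "%s%s", sep, $i; sep = " " }
  print ""
}' test
```

Unlike the Ruby one-liner, this prints no trailing space after the last word on each line.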
Removing duplicate lines with awk (instead of sort | uniq)
One way, using awk:
awk '!array[$0]++' file.txt
Results:
valA
valB
valC
valZ
Food for thought:
sort -u file.txt
valA
valB
valC
valZ
< file.txt sort | uniq
valA
valB
valC
valZ
< file.txt sort | uniq -u # only print unique lines
valA
valC
valZ
< file.txt sort | uniq -d # only print duplicate lines
valB
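Related: to see how often each line occurs, rather than just which lines repeat, the usual idiom combines uniq -c with a numeric sort (a sketch):

```shell
# Count occurrences of each line; sort -n then orders the
# result by that count, least frequent first.
sort file.txt | uniq -c | sort -n
```

With the sample data above, valB (the only duplicated line) would appear last with the highest count.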
option -f of the linux uniq command
From man uniq:
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
With -f 2 you are skipping both fields, so only the first line gets output: all the lines are equal after the second field (none has more than two fields).
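A small illustration of the field-skipping behaviour described above, on hypothetical two-field data:

```shell
# -f 1 skips the first field when comparing, so lines that differ
# only in that field count as duplicates and only the first is kept.
printf '1 apple\n2 apple\n3 banana\n' | uniq -f 1
# -f 2 skips both fields; nothing is left to compare, so every line
# compares equal and only the very first line is printed.
printf '1 apple\n2 apple\n3 banana\n' | uniq -f 2
```

The first command prints "1 apple" and "3 banana"; the second prints only "1 apple".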
description
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
With no options, matching lines are merged to the first occurrence.
Mandatory arguments to long options are mandatory for short options too.
-c, --count           prefix lines by the number of occurrences
-d, --repeated        only print duplicate lines
-D, --all-repeated[=delimit-method]
                      print all duplicate lines
                      delimit-method={none(default),prepend,separate}
                      Delimiting is done with blank lines
-f, --skip-fields=N   avoid comparing the first N fields
-i, --ignore-case     ignore differences in case when comparing
-s, --skip-chars=N    avoid comparing the first N characters
-u, --unique          only print unique lines
-z, --zero-terminated end lines with 0 byte, not newline
-w, --check-chars=N   compare no more than N characters in lines
--help                display this help and exit
--version             output version information and exit
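The difference between -d and -D is worth a quick sketch:

```shell
# -d prints one copy of each duplicated line;
# -D prints every copy of each duplicated line.
printf 'a\na\nb\nc\nc\n' | uniq -d    # a and c, once each
printf 'a\na\nb\nc\nc\n' | uniq -D    # a twice, c twice
```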
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.
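The adjacency rule from the note above is easy to see in practice:

```shell
# uniq only merges adjacent duplicates: the two a's below are
# separated by b, so nothing is removed until the input is sorted.
printf 'a\nb\na\n' | uniq           # still three lines
printf 'a\nb\na\n' | sort | uniq    # two lines: a, b
```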
copyright
Copyright © 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
reporting bugs
Report uniq bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report uniq translation bugs to <http://translationproject.org/team/>
see also
comm, join, sort
The full documentation for uniq is maintained as a Texinfo manual. If the info and uniq programs are properly installed at your site, the command
info coreutils 'uniq invocation'
should give you access to the complete manual.
author
Written by Richard M. Stallman and David MacKenzie.