DIRCONV(1)                General Commands Manual                DIRCONV(1)
dirconv — locate
and transcode mixed-encoding file names
dirconv [-078dFhnpruvw] [-f charset] [-x regex] [path ...]
The dirconv utility recursively scans the
specified path(s) and classifies files and directories according to whether
their names are pure 7-bit ASCII, non-ASCII but valid UTF-8, double-UTF-8
(WTF-8), or neither.
Names in the last category are assumed to be Latin-1 unless a
different encoding is specified with the -f option.
By default, the dirconv utility then
prints the names that are neither pure 7-bit ASCII nor valid UTF-8.
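The four categories can be illustrated with a minimal Python sketch. This is not the dirconv implementation: the function name `classify`, the `fallback` parameter (standing in for the charset given with -f) and the round-trip test used to spot double encoding are all assumptions made for illustration.

```python
def classify(name: bytes, fallback: str = "latin-1") -> str:
    """Return 'ascii', 'utf-8', 'wtf-8' or 'other' for a raw file name.

    Hypothetical helper; `fallback` plays the role of the -f charset.
    """
    if name.isascii():
        return "ascii"
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assumed to be in the fallback 8-bit encoding.
        return "other"
    try:
        # If the decoded text maps back to bytes in the fallback charset
        # and those bytes are again valid UTF-8, the name was probably
        # encoded twice.
        text.encode(fallback).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"
    return "wtf-8"


if __name__ == "__main__":
    word = "sm\u00f8rgrav"  # contains U+00F8, LATIN SMALL LETTER O WITH STROKE
    samples = [
        b"readme.txt",
        word.encode("utf-8"),
        word.encode("latin-1"),
        word.encode("utf-8").decode("latin-1").encode("utf-8"),
    ]
    for raw in samples:
        print(raw, classify(raw))
```

The order of the checks matters in this sketch: pure ASCII is also valid UTF-8, so it has to be tested first, and the double-encoding test only makes sense once the name is known to be valid UTF-8.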
The following options are available:
-0      [...] -n option was also specified.
-7
-8      [...] -7, -u and / or -w options are specified.
-d
-F      [...] -r option, force renaming a file when the target already exists.
-f charset
-h
-n      [...] -r option, show what would have happened, but do not actually rename any files.
-p
-r
-u
-v
-w
-x regex

The dirconv utility and this manual page
were written by Dag-Erling Smørgrav
⟨des@des.no⟩ for the University of Oslo.
The dirconv utility works by attempting to
decode each name as if it were a sequence of UTF-8 characters. It is
possible, but highly unlikely, that a random string of characters in a
non-UTF single-byte encoding would look like a valid UTF-8 sequence.
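A small Python sketch of such a false positive, assuming Latin-1 as the single-byte encoding. The helper name `is_valid_utf8` is hypothetical and only illustrates the decode test described above.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two Latin-1 characters (U+00C3, U+00A9) whose bytes, C3 A9, happen to
# form one well-formed UTF-8 sequence.
latin1_pair = "\u00c3\u00a9".encode("latin-1")

# A single Latin-1 character (U+00E9): byte E9 is a UTF-8 lead byte with
# no continuation bytes, so it cannot be mistaken for UTF-8.
latin1_single = "\u00e9".encode("latin-1")

if __name__ == "__main__":
    print(is_valid_utf8(latin1_pair), is_valid_utf8(latin1_single))
```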
Reliable detection of WTF-8 is only possible if the original 8-bit encoding is known.
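The dependence on the original encoding can be seen in a short Python sketch. The function `undo_double_utf8` is a hypothetical helper, not part of dirconv, and the choice of Windows-1252 versus Latin-1 is only an example of two 8-bit encodings that disagree.

```python
from typing import Optional


def undo_double_utf8(name: bytes, charset: str) -> Optional[str]:
    """Reverse one layer of double encoding, assuming the intermediate
    8-bit encoding was `charset`. Return None if that assumption fails."""
    try:
        return name.decode("utf-8").encode(charset).decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


# The euro sign, encoded as UTF-8, misread as Windows-1252 and encoded
# as UTF-8 a second time.
euro = "\u20ac"
doubled = euro.encode("utf-8").decode("cp1252").encode("utf-8")

if __name__ == "__main__":
    # With the right charset the original name is recovered; with
    # Latin-1 the round trip fails and the name is not recognised.
    print(undo_double_utf8(doubled, "cp1252"))
    print(undo_double_utf8(doubled, "latin-1"))
```

In this sketch the second call fails because one of the intermediate characters (U+201A) exists in Windows-1252 but not in Latin-1, so the same bytes are recognised as double-encoded under one assumption and not under the other.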
The exclusion filter is applied before name conversion. Character classes are unlikely to work as expected on unconverted names.
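The effect on character classes can be illustrated by analogy with Python's `re` module. The regex engine used by dirconv is not specified here and may behave differently; the pattern and file name below are made up for the example.

```python
import re

# A Latin-1 encoded name containing U+00F8 (byte F8).
raw = "sm\u00f8rgrav".encode("latin-1")

# Matching the unconverted bytes: in a bytes pattern, \w covers only
# ASCII letters, digits and underscore, so byte F8 is not a word character.
bytes_match = re.fullmatch(rb"\w+", raw)

# Matching after conversion: in a str pattern, \w covers Unicode letters.
text_match = re.fullmatch(r"\w+", raw.decode("latin-1"))

if __name__ == "__main__":
    print(bytes_match is not None, text_match is not None)
```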
November 18, 2014