There is a Python implementation called chardet.
It works well.
gobvip 2010/07/21
The file command can do it too:
bash-5.1$ file --mime-encoding foobar.txt
foobar.txt: iso-8859-1
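To use the file command from a Tcl script, one option is to run it through exec and capture its answer. This is a minimal sketch that assumes the external file program is installed and on the PATH. fileMimeEncoding is a hypothetical helper name, and -b (brief mode) only drops the "foobar.txt:" prefix from the output:

```tcl
# Hypothetical helper: ask the external file(1) program for the
# character encoding of a file. Assumes "file" is on the PATH.
proc fileMimeEncoding {fname} {
    # -b (brief) prints only the encoding, e.g. "iso-8859-1",
    # without the leading "fname: " part
    return [string trim [exec file -b --mime-encoding $fname]]
}
```

For the example above, [fileMimeEncoding foobar.txt] would return iso-8859-1. exec raises a Tcl error if file exits with a non-zero status or writes to stderr, so wrap the call in catch if that matters.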
The encguess command from Perl can also do this. There are pathological cases that encguess handles better than the file command, and vice versa.
pi31415 2024/01/26
MiR 2024/01/30
If it's only to decide whether a file is a text or a binary file, one can use code from file's encoding.c like this (proof of concept; extend to your needs if necessary):
proc checkFileText {fname} {
    # Check whether a file looks like text or binary.
    # Byte classes copied from file(1)'s src/encoding.c:
    # https://github.com/file/file/blob/f2a6e7cb7db9b5fd86100403df6b2f830c7f22ba/src/encoding.c#L151-L228
    #   F = character never appears in text
    #   T = character appears in plain ASCII text
    #   I = character appears in ISO-8859 text
    #   X = character appears in non-ISO extended ASCII (Mac, IBM PC)
    # One entry per byte value 0x00-0xFF, 16 per row
    set text_chars {
        F F F F F F F T T T T T T T F F
        F F F F F F F F F F F T F F F F
        T T T T T T T T T T T T T T T T
        T T T T T T T T T T T T T T T T
        T T T T T T T T T T T T T T T T
        T T T T T T T T T T T T T T T T
        T T T T T T T T T T T T T T T T
        T T T T T T T T T T T T T T T F
        X X X X X T X X X X X X X X X X
        X X X X X X X X X X X X X X X X
        I I I I I I I I I I I I I I I I
        I I I I I I I I I I I I I I I I
        I I I I I I I I I I I I I I I I
        I I I I I I I I I I I I I I I I
        I I I I I I I I I I I I I I I I
        I I I I I I I I I I I I I I I I
    }
    # Read at most the first 1024 bytes, untranslated
    set fn [open $fname rb]
    fconfigure $fn -translation binary
    set teststring [read $fn 1024]
    close $fn
    # Turn the bytes into a list of unsigned integers 0-255
    # (the "u" flag needs Tcl 8.5 or later)
    set bytes {}
    binary scan $teststring cu* bytes
    foreach b $bytes {
        # A byte that never occurs in text means the file is binary
        if {[lindex $text_chars $b] eq "F"} {
            return -1
        }
    }
    # Every byte can occur in some kind of text
    return 1
}
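The proc above only rejects bytes of class F. The same table can also tell plain ASCII apart from ISO-8859 and from non-ISO extended ASCII, which is roughly what the looks_ascii, looks_latin1 and looks_extended checks in file's encoding.c do (file also tries UTF-8 and UTF-16 in between, which this sketch does not). This is a minimal sketch under those assumptions: byteClass and classifyBytes are hypothetical names, and the table is rebuilt from byte ranges instead of a 256-entry list:

```tcl
# Hypothetical helper: return the file(1) text_chars class of one byte
# value (0-255): T = ASCII text, X = non-ISO extended ASCII,
# I = ISO-8859, F = never appears in text.
proc byteClass {b} {
    # BEL..CR, ESC, printable ASCII and NEL (0x85) count as plain text
    if {($b >= 0x07 && $b <= 0x0D) || $b == 0x1B
            || ($b >= 0x20 && $b <= 0x7E) || $b == 0x85} {
        return T
    }
    # The rest of 0x80-0x9F is only used by Mac/IBM PC code pages
    if {$b >= 0x80 && $b <= 0x9F} {
        return X
    }
    # 0xA0-0xFF are printable ISO-8859 characters
    if {$b >= 0xA0} {
        return I
    }
    # NUL and the remaining control characters, plus DEL (0x7F)
    return F
}

# Hypothetical helper: report the narrowest class covering all bytes in
# $data: "ascii", "iso-8859", "extended-ascii", or "binary" if any byte
# is of class F.
proc classifyBytes {data} {
    set bytes {}
    binary scan $data cu* bytes
    set rank 0   ;# 0 = ascii, 1 = iso-8859, 2 = extended-ascii
    foreach b $bytes {
        switch -- [byteClass $b] {
            F { return binary }
            I { if {$rank < 1} { set rank 1 } }
            X { set rank 2 }
        }
    }
    return [lindex {ascii iso-8859 extended-ascii} $rank]
}
```

For example, [classifyBytes "caf\xe9"] returns iso-8859. To classify a file, feed it the bytes read in binary mode, as checkFileText does.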