
Kernel#test is handy

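While reading the manual for sort_by (Enumerable#sort_by, which Array includes), I came across a description of Kernel#test that looked handy.

Excerpts from the sort_by manual

First, a benchmark of sort versus sort_by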

   require 'benchmark'

   a = (1..100000).map {rand(100000)}

   Benchmark.bm(10) do |b|
     b.report("Sort")    { a.sort }
     b.report("Sort by") { a.sort_by {|a| a} }
   end

produces:

                   user     system      total        real
   Sort        0.180000   0.000000   0.180000 (  0.175469)
   Sort by     1.980000   0.040000   2.020000 (  2.013586)

sort_by is slow here: roughly ten times slower than sort when the key is just the element itself.
However...

Sorting files by mtime

However, consider the case where comparing the keys is a non-trivial
operation. The following code sorts some files on modification time
using the basic sort method.

   files = Dir["*"]
   sorted = files.sort {|a,b| File.new(a).mtime <=> File.new(b).mtime}
   sorted   #=> ["mon", "tues", "wed", "thurs"]

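When it comes to comparing by mtime, well, I would reach for File objects too.

Enter Kernel#test

However, this creates two File objects on every comparison, so it is inefficient.
That is where Kernel#test comes in.
I had no idea it existed. Really handy.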

This sort is inefficient: it generates two new File
objects during every comparison. A slightly better technique is to
use the Kernel#test method to generate the modification
times directly.

   files = Dir["*"]
   sorted = files.sort { |a,b|
     test(?M, a) <=> test(?M, b)
   }
   sorted   #=> ["mon", "tues", "wed", "thurs"]
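As a quick check of what test(?M, ...) hands back, here is a minimal sketch that compares it with File.mtime on a temporary file. The helper name mtime_via_test and the temp-file prefix are made up for illustration.

```ruby
require 'tempfile'

# Returns the modification time of +path+ via Kernel#test.
# No File instance is created; the result is a Time.
# (mtime_via_test is a hypothetical helper name.)
def mtime_via_test(path)
  test(?M, path)
end

Tempfile.create('kernel_test_demo') do |f|   # hypothetical name prefix
  t = mtime_via_test(f.path)
  puts t.class                     # Time
  puts t == File.mtime(f.path)     # compare with the File class-method equivalent
end
```

Since the return value is a plain Time, it works directly with <=> inside a sort block.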

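There is still waste

Sure enough, even if File objects are no longer created every time, Time objects still are.
That is where yesterday's Schwartzian Transform comes in.
Reference: Learning the Schwartzian Transform and the Fisher-Yates shuffle from Ruby's sort_by / shuffle - rochefort's blog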

This still generates many unnecessary Time objects. A
more efficient technique is to cache the sort keys (modification
times in this case) before the sort. Perl users often call this
approach a Schwartzian Transform, after Randal Schwartz. We
construct a temporary array, where each element is an array
containing our sort key along with the filename. We sort this array,
and then extract the filename from the result.

   sorted = Dir["*"].collect { |f|
      [test(?M, f), f]
   }.sort.collect { |f| f[1] }
   sorted   #=> ["mon", "tues", "wed", "thurs"]

And sort_by does exactly this internally

This is exactly what sort_by does internally.

   sorted = Dir["*"].sort_by {|f| test(?M, f)}
   sorted   #=> ["mon", "tues", "wed", "thurs"]

So sort_by is a Schwartzian Transform. When computing the key is expensive, there are plenty of situations where sort_by comes out ahead.
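To make the difference concrete, the following sketch counts how many times the sort key is computed under each approach. The key function is a hypothetical stand-in for something costly like test(?M, f), and count_key_calls is a name invented for this example.

```ruby
# Counts how often the sort key is computed by sort (with a block)
# versus sort_by. Returns the counts and whether both orders match.
def count_key_calls(items)
  calls = { sort: 0, sort_by: 0 }

  # Hypothetical "expensive" key: stands in for a call such as test(?M, f).
  expensive_key = lambda do |x, counter|
    calls[counter] += 1
    x.to_s.reverse
  end

  by_sort    = items.sort { |a, b| expensive_key.(a, :sort) <=> expensive_key.(b, :sort) }
  by_sort_by = items.sort_by { |x| expensive_key.(x, :sort_by) }

  [calls, by_sort == by_sort_by]
end

items = (1..1000).to_a.shuffle
calls, same_order = count_key_calls(items)
puts "sort:    #{calls[:sort]} key computations"      # two per comparison
puts "sort_by: #{calls[:sort_by]} key computations"   # one per element
puts "same result: #{same_order}"
```

With sort the key is recomputed twice per comparison, while sort_by computes it once per element and sorts the cached keys. That is why sort_by loses in the first benchmark (the key is trivial, so the caching is pure overhead) and should win when the key is costly.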

Kernel#test options

There turned out to be a lot of handy ones.
module function Kernel.#test (Ruby 2.3.0)

  Test | Returns | Meaning
  "A"  | Time    | Last access time for file1
  "b"  | boolean | True if file1 is a block device
  "c"  | boolean | True if file1 is a character device
  "C"  | Time    | Last change time for file1
  "d"  | boolean | True if file1 exists and is a directory
  "e"  | boolean | True if file1 exists
  "f"  | boolean | True if file1 exists and is a regular file
  "g"  | boolean | True if file1 has the setgid bit
       |         | set (false under NT)
  "G"  | boolean | True if file1 exists and has a group
       |         | ownership equal to the caller's group
  "k"  | boolean | True if file1 exists and has the sticky bit set
  "l"  | boolean | True if file1 exists and is a symbolic link
  "M"  | Time    | Last modification time for file1
  "o"  | boolean | True if file1 exists and is owned by
       |         | the caller's effective uid
  "O"  | boolean | True if file1 exists and is owned by
       |         | the caller's real uid
  "p"  | boolean | True if file1 exists and is a fifo
  "r"  | boolean | True if file1 is readable by the effective
       |         | uid/gid of the caller
  "R"  | boolean | True if file1 is readable by the real
       |         | uid/gid of the caller
  "s"  | int/nil | If file1 has nonzero size, return the size,
       |         | otherwise return nil
  "S"  | boolean | True if file1 exists and is a socket
  "u"  | boolean | True if file1 has the setuid bit set
  "w"  | boolean | True if file1 exists and is writable by
       |         | the effective uid/gid
  "W"  | boolean | True if file1 exists and is writable by
       |         | the real uid/gid
  "x"  | boolean | True if file1 exists and is executable by
       |         | the effective uid/gid
  "X"  | boolean | True if file1 exists and is executable by
       |         | the real uid/gid
  "z"  | boolean | True if file1 exists and has a zero length

Tests that take two files:

  "-"  | boolean | True if file1 and file2 are identical
  "="  | boolean | True if the modification times of file1
       |         | and file2 are equal
  "<"  | boolean | True if the modification time of file1
       |         | is prior to that of file2
  ">"  | boolean | True if the modification time of file1
       |         | is after that of file2
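To try a few of these out, here is a small sketch that runs several tests against files in a scratch directory. The file names and the sample_tests helper are made up for illustration. The "-" check compares device and inode, so its result may differ on platforms without inode support.

```ruby
require 'tmpdir'
require 'fileutils'

# Runs a handful of Kernel#test checks in a throwaway directory and
# returns the results as a hash. (sample_tests is a hypothetical helper.)
def sample_tests
  Dir.mktmpdir do |dir|
    empty = File.join(dir, 'empty.txt')   # hypothetical file names
    data  = File.join(dir, 'data.txt')
    FileUtils.touch(empty)
    File.write(data, 'hello')

    {
      dir_is_directory: test(?d, dir),                     # true
      data_is_regular:  test(?f, data),                    # true
      missing_exists:   test(?e, File.join(dir, 'nope')),  # false
      data_size:        test(?s, data),                    # 5
      empty_size:       test(?s, empty),                   # nil
      empty_is_zero:    test(?z, empty),                   # true
      same_file:        test(?-, data, data)               # true
    }
  end
end

sample_tests.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Note that "s" returns the size or nil rather than a boolean, so it can double as an "exists and is non-empty" check.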