Type Profiler: An Analysis to guess type signatures
- 1. Type Profiler: An analysis to
guess type signatures
Yusuke Endoh (@mametter)
Cookpad Inc.
RubyKaigi 2018 (2018/06/01)
- 4. Endless range
• Take an array without the first element
ary=["a","b","c"]
ary[1..-1] #=> ["b","c"]
ary.drop(1) #=> ["b","c"]
ary[1..] #=> ["b","c"]
- 5. Endless range
• Loop from 1 to infinity
i=1; loop { ……; i+=1 }
(1..Float::INFINITY).each {……}
1.step {|i|……}
(1..).each {|i|……}
- 7. Endless range
✓Has already been committed to trunk
✓Will be included in Ruby 2.6
• Stay tuned!
ary[1..]
(1..).each {……}
ary.zip(1..) {|x,i|……}
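The three idioms above can be tried directly on Ruby 2.6 or later. This is a small sketch; the sample array is only illustrative.

```ruby
ary = ["a", "b", "c"]

# Slice from index 1 to the end without writing -1.
tail = ary[1..]

# Take a finite prefix from an endless range.
first_three = (1..).first(3)

# Pair each element with a 1-based index.
pairs = ary.zip(1..)

p tail
p first_three
p pairs
```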
- 9. Today’s theme
• Ruby3's type.
• Some people held some meetings
to discuss Ruby3's type
– Matz, soutaro, akr, ko1, mame
– Main objective: clarify matz's hidden
requirements (and compromises) for
Ruby3's type
• (Not to decide everything behind closed doors)
• We'll explain the (current) requirements
- 10. Agenda
• A whirlwind tour of already-proposed
"type systems" for Ruby
• Type DB: A key concept of Ruby3's
type system
• A missing part: Type profiler
- 12. Type-related systems for Ruby
• Steep
– Static type check
• RDL
– (Semi) static type check
• contracts.ruby
– Only dynamic check of arguments/return values
• dry-types
– Only dynamic checks of typed structs
• RubyTypeInference (by JetBrains)
– Type information extractor by dynamic analysis
• Sorbet (by Stripe)
- 13. RDL: Types for Ruby
• Most famous in the academic world
– Jeff Foster at Univ. of Maryland
– Accepted in OOPSLA, PLDI, and POPL!
• The gem is available
– https://github.com/plum-umd/rdl
• We evaluated RDL
– through writing type annotations for
OptCarrot
- 14. Basics of RDL
# load RDL library
require "rdl"
class NES
# activate type annotations for RDL
extend RDL::Annotate
# type annotation before method definition
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
- 15. RDL type annotation
• Accepts one optional parameter typed
Array of String
• Returns self
– Always "self" for initialize method
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
- 16. RDL type annotation
• "typecheck" controls type check timing
– :call: when this method is called
– :now: when this method is defined
– :XXX: when "RDL.do_typecheck :XXX" is
done
– nil: no "static check" is done
• Used to type-check code that uses the method
• A "run-time check" is still done
type "(?Array<String>) -> self", typecheck: :call
def initialize(conf = ARGV)
...
- 17. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "%any"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
#=> receiver type %any not supported yet
...
- 18. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "[reset: () -> %any]"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
#=> receiver type [reset: () -> %any] not sup
...
- 19. Annotation for instance variables
• Needs type annotations for all
instance variables
class NES
# activate type annotations for RDL
extend RDL::Annotate
var_type :@cpu, "Optcarrot::CPU"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
# error: no type information for
# instance method `Optcarrot::CPU#reset'
- 20. Annotation for instance variables
• Type check succeeded
class NES
# activate type annotations for RDL
extend RDL::Annotate
type "Optcarrot::CPU","reset","()->%any"
var_type :@cpu, "Optcarrot::CPU"
type "() -> %any", typecheck: :call
def reset
@cpu.reset
...
- 21. Requires many annotations...
type "() -> %bot", typecheck: :call
def reset
@cpu.reset
@apu.reset
@ppu.reset
@rom.reset
@pads.reset
@cpu.boot
@rom.load_battery
end
- 22. Requires many annotations...
type "() -> %bot", typecheck: nil
def reset
@cpu.reset
@apu.reset
@ppu.reset
@rom.reset
@pads.reset
@cpu.boot
@rom.load_battery
end
(No static check)
- 23. … still does not work
type "() -> %bot", typecheck: nil
def reset
...
@rom.load_battery #=> [65533]
end
# Optcarrot::CPU#reset: Return type error.…
# Method type:
# *() -> %bot
# Actual return type:
# Array
# Actual return value:
# [65533]
- 24. Why?
• typecheck: nil doesn't mean no check
– A dynamic check is still done
• %bot means "no-return"
– Always raises exception, process exit, etc.
– But this method returns [65533]
– In short, this is my bug in the annotation
type "() -> %bot", typecheck: nil
def reset
...
@rom.load_battery #=> [65533]
end
- 25. Lessons: void type
• In Ruby, a lot of methods return a
meaningless value
– There is no intention to allow users
to use the value
• What type should we use in this case?
– %any, or return nil explicitly?
• We need a "void" type
– %any for the method; it can return anything
– "don't use" for users of the method
def reset
LIBRARY_INTERNAL_ARRAY.each { … }
end
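A minimal sketch of why such a return value is "garbage": Array#each returns its receiver, so the internal array leaks out of the method. The contents of LIBRARY_INTERNAL_ARRAY here are an assumption for illustration.

```ruby
# Hypothetical internal data; the slide does not show its contents.
LIBRARY_INTERNAL_ARRAY = [1, 2, 3].freeze

def reset
  # Array#each returns its receiver, so callers receive the internal
  # array even though they are not meant to use it.
  LIBRARY_INTERNAL_ARRAY.each { |x| x }
end

leaked = reset
p leaked.equal?(LIBRARY_INTERNAL_ARRAY)
```

A "void" return type would let the method return anything while telling callers not to rely on it.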
- 27. RDL's programmable annotation
• RDL supports pre-condition checks
– These can also be used to generate type
annotations automatically
• I like this feature, but matz doesn't
– He wants to avoid type annotations
embedded in the code
– He likes a separate, non-Ruby type definition
language (as in Steep)
pre(:belongs_to) do |name|
……
type name, "() -> #{klass}"
end
- 28. Summary: RDL
• Semi-static type check
– The timing is configurable
• It checks the method body
– Not only dynamic check of
arguments/return values
• The implementation is mature
– Many features actually work, great!
• Need type annotations
• Supports meta-programming
- 29. Steep
• Omitted: you already heard soutaro's talk
• Completely static type check
• Separated type definition language
– .rbi
– But also requires (minimal?) type
annotation embedded in .rb files
- 31. Digest: dry-types
require 'dry-types'
require 'dry-struct'
module Types
include Dry::Types.module
end
class User < Dry::Struct
attribute :name, Types::String
attribute :age, Types::Integer
end
• Can define structs with typed fields
– Run-time type check
– "type_struct" gem is similar
- 32. Digest: RubyTypeInference
• Type information extractor by dynamic
analysis
– Run test suites under monitoring of
TracePoint API
– Hooks method call/return events, logs
the passed values, and aggregate them
to type information
– Used by RubyMine IDE
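The hook-and-aggregate idea can be sketched with the standard TracePoint API. This is not RubyTypeInference's actual code; the Calc class, the signatures table, and the key format are assumptions made for illustration, and it assumes every parameter has a name (Ruby 2.6+ for TracePoint#parameters).

```ruby
# Aggregated observations: "Class#method" => argument and return classes.
signatures = Hash.new { |h, k| h[k] = { args: [], ret: [] } }

trace = TracePoint.new(:call, :return) do |tp|
  key = "#{tp.defined_class}##{tp.method_id}"
  case tp.event
  when :call
    # Read each parameter's value from the method's binding.
    classes = tp.parameters.map do |_kind, name|
      tp.binding.local_variable_get(name).class
    end
    signatures[key][:args] |= [classes]
  when :return
    signatures[key][:ret] |= [tp.return_value.class]
  end
end

# Hypothetical code under observation.
class Calc
  def double(x)
    x * 2
  end
end

trace.enable do
  Calc.new.double(21)
  Calc.new.double("ab")
end

signatures.each do |name, sig|
  puts "#{name}: #{sig[:args].inspect} -> #{sig[:ret].inspect}"
end
```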
- 34. Summary of Type Systems
System             Objective                 Targets                      Annotations
Steep              Static type check         Method body                  Separated (mainly)
RDL                Semi-static type check    Method body                  Embedded in code
contracts.ruby     Dynamic type check        Arguments and return values  Embedded in code
dry-types          Typed structs             Only Dry::Struct classes     Embedded in code
RubyTypeInference  Extract type information  Arguments and return values  N/A
- 36. Idea
• Separated type definition file is good
• But meta-programming like attr_* is
difficult to support
– Users will try to generate it programmatically
• We may want to keep code position
– To show lineno of code in type error report
– Hard to manually keep the correspondence
between type definition and code position
in .rbi file
– We may also want to keep other information
- 38. How to create Type DB
[Diagram: sources that populate the Type DB]
• Steep type definition: write manually, then compile
• stdlib: already included
• Ruby code: RubyTypeInference (automatically extract by dynamic analysis)
• Ruby code: Type Profiler
- 40. Type Profiler
• Another way to extract type information
from Ruby code
– Alternative "RubyTypeInference"
• Is not a type inference
– Type inference of Ruby is hopeless
– Conservative static type inference can
extract little information
• Type profiler "guesses" type information
– It may extract wrong type information
– Assumes that the user checks the result
- 41. Type Profilers
• There is no "one-for-all" type profiler
– Static type profiling cannot handle
ActiveRecord
– Dynamic type profiling cannot extract
syntactic features (like void type)
• We need a variety of type profilers
– For ActiveRecord by reading DB schema
– Extracting from RDoc/YARD
- 42. In this talk
• We prototyped three more generic
type profilers
– Static analysis 1 (SA1)
• Mainly for user-defined classes
– Static analysis 2 (SA2)
• Mainly for builtin classes
– Dynamic analysis (DA)
• Enhancement of "RubyTypeInference"
- 43. SA1: Idea
• Guess a type of formal parameters
based on called method names
class FooBar
def foo(...); ...; end
def bar(...); ...; end
end
def func(x) #=> x:FooBar
x.foo(1)
x.bar(2)
end
- 44. SA1: Prototyped algorithm
• Gather method
definitions in each
class/module
– FooBar={foo,bar}
• Gather method calls
for each parameter
– x={foo,bar}
– Remove general methods (like #[] and #+)
to reduce false positives
– Arity, parameter and return types aren't used
• Assign a class that defines all of the called methods
class FooBar
def foo(...);...;end
def bar(...);...;end
end
def func(x)
x.foo(1)
x.bar(2)
end
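The matching step above can be sketched as follows. This is not the prototype itself: the prototype gathers calls from source code, whereas here the called-method list is passed in by hand, and the GENERAL_METHODS list and the Other class are assumptions.

```ruby
require "set"

class FooBar
  def foo(*); end
  def bar(*); end
end

# Hypothetical second class, to show an ambiguous match.
class Other
  def foo(*); end
end

# Method names too common to count as evidence (hypothetical list).
GENERAL_METHODS = Set[:[], :+, :to_s, :==]

# Step 1: gather the methods defined directly in each class.
def method_table(classes)
  classes.to_h { |c| [c, c.instance_methods(false).to_set] }
end

# Steps 2 and 3: drop general methods, then keep the classes that
# define every remaining called method.
def guess_types(called, table)
  evidence = called.to_set - GENERAL_METHODS
  return [] if evidence.empty? # no evidence, so no guess
  table.select { |_c, methods| evidence.subset?(methods) }.keys
end

table = method_table([FooBar, Other])
p guess_types([:foo, :bar, :+], table)
p guess_types([:foo], table)
p guess_types([], table)
```

Arity and argument types are ignored here, as on the slide, which keeps the sketch simple but leaves ambiguous cases as a list of candidates.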
- 45. SA1: Evaluation
• Experimented with SA1 on WEBrick
– As sample code that has many
user-defined classes
• Manually checked the guessed result
– Found some common guessing failures
• Wrong result / no-match result
– No quantitative evaluation yet
- 46. SA1: Problem 1
• A parameter is not used
• Many methods are affected
def do_GET(req, res)
raise HTTPStatus::NotFound, "not found."
end
DefaultFileHandler#do_GET(req:#{}, res:HTTPResponse)
FileHandler#do_GET(req:#{}, res:#{})
AbstractServlet#do_GET(req:#{}, res:#{})
ProcHandler#do_GET(request:#{}, response:#{})
ERBHandler#do_GET(req:#{}, res:HTTPResponse)
- 47. SA1: Problem 2
• Incomplete guessing
• Cause
– the method calls req.request_uri
– Both HTTPResponse and HTTPRequest
provide request_uri
HTTPProxyServer#perform_proxy_request(
req: HTTPResponse | HTTPRequest,
res: WEBrick::HTTPResponse,
req_class:#{new}, :nil)
- 48. (Arguable) solution?
• Exploit the name of parameter
– Create a mapping from parameter name
to type after profiling
• "req" → HTTPRequest
– Revise guessed types using the mapping
• Fixed!
DefaultFileHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
FileHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
AbstractServlet#do_GET(req:HTTPRequest, res:HTTPResponse)
ProcHandler#do_GET(request:#{}, response:#{})
ERBHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
CGIHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
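One way the name-based revision might work, as a sketch: vote on a type per parameter name using only unambiguous guesses, then apply the winner where the first pass was empty or ambiguous. The input below is illustrative data loosely modeled on the slides, not actual profiler output, and the voting rule is an assumption.

```ruby
# First-pass guesses; [] stands for "no match".
guesses = {
  "DefaultFileHandler#do_GET" => { req: [], res: ["HTTPResponse"] },
  "CGIHandler#do_GET" => { req: ["HTTPRequest"], res: ["HTTPResponse"] },
  "HTTPProxyServer#perform_proxy_request" =>
    { req: ["HTTPResponse", "HTTPRequest"], res: ["HTTPResponse"] },
  "ProcHandler#do_GET" => { request: [], response: [] },
}

# Build a parameter-name-to-type mapping from unambiguous guesses only.
def name_mapping(guesses)
  votes = Hash.new { |h, k| h[k] = Hash.new(0) }
  guesses.each_value do |params|
    params.each do |name, types|
      votes[name][types.first] += 1 if types.size == 1
    end
  end
  votes.transform_values { |counts| counts.max_by { |_t, n| n }.first }
end

# Replace empty or ambiguous guesses when the mapping knows the name.
def revise(guesses, mapping)
  guesses.transform_values do |params|
    params.to_h do |name, types|
      fixable = types.size != 1 && mapping.key?(name)
      [name, fixable ? [mapping[name]] : types]
    end
  end
end

mapping = name_mapping(guesses)
revised = revise(guesses, mapping)
revised.each { |meth, params| puts "#{meth}: #{params}" }
```

Parameters with unusual names (request, response) stay unresolved, which matches the ProcHandler line above.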
- 49. SA1: Problem 3
• Cannot guess return type
• Can guess in only limited cases
– Returns formal parameter
– Returns a literal or "Foo.new"
– Returns an expression which is already
included in the Type DB
• See actual usage of the method?
– Requires inter-procedural or
whole-program analysis!
- 50. SA1: Pros/Cons
• Pros
– No need to run tests
– Can guess void type
• Cons
– Hard when parameters are not used
• This is not a rare case
– Heuristics may work, but can cause wrong
guesses
- 51. SA2: Idea
• I believe this method expects Numeric!
def add_42(x) #=> (x:Num)=>Num
x + 42
end
- 52. SA2: Prototyped algorithm
• Limited type DB of stdlib
– Num#+(Num) → Num
– Str#+(Str) → Str, etc.
• "Unification-based type-inference"
inspired algorithm
– searches "α#+(Num) → β"
– Matches "Num#+(Num) → Num"
• Type substitution: α=Num, β=Num
x + 42
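A minimal sketch of the lookup, assuming a tiny hand-written type DB. TYPE_DB and the candidates helper are hypothetical names, and a real unifier would also carry substitutions across expressions.

```ruby
# Hypothetical type DB rows: [receiver, method, argument, result].
TYPE_DB = [
  [:Num, :+, :Num, :Num],
  [:Str, :+, :Str, :Str],
].freeze

# Find rows matching the query; a nil receiver or argument is an
# unknown (a type variable) and matches anything.
def candidates(meth, recv: nil, arg: nil)
  TYPE_DB.select do |r, m, a, _res|
    m == meth && (recv.nil? || recv == r) && (arg.nil? || arg == a)
  end
end

# x + 42: receiver unknown, argument Num, result unknown.
recv, _meth, _arg, result = candidates(:+, arg: :Num).first
puts "receiver=#{recv}, result=#{result}"
```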
- 53. SA2: Prototyped algorithm (2)
• When multiple candidates found
– matches:
• Num#<<(Num) → Num
• Str#<<(Num) → Str
• Array[α]#<<(α) → Array[α]
– Just take union types of them
• (Overloaded types might be better)
def push_42(x)
x << 42
end
#=> (x:(Num|Str|Array))=>(Num|Str|Array)
x << 42
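Taking the union of all candidates can be sketched like this. SHIFT_DB and guess are hypothetical names, and :any is a simplification standing in for the type parameter of Array.

```ruby
# Hypothetical type DB rows: [receiver, method, argument, result].
SHIFT_DB = [
  [:Num, :<<, :Num, :Num],
  [:Str, :<<, :Num, :Str],
  [:Array, :<<, :any, :Array], # stands in for the generic Array entry
].freeze

# Collect every matching row and union the receiver and result types.
def guess(meth, arg)
  hits = SHIFT_DB.select do |_r, m, a, _res|
    m == meth && (a == :any || a == arg)
  end
  { param: hits.map(&:first).uniq, result: hits.map(&:last).uniq }
end

sig = guess(:<<, :Num)
puts "(x:(#{sig[:param].join('|')}))=>(#{sig[:result].join('|')})"
```

The union loses the link between each receiver type and its result type, which is why overloaded types might be the better representation.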
- 54. SA2: Evaluation
• Experimented with SA2 on OptCarrot
– As sample code that uses many builtin
types
• Manually checked the guessed result
– Found some common guessing failures
• Wrong result / no-match result
– No quantitative evaluation yet
- 55. SA2: Problem 1
• Surprising result
– Counterintuitive, but actually it works
with @fetch:Array[Num|Str]
def peek16(addr)
@fetch[addr] + (@fetch[addr + 1] << 8)
end
# Optcarrot::CPU#peek16(Num) => (Num|Str)
- 56. SA2: Problem 2
• Difficult to handle type parameters
– Requires constraint-based type-inference
@ary = [] # Array[α]
@ary[0] = 1 # unified to Array[Num]
@ary[1] = "str" # cannot unify Num and Str
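The failure can be reproduced with a deliberately naive type variable that may be bound only once. TypeVar is a hypothetical class for illustration, not part of the prototype.

```ruby
# A type variable that is bound on first use and rejects any other type.
class TypeVar
  attr_reader :bound

  def unify(type)
    @bound ||= type
    raise TypeError, "cannot unify #{@bound} and #{type}" unless @bound == type
    @bound
  end
end

elem = TypeVar.new # the element type of @ary = []
elem.unify(:Num)   # @ary[0] = 1
begin
  elem.unify(:Str) # @ary[1] = "str"
rescue TypeError => e
  puts e.message
end
```

A constraint-based approach would instead record both observations and resolve them together, for example to an array of Num or Str.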
- 57. SA2: Pros/Cons
• Pros
– No need to run tests
– Can guess void type
– Can guess parameters that are not used as a
receiver
• Cons
– Can cause wrong guesses
– Hard to handle type parameters (Array[α])
– Hard to scale
• The bigger the type DB is, the more wrong
results occur
- 58. DA: Idea
• Recording actual inputs/outputs of
methods by using TracePoint API
– The same as RubyTypeInference
• Additional features
– Support block types
• Required enhancement of TracePoint API
– Support container types: Array[Int]
• By sampling elements
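Element sampling for container types might look like the sketch below. The slides do not say how elements are chosen, so taking the first few elements is an assumption, as are the type_of name and the limit parameter.

```ruby
# Describe a value's type, inspecting at most `limit` elements of an Array.
def type_of(value, limit: 3)
  return value.class.name unless value.is_a?(Array)

  elems = value.first(limit).map { |v| type_of(v, limit: limit) }.uniq
  elems.empty? ? "Array" : "Array[#{elems.join('|')}]"
end

p type_of([1, 2, 3])
p type_of([1, "a"])
p type_of([])
# Sampling is cheap but can miss elements: the trailing String is not seen.
p type_of(Array.new(1000, 0) + ["x"])
```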
- 60. DA: Problem 1
• Very slow (in some cases)
– Recording OptCarrot may take hours
– Element-sampling for Array made it faster,
but it still takes a few minutes
• Without tracing, it runs in a few seconds
– It may depend on application
• Profiling WEBrick is not so slow
- 61. DA: Problem 2
• Cannot guess void type
– Many methods return garbage
– DA cannot distinguish garbage and
intended return value
• SA can guess void type by heuristic
– Integer#times, Array#each, etc.
– if statement that has no "else"
– while and until statements
– Multiple assignment
• (Steep scaffold now supports some of them)
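The static heuristics above can be approximated with Ripper from the standard library by looking at the last statement of a method body. This is a rough sketch, not the prototype or Steep's scaffold; looks_void? and VOID_CALLS are hypothetical names, and the exact S-expression shapes may vary between Ruby versions.

```ruby
require "ripper"

# Iterator methods whose return value is rarely meaningful (assumed list).
VOID_CALLS = %w[each times].freeze

# True when the last statement of the first method defined in `src`
# matches one of the void heuristics.
def looks_void?(src)
  defn = Ripper.sexp(src)[1].find { |node| node[0] == :def }
  last = defn[3][1].reject { |stmt| stmt[0] == :void_stmt }.last
  return false unless last

  case last[0]
  when :while, :until, :massign then true
  when :if then last[3].nil? # an if statement with no else clause
  when :method_add_block
    call = last[1]
    call[0] == :call && VOID_CALLS.include?(call[3][1])
  else false
  end
end

p looks_void?("def reset\n  ARY.each { |x| x }\nend")
p looks_void?("def f(x)\n  if x\n    g\n  end\nend")
p looks_void?("def f(x)\n  x + 1\nend")
```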
- 62. DA: Problem 3
• Some tests confuse the result
– Need to ignore error-handling tests by
cooperating with the test framework
assert_raise(TypeError) { … }
- 63. DA: Pros/Cons
• Pros
– Easy to implement, and robust
– It can profile any programs
• Including meta-programming like
ActiveRecord
• Cons
– Need to run tests; it might be very slow
– Hard to handle void type
– TracePoint API is not enough yet
– Need to cooperate with test frameworks
- 64. Conclusion
• Reviewed already-proposed type
systems for Ruby
– Whose implementations are available
• Type DB: Ruby3's key concept
• Some prototypes and experiments of
type profilers
– Need more improvements / experiments!