Programming Evolved

Data Serialization Is Different To Data Transfer!


Nearly everyone at this stage has heard about the huge rails vulnerabilities that were discovered this month. However, there is a lot of confusion and misinformation on the net about them, and about what their primary causes were. So here is a complete guide to what caused the bugs, and why I think that not understanding the difference between data serialization and data transfer (or more precisely, data interchange) was the root cause of the security issues.

Basic setup

For all the examples in this post, I ran them with ruby 1.9.3 with the Gemfile:

Gemfile
source "http://rubygems.org"
gem 'activesupport', '3.2.9'
gem 'builder', '3.1.4' # needed for to_xml
gem 'json', '1.7.6'

This basically defines what versions of libraries will be running when I run the examples. I chose 3.2.9 for the activesupport library as that is the newest version containing the bug, and 1.7.6 of json for the same reason.

There are two key factors in the YAML bug: the from_xml method in the activesupport library, and what YAML is and what it should do.

Active Support

Rails has a sub library known as activesupport. This is a library that isn’t strictly to do with rails, but instead supplies a lot of additional methods for core ruby classes. For example:

active support methods
>> require 'active_support/core_ext'
=> true
>> 16.hours
=> 57600 seconds
>> "".blank?
=> true
>> 3.in? [1,2,3,4]
=> true
>> 'alice in wonderland'.titleize
=> "Alice In Wonderland"

This can be used in any ruby project, and is very useful even in non-rails apps.

Two more useful methods are the from_xml and to_xml methods. They convert an xml string into a Hash object, and vice versa, eg:

xml methods
>> Hash.from_xml('<person><name>Fred</name><age>34</age></person>')
=> {"person"=>{"name"=>"Fred", "age"=>"34"}}
>> puts({ 'x' => 'hello', 'y' => 'goodbye' }.to_xml)
<?xml version="1.0" encoding="UTF-8"?>
<hash>
  <x>hello</x>
  <y>goodbye</y>
</hash>
=> nil

Their behaviour is very obvious when working with strings. They also work with other types:

xml methods
>> hash = { number: 34, array: [1,2,3, "str"], inner_h: { "x" => 3 } }
=> {:number=>34, :array=>[1, 2, 3, "str"], :inner_h=>{"x"=>3}}
>> puts(hash.to_xml)
<?xml version="1.0" encoding="UTF-8"?>
<hash>
  <number type="integer">34</number>
  <array type="array">
    <array type="integer">1</array>
    <array type="integer">2</array>
    <array type="integer">3</array>
    <array>str</array>
  </array>
  <inner-h>
    <x type="integer">3</x>
  </inner-h>
</hash>
=> nil
>> Hash.from_xml(hash.to_xml)
=> {"hash"=>{"number"=>34, "array"=>[1, 2, 3, "str"], "inner_h"=>{"x"=>3}}}
>> Hash.from_xml(hash.to_xml)["hash"] == hash.stringify_keys
=> true

Note how the xml elements are given a type="<type>" attribute. This is used by from_xml to parse the values back to their original types. The only change is that symbol keys become strings; comparing against hash.stringify_keys accounts for this, so simple types survive the round trip.

YAML

The YAML website describes YAML as:

… a human friendly data serialization standard for all programming languages.

It is useful to compare that to the definition given for json from its website:

JSON (JavaScript Object Notation) is a lightweight data-interchange format.

While they seem similar, there is a big difference between data interchange and data serialization. Data interchange just means passing information from one place to another. Data serialization, however, involves converting data into a format that can be saved or transferred, and then converted back into the original data through deserialization. Ruby has both json and YAML libraries, so the differences between the two can be easily shown.

JSON simple
>> require 'json'
=> true
>> hash = { x: 34, y: "hello" }
=> {:x=>34, :y=>"hello"}
>> puts hash.to_json
{"x":34,"y":"hello"}
=> nil
>> JSON::parse(hash.to_json)
=> {"x"=>34, "y"=>"hello"}
>> JSON::parse(hash.to_json) == hash
=> false

JSON, being a data transfer format, doesn’t concern itself with being able to deserialize the output back to the original object. Even with the simple example of { x: 34, y: "hello" }, it can’t complete a round trip conversion, since the json spec has no concept of symbols, and the keys are therefore just converted to strings.
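
As a small illustration of that point, here is a minimal sketch using the symbolize_names option of JSON.parse. The round trip only succeeds because the caller already knows the keys were symbols; nothing in the JSON text says so.

```ruby
require 'json'

hash = { x: 34, y: "hello" }
json = hash.to_json

# The JSON text itself carries no hint that the keys were symbols...
plain = JSON.parse(json)
# ...so the caller has to supply that knowledge when parsing.
symbolized = JSON.parse(json, symbolize_names: true)

puts plain == hash       # false
puts symbolized == hash  # true
```

The type information lives in the calling code rather than in the data, which is exactly what you would expect from an interchange format.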

This isn’t a bad thing for a data transfer format - in fact it makes the spec very simple. This is one of the reasons json is taking over (or has taken over) from xml as a data transfer format: it is easy to create and consume from any language, as it only has a small number of primitives.

YAML, on the other hand, does focus on data serialization, and can complete the round trip:

yaml simple
>> require 'yaml'
=> false
>> puts hash.to_yaml
---
:x: 34
:y: hello
=> nil
>> YAML::load(hash.to_yaml)
=> {:x=>34, :y=>"hello"}
>> YAML::load(hash.to_yaml) == hash
=> true

But it is not just hashes that can be serialized by YAML; almost anything can. For example, using the definitions from this file:

definitions.rb
class Person
  attr_accessor :name, :age
  
  def ==(other)
    name == other.name && age == other.age
  end
end

class Organization 
  attr_accessor :employees, :boss
  def ==(other)
    employees == other.employees && boss == other.boss
  end
end

class CustomHash < Hash

end

# make some global objects for testing
$person = Person.new
$person.age = 34
$person.name = "Fred"

$person2 = Person.new
$person2.age = 50
$person2.name = 'Sarah'

$boss = Person.new
$boss.age = 60
$boss.name = 'Bill'

$organization = Organization.new
$organization.employees = [$person, $person2]
$organization.boss = $boss

$hash = CustomHash.new
$hash['organization'] = $organization

YAML can do the following:

YAML complex object serialization
>> load 'definitions.rb'
=> true
>> puts $person.to_yaml
--- !ruby/object:Person
age: 34
name: Fred
=> nil
>> YAML::load($person.to_yaml) == $person
=> true
>> puts $organization.to_yaml
--- !ruby/object:Organization
employees:
- !ruby/object:Person
  age: 34
  name: Fred
- !ruby/object:Person
  age: 50
  name: Sarah
boss: !ruby/object:Person
  age: 60
  name: Bill
=> nil
>> YAML::load($organization.to_yaml) == $organization
=> true
>> puts $hash.to_yaml
--- !ruby/hash:CustomHash
organization: !ruby/object:Organization
  employees:
  - !ruby/object:Person
    age: 34
    name: Fred
  - !ruby/object:Person
    age: 50
    name: Sarah
  boss: !ruby/object:Person
    age: 60
    name: Bill
=> nil
>> YAML::load($hash.to_yaml) == $hash
=> true

Note how:

  • The output for the serialization is very clear
  • YAML successfully recovered the original objects through deserialization (using the YAML::load method)
  • The output is very ruby specific

This makes YAML quite a nice serialization format, due to the first two points, but a poor transfer format, due to the third point. The story about using YAML as a transfer format gets a lot worse though…

How does YAML deserialize objects?

You may be wondering how yaml was successfully able to rebuild the original objects. It is actually very simple - when it encounters a !ruby/object:<class> section, for each attribute in that section it sets an instance variable on that object to the given value. For example, with:

--- !ruby/object:Person
name: Fred
age: 34

The YAML library effectively runs:

# roughly equivalent; allocate creates the object without calling initialize
result = Person.allocate
result.instance_variable_set(:@name, 'Fred')
result.instance_variable_set(:@age, 34)
result
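
One detail worth spelling out: as far as I know, ruby's YAML library creates the object without running its initialize method, so any validation done in a constructor is skipped. The following is a minimal sketch of that idea. The revive_object helper and the Adult class are hypothetical names made up for this example; this is not the library's real code.

```ruby
# Hypothetical helper for illustration only; not the YAML library's real code.
def revive_object(klass, attributes)
  object = klass.allocate # allocate skips initialize entirely
  attributes.each do |name, value|
    object.instance_variable_set("@#{name}", value)
  end
  object
end

# Hypothetical class whose constructor validates its input.
class Adult
  attr_reader :name, :age

  def initialize(name, age)
    raise ArgumentError, "must be 18 or over" if age < 18
    @name, @age = name, age
  end
end

minor = revive_object(Adult, "name" => "Fred", "age" => 3)
puts minor.age # prints 3; the check in initialize never ran
```

Calling Adult.new("Fred", 3) directly would raise an ArgumentError, but the revived object slips past the check.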

A special case is working with subclasses of Hash, like CustomHash from above. In that case, it calls []= for each attribute. Given:

--- !ruby/hash:CustomHash
value: 34
x: "hello"

The following will be run when deserializing it.

result = CustomHash.new
result['value'] = 34
result['x'] = "hello"
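
To see why those []= calls matter, here is a minimal sketch in which a class overrides []= with a side effect. Both revive_hash and RecordingHash are hypothetical names invented for this example, and the helper only imitates the behaviour described above.

```ruby
# Hypothetical helper for illustration only; not the YAML library's real code.
def revive_hash(klass, pairs)
  result = klass.new
  pairs.each { |key, value| result[key] = value } # dispatches to klass#[]=
  result
end

# Hypothetical Hash subclass with a side effect in []=.
class RecordingHash < Hash
  WRITES = []

  def []=(key, value)
    WRITES << key # stands in for any side effect an attacker could trigger
    super
  end
end

revive_hash(RecordingHash, "value" => 34, "x" => "hello")
puts RecordingHash::WRITES.inspect # ["value", "x"]
```

Whatever code lives in []= runs once per key, with keys and values chosen entirely by whoever wrote the YAML.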

These two features, however, open up huge security holes if you parse YAML from an untrusted source - they allow an attacker to call the []= method of any class in your system, as well as create objects of any class with any instance variables!

For example, if you had the following class in your system:

lazy_object.rb
# stores an action on a target with parameters. When methods are called on this
# object, evaluates the action, and calls the code on that.
class LazyObject
  def initialize(target, action, arguments)
    @target, @action, @arguments = target, action, arguments
  end
  
  def method_missing(method, *args)
    @result = @target.send(@action, *@arguments) if @result.nil?
    @result.send(method, *args)
  end
end

With the given YAML code:

dangerous.yaml
--- !ruby/object:LazyObject
target: Kernel
action: eval
arguments: ["File.write('hacked.txt', 'you are compromised!!!')"]

You can now execute arbitrary code, provided the original code calls any method on the result of the parse:

Remote code execution
>> load 'lazy_object.rb'
=> true
>> result = YAML::load_file('dangerous.yaml')
=> #<LazyObject:0x007fe7390a5488 @target="Kernel", @action="eval", @arguments=["File.write('hacked.txt', 'you are compromised!!!')"]>
>> puts result.length # assuming we were expecting an array, we may call length on it. 
NoMethodError: undefined method `length' for 22:Fixnum
  from lazy_object.rb:11:in `method_missing'
  from (irb):20
  from /Users/david/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
>> File.read('hacked.txt')
=> "you are compromised!!!"

The scary part of this is that there wasn’t any call to eval in the original system - the eval call could be triggered simply by creating a LazyObject with the right instance variables. LazyObject isn’t inherently insecure; it is just that if an attacker can create objects at will, many originally safe constructs suddenly become unsafe.

This isn’t the only issue with using YAML with untrusted input. See the post by Aaron Patterson for a lot more detail. In short, unless you prevent anything other than a few core types (eg ints, strings, bools, arrays and hashes) from being deserialized, YAML isn’t safe with untrusted input.
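
For what it is worth, newer versions of ruby's YAML library than the one bundled with the 1.9.3 setup used in this post provide a YAML.safe_load method that restricts loading to core types. The following is a minimal sketch, assuming a ruby recent enough to have it; the LazyObject class does not even need to be defined for the tag to be refused.

```ruby
require 'yaml'

plain     = "---\nname: Fred\nage: 34\n"
dangerous = "--- !ruby/object:LazyObject\ntarget: Kernel\naction: eval\n"

# Plain scalars, arrays and hashes load as usual.
data = YAML.safe_load(plain)
puts data.inspect

# A tag naming a ruby class is refused rather than instantiated.
begin
  YAML.safe_load(dangerous)
  rejected = false
rescue Psych::DisallowedClass => e
  rejected = true
  puts "rejected: #{e.message}"
end
```

Note that symbols are not among the types allowed by default either, so a hash with symbol keys would also need to be explicitly permitted.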

Now this isn’t very surprising when you consider YAML as a data serialization format - it is no different from pickling in python or marshaling in ruby, neither of which should be used with untrusted input. Serialization security flaws have even been found in java. The only serialization library I would trust with untrusted input would be haskell’s Data.Binary library - by definition of its types, and the fact that there is no reflection, the only thing you can get back from deserialization with Data.Binary is what you expect, or an error.

Unfortunately, this property of YAML isn’t advertised very well, so many people are unaware of it. Including one rails contributor 6 years ago…

YAML and from_xml

The commit in question added a new ‘feature’ to from_xml - it allowed the type of an xml field to also be yaml. For example:

yaml with xml
>> require 'active_support/core_ext'
=> true
>> xml = <<eof
<field type="yaml">
#{$boss.to_yaml}
</field>
eof
=> "<field type=\"yaml\">\n--- !ruby/object:Person\nage: 60\nname: Bill\n\n</field>\n"
>> data = Hash.from_xml(xml)
=> {"field"=>#<Person:0x007fe739043468 @age=60, @name="Bill">}

And now the from_xml method is completely insecure, for all the reasons described above. The from_xml method is used by rails in parameter parsing when parameters are passed via xml. This vulnerability allowed attackers to remotely execute code by passing an appropriate xml value containing yaml to a rails application.
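
To make the contrast concrete, here is a minimal sketch of what type-driven conversion looks like when it is limited to an explicit list of simple types. This is not how activesupport is implemented; SAFE_CONVERTERS and convert_field are hypothetical names invented for this example.

```ruby
# Hypothetical allow-list of converters, keyed by the value of the type attribute.
# There is deliberately no "yaml" entry.
SAFE_CONVERTERS = {
  "integer" => ->(text) { Integer(text) },
  "float"   => ->(text) { Float(text) },
  "boolean" => ->(text) { text == "true" },
  "string"  => ->(text) { text }
}.freeze

# Hypothetical helper: converts one field, refusing any type not on the list.
def convert_field(type, text)
  converter = SAFE_CONVERTERS.fetch(type) do
    raise ArgumentError, "unsupported type: #{type.inspect}"
  end
  converter.call(text)
end

puts convert_field("integer", "34") # 34

begin
  convert_field("yaml", "--- !ruby/object:Person {}")
rescue ArgumentError => e
  puts e.message # unsupported type: "yaml"
end
```

Every converter here can only ever produce a number, a boolean or a string, so there is no path from the input to arbitrary object creation.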

JSON and deserialization

There was another serious security flaw that was also caused by not understanding the security problems unrestricted deserialization can cause.

The json gem has a “feature” whereby, with certain inputs, it would deserialize to an object. For example, if the following class existed:

Hero.rb
class Hero
  attr_accessor :name, :level
  def self.json_create(attributes) 
    result = self.new
    result.name = attributes["name"]
    result.level = attributes["level"]
    result
  end
end

We can now create an instance of the class by using the special key "json_class":

json object creation
>> load 'Hero.rb'
=> true
>> require 'json'
=> true
>> JSON.parse '{ "json_class":"Hero", "name":"Superman", "level":10 }'
=> #<Hero:0x007fd97515c2e0 @name="Superman", @level=10>
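
For comparison, here is a minimal sketch of the same parse with object creation switched off through the create_additions option of JSON.parse. The Hero class is repeated so that the snippet stands on its own.

```ruby
require 'json'

class Hero
  attr_accessor :name, :level

  def self.json_create(attributes)
    hero = new
    hero.name = attributes["name"]
    hero.level = attributes["level"]
    hero
  end
end

text = '{ "json_class":"Hero", "name":"Superman", "level":10 }'

# With additions switched off, "json_class" is just another key.
data = JSON.parse(text, create_additions: false)
puts data.class         # Hash
puts data["json_class"] # Hero
```

The result is plain data again, which is what a consumer of a data transfer format would expect to receive.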

We once again have object creation from a data transfer format, and can create any object that implements the json_create class method. There are no classes in rails that use this feature. However, the json gem comes bundled with the JSON::GenericObject class. This object works like javascript objects do: you can set attributes with obj.val = "something", and retrieve them with obj.val:

JSON::GenericObject
>> obj = JSON::GenericObject.new
=> #<JSON::GenericObject>
>> obj.val = 34
=> 34
>> puts obj.val
34
=> nil
>> obj.trusted_input = true
=> true
>> puts obj.trusted_input
true
=> nil

Unfortunately, this allows us, with the powers of duck typing, to pretend to be another object, and return whatever we want from methods called by the code that is using the parsed json. This allowed sql injection to occur with rails. The discoverer of this bug made an excellent blog post about it.
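
The duck typing problem can be sketched without the json gem at all. In the following example, AttributeBag is a hypothetical stand-in for an attribute bag like JSON::GenericObject, and greeting is hypothetical application code that expected a string; neither comes from rails or from the actual exploit.

```ruby
# Hypothetical stand-in for an attribute bag like JSON::GenericObject.
class AttributeBag
  def initialize(attributes)
    @attributes = attributes
  end

  # Any method whose name matches a stored key returns the stored value.
  def method_missing(name, *args)
    key = name.to_s
    @attributes.key?(key) ? @attributes[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

# Hypothetical application code that expected a String user name.
def greeting(user_name)
  "Hello, #{user_name.strip}!"
end

puts greeting("  Fred ")                               # Hello, Fred!
puts greeting(AttributeBag.new("strip" => "anything")) # Hello, anything!
```

The application code never checks what kind of object it was handed, so whoever controls the attributes controls the return value of every method it calls.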

Conclusion

Many things have been said about the wave of security flaws that have hit rails over the last month. I believe the key issue is not understanding the difference, and the security trade-offs, between data transfer and data serialization. The bug with the json library came from it pretending to be a serialization library. The bug in activesupport came from using yaml in a place where untrusted data was not only possible, but the norm. Unless a data serialization library has been made explicitly to also be used with untrusted data, it shouldn’t be used with it.
