
Safely Extracting Partial Data From Pickled Objects

I've got a pickled instance of an object and have to accept these pickled instances from untrusted sources. There is internal state (just an array of integers) that I can use to re

Solution 1:

One idea is to read the pickled objects from the files as strings, then use pickletools.dis to see what's in them, allowing only a specific list of opcodes ('STOP', 'INT', …) in the opcode-name column of its output. That rules out pickles containing the kinds of objects you are worried about, and if you are only targeting a very specific list of basic Python objects, you might be able to do this safely.

Here's what you get with pickletools.dis:

>>> import pickletools
>>> import pickle
>>>
>>> p1 = pickle.dumps(1)
>>> p2 = pickle.dumps(min)
>>>
>>> pickletools.dis(p1)
    0: I    INT        1
    3: .    STOP
highest protocol among opcodes = 0
>>> pickletools.dis(p2)
    0: c    GLOBAL     '__builtin__ min'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0
>>>

It's better than writing a full pickle parser, and possibly doable if you only want to allow simple objects like INTs.
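As a rough sketch of that allowlist idea (written for Python 3, and using pickletools.genops rather than parsing the printed output of pickletools.dis), something like the following could work. The name uses_only_allowed_opcodes and the contents of ALLOWED_OPCODES are my own assumptions, chosen to cover a flat list of small integers; adjust the set to the objects you actually expect.

```python
import pickle
import pickletools

# Hypothetical allowlist: just enough opcodes for a flat list of small ints.
# Extend it deliberately, and keep GLOBAL, STACK_GLOBAL, REDUCE, BUILD,
# INST, OBJ and NEWOBJ out of it.
ALLOWED_OPCODES = frozenset({
    "PROTO", "FRAME", "STOP",
    "MARK", "EMPTY_LIST", "LIST", "APPEND", "APPENDS",
    "INT", "BININT", "BININT1", "BININT2",
    "PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE",
})


def uses_only_allowed_opcodes(data, allowed=ALLOWED_OPCODES):
    """Return True if every opcode in the pickle bytes is in `allowed`.

    pickletools.genops only parses the stream; it does not import or call
    anything, so it can be run on untrusted bytes.
    """
    try:
        return all(op.name in allowed
                   for op, _arg, _pos in pickletools.genops(data))
    except Exception:
        # Truncated or malformed input: reject rather than guess
        return False


safe = pickle.dumps([1, 2, 3])
unsafe = pickle.dumps(min)  # pickled by reference, so it needs a global lookup

print(uses_only_allowed_opcodes(safe))
print(uses_only_allowed_opcodes(unsafe))
```

Passing this check only says that the stream contains nothing outside the allowlist; it does not say the data has the shape you expect, so you would still want to validate the result after loading it.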

Solution 2:

You can do this, but only if you parse the data yourself rather than relying on pickle, since unpickling untrusted data can lead to arbitrary code execution. A very simple example of doing so could be:

import pickle
import re

class Test(object):
    def __init__(self, l):
        self.internal_list = l
        self.foo = 2
        self.bar = 24

# Create a pickled version of an object
t = Test([1,2,3,4,5,6,7,8,9,10])
with open("test.pickle", 'w') as f:
    pickle.dump(t, f)

def find_last_integer(s):
    """ Parses a string to return the integer that it ends with
        e.g. find_last_integer("foobar312") == 312
    """
    return int(re.search(r"\d+$", s).group())

# Load the pickled data as plain text (pickle.load is never called)
with open("test.pickle") as f:
    data = f.read()

# Assumes that the class will only contain one list
# if you need more then look for all lines starting "(lp"
listdata = data[data.find("(lp"):].split('\n')

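# The number after "(lp" is the pickle memo index, not the list length,
# so instead of counting, read lines until one stops looking like an element.
# Each element of the list should be of the form "In" or "aIn"
reconstructed = []
for elem in listdata[1:]:
    if not re.match(r"a?I\d+$", elem):
        break
    reconstructed.append(find_last_integer(elem))
print(reconstructed)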

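Note that I've only tested the above code in Python 2.7.8; YMMV if you use it with other versions. It also depends on the text layout of pickle protocol 0, so it will not work on pickles written with a binary protocol.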
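If matching on the protocol 0 text is too brittle, a variation on the same idea is to walk the opcode stream with pickletools.genops and collect the integer arguments that follow the first list, still without ever calling pickle.load. This is only a sketch for Python 3: the function name first_int_list and the two opcode sets are my own choices, and it pickles a plain dict standing in for the object's state rather than the Test instance itself.

```python
import pickle
import pickletools

# Opcodes that push an integer, and bookkeeping opcodes that can appear
# between list elements without ending the list.
INT_OPS = {"INT", "BININT", "BININT1", "BININT2", "LONG", "LONG1", "LONG4"}
SKIP_OPS = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE",
            "MARK", "APPEND", "APPENDS"}


def first_int_list(data):
    """Return the integers of the first list in a pickle, or None.

    Only parses the opcode stream; nothing is imported or executed.
    Raises ValueError if the stream is malformed.
    """
    values = None
    for op, arg, _pos in pickletools.genops(data):
        if values is None:
            # Still looking for the start of a list
            if op.name in ("LIST", "EMPTY_LIST"):
                values = []
        elif op.name in INT_OPS:
            values.append(int(arg))
        elif op.name not in SKIP_OPS:
            # Anything else means the run of integers has ended
            break
    return values


# Stand-in for the object's state (an assumption, not the original class)
state = {"foo": 2, "internal_list": [1, 2, 3, 4, 5], "bar": 24}
for protocol in (0, 2, pickle.HIGHEST_PROTOCOL):
    print(protocol, first_int_list(pickle.dumps(state, protocol)))
```

Like the string-based version, this assumes the first list in the stream is the one you want and that it holds only integers; a nested or mixed list would need more careful tracking of the pickle stack.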
