Bystroushaak's blog

Don't use Shelve, use sqlitedict

Every time I try to use the shelve module to actually store anything for a longer period of time, I end up with the following error:

Traceback (most recent call last):
  File "/home/bystrousak/scripty/data_loger/configuration.py", line 44, in load
    self.from_dict(db.get("data", {}))
  File "/usr/lib/python2.7/shelve.py", line 113, in get
    if key in self.dict:
  File "/usr/lib/python2.7/_abcoll.py", line 388, in __contains__
    self[key]
  File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in __getitem__
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
  File "/usr/lib/python2.7/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in <lambda>
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
bsddb.db.DBPageNotFoundError: (-30986, 'BDB0075 DB_PAGE_NOTFOUND: Requested page not found')

It has a really wonderful ability to fuck itself up, randomly (here it is the underlying bsddb database that breaks). And there is no easy way to recover from the failure.

So my advice is to use sqlitedict instead. It has an almost identical dict-like API and generally works in a similar way, but I have never encountered an error while using it.
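The idea behind sqlitedict is easy to picture: a dictionary whose values are pickled into a SQLite table. Below is a minimal sketch of that idea using only the standard library (`sqlite3`, `pickle`). It is not sqlitedict's actual implementation, which does more (for example, it supports several named tables in one file and routes SQLite access through a worker thread); the class name `TinySqliteDict` and the single `kv` table layout are hypothetical.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class TinySqliteDict(MutableMapping):
    """Hypothetical, minimal SQLite-backed persistent dict (illustration only).

    Keys are strings; values are pickled into a BLOB column.
    """

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)"
        )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # REPLACE inserts a new row or overwrites the existing one.
        self.conn.execute(
            "REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Materialize the keys first, so the dict can be modified while iterating.
        keys = [row[0] for row in self.conn.execute("SELECT key FROM kv")]
        return iter(keys)

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def commit(self):
        self.conn.commit()

    def close(self):
        self.conn.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Like sqlitedict without autocommit=True, closing does NOT commit.
        self.close()
```

Like a SqliteDict opened without `autocommit=True`, this sketch only persists writes after `commit()`; closing without committing discards them.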

Simple object storage

For those of you who don't know what shelve does: it provides simple object persistence. It works like a dictionary into which you can put (almost) any object, which is then automatically serialized (pickled) and stored on the disk. Next time, you can load it back with just a few lines.
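For comparison, here is a minimal sketch of that pattern with the standard-library shelve module, assuming Python 3 (where `shelve.open()` can be used as a context manager). The helper name `load_or_create` is hypothetical.

```python
import shelve


class SomeData:
    def __init__(self, text):
        self.text = text


def load_or_create(path):
    # shelve.open() returns a dict-like object whose keys are strings and whose
    # values can be any picklable object; the file is created if it is missing.
    with shelve.open(path) as db:
        if "anything" not in db:
            db["anything"] = SomeData("some data")
        # Reading a value unpickles a fresh copy of the stored object.
        return db["anything"]
```

Calling `load_or_create()` twice with the same path creates the object on the first call and loads it back from the disk on the second.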

This kind of storage is excellent when you just need to store something in a script, but don't want to bother with object serialization and configuration files. Just put the stuff you want to save (any picklable object, remember) into the storage and load it back next time.

I use it for simple storage all the time:

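from sqlitedict import SqliteDict

class SomeData:
    def __init__(self, text):
        self.text = text

with SqliteDict('./my_db.sqlite') as mydict:
    if "anything" not in mydict:
        mydict["anything"] = SomeData("some data")
        # SqliteDict does not commit on close unless opened with autocommit=True,
        # so commit explicitly, otherwise the value is lost.
        mydict.commit()

    my_data = mydict["anything"]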

...and the next time you run the script, the value is already stored and simply loaded back.

Usage examples

This kind of storage is excellent as a simple cache. You process or download some heavy data and store it in sqlitedict together with a timestamp. On the next run of the script, if the time elapsed since that timestamp is lower than some timeout, you just pick the data up and use it again. In about three lines of code.
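Such a cache might look roughly like the sketch below. It is not code from the author's scripts: the function name `cached` and the `(timestamp, value)` entry layout are hypothetical. `store` can be any dict-like object, so it works with a SqliteDict (remember to call `commit()` afterwards or open it with `autocommit=True`), a shelve shelf, or a plain dict.

```python
import time


def cached(store, key, timeout, compute, now=time.time):
    """Return the cached value for `key` if it is younger than `timeout` seconds.

    Otherwise call `compute()` (a zero-argument callable producing fresh data),
    save the result as a (timestamp, value) tuple and return it.
    """
    entry = store.get(key)
    if entry is not None:
        timestamp, value = entry
        if now() - timestamp < timeout:
            return value  # fresh enough, reuse it
    value = compute()
    store[key] = (now(), value)
    return value
```

The `now` parameter only exists so that the clock can be replaced in tests; in normal use the default `time.time` is fine.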

I used sqlitedict in my analysis of abclinuxu.cz, where I periodically download all blogposts published on the site and run analyses on them. The blogpost objects, including their parsed HTML trees and full state, are stored in sqlitedict and simply loaded whenever I need them again.

I also used sqlitedict in my tag manager project, which I wrote after some troll vandalized the tags under one of my blogposts on abclinuxu. Any anonymous user could come and vandalize the list of tags. So I created a simple script that stores the currently used tags in sqlitedict, and a second script, run periodically, that restores them. Simple and effective.
