
List of notion.so python API libraries 2022-04

My use case is simple: I have a table page (or, in Notion terminology, a database) defined in my notion.so namespace. I have set up an integration token that can access this page, and I need to iterate over all the items in the table.

👉
TL;DR: It's not good. There is no library that does a good job.

My definition of a good job is:

Provide an object-oriented API for the calls, so that I can, for example, reference the table page by its ID and then iterate over it like this:

from imaginary_notion_api import Client

notion = Client("secret_token")

changelog_db = notion.Database("DATABASE_ID")
for row in changelog_db:
    print(row.updated, row.title, row.text)

Maybe it could also add new values via .append() and support other native Python operations, such as square-bracket access. I don’t want to be bothered with low-level details, such as the fact that you have to send an empty query to list the database, or that the API returns some kind of JSON.

Basically, what I want is an API with a high enough level of abstraction that it corresponds to the things you want to do at the object level, and to Python’s syntax constructs (context managers, the iterator protocol, and so on). I am not really interested in a thin wrapper over the API that returns JSON you need to parse separately.
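To make the wish a bit more concrete, here is a minimal sketch of how such an object layer could map onto Python's protocols. Everything in it is hypothetical: the `Row` and `Database` classes, and the `fetch_rows` callable (assumed to return rows as plain dicts of column name to value) are invented for illustration and do not come from any existing library.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class Row:
    """One table row; column names are exposed as lowercase attributes."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = values

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # The "Updated" column becomes row.updated (case-insensitive match).
        for column, value in self._values.items():
            if column.lower() == name:
                return value
        raise AttributeError(name)

    def __getitem__(self, column: str) -> Any:
        # Square-bracket access with the exact column name.
        return self._values[column]


class Database:
    """Iterable facade; fetch_rows is a hypothetical callable hiding the HTTP layer."""

    def __init__(self, fetch_rows: Callable[[], Iterable[Dict[str, Any]]]) -> None:
        self._fetch_rows = fetch_rows

    def __iter__(self) -> Iterator[Row]:
        for values in self._fetch_rows():
            yield Row(values)


# Usage with made-up data standing in for the real API:
db = Database(lambda: [{"Updated": "2022-04-01", "Title": "First entry"}])
for row in db:
    print(row.updated, row["Title"])
```

The point of the sketch is only the shape: the caller sees iteration and attribute access, while the empty query and the JSON stay behind `fetch_rows`.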

👉
Once upon a time, I created something like this on top of a low-level library for accessing SharePoint in Office 365: https://github.com/Bystroushaak/o365_sharepoint_connector

notion-sdk

I tried to use it because, at first glance, it seems to work. Except it doesn’t. It does a good job of defining all the types returned from the Notion API, but that’s pretty much it.

from notion import NotionClient

notion = NotionClient(auth="secret_token")

changelog_db = notion.databases.retrieve("DATABASE_ID")
print(changelog_db.title)

for item in changelog_db:
    print(item)

Great. I get back a Database object, which has no methods, and I can’t figure out how to iterate over the values. Looking into the code itself doesn’t help. There seem to be no methods.

Documentation is missing, and the examples/ directory is similarly useless.

Edit: Okay, so I figured it out by reading the Notion API documentation. You have to query the database with an empty query to actually get the content. Sadly, when I try it, I get a very, very long traceback full of:

Traceback (most recent call last):
  File "/home/bystrousak/Plocha/xlit/notion_blog_generator/lib/preprocessors/apitest.py", line 14, in <module>
    changelog_db = notion.databases.query("DATABASE_ID")
  File "/home/bystrousak/Plocha/xlit/notion_blog_generator/venv/lib/python3.8/site-packages/notion/endpoints/sync.py", line 94, in query
    return PaginatedList[Page].parse_obj(
  File "pydantic/main.py", line 511, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1326 validation errors for PaginatedList[Page]
results -> 0 -> properties -> Updated -> type
  unexpected value; permitted: <PropertyValueType.TITLE: 'title'> (type=value_error.const; given=date; permitted=[<PropertyValueType.TITLE: 'title'>])
results -> 0 -> properties -> Updated -> title
  field required (type=value_error.missing)
results -> 0 -> properties -> Updated -> type
  unexpected value; permitted: <PropertyValueType.RICH_TEXT: 'rich_text'> (type=value_error.const; given=date; permitted=[<PropertyValueType.RICH_TEXT: 'rich_text'>])
results -> 0 -> properties -> Updated -> rich_text

Which continues like this for several hundred lines. Typing, huh?

.next()

notion-py

This looks great; it seems to support all the features I need. The only problem is that it uses the unofficial API. This basically means that the authors reverse-engineered the API calls Notion itself makes and wrapped them in a custom library.

Before Notion published its API, this was the only way to do it, and I’ve used it in the past myself. But this has two serious problems:

The second point is especially annoying. Every time Notion logs you out, you have to go to the developer console in your browser and copy the token from the cookie. That is an extra manual step you don’t want in your automation.

.next()

notion-database

From the README and the name, you could expect that this will be precisely what I am looking for, right?

The code looks straightforward:

from notion_database.database import Database

database = Database(
    integrations_token="secret_token"
)
database.retrieve_database(
    database_id="DATABASE_ID", get_properties=True
)

print(database.properties_list)

Except it prints only the column names and internal IDs:

[{'id': 'LE~%7D', 'name': 'Updated', 'type': 'date', 'date': {}}, {'id': 'YtoI', 'name': 'Title', 'type': 'rich_text', 'rich_text': {}}]

Well. Okay, so how do I iterate over the actual values? Documentation doesn’t share this secret knowledge. And I can’t see it from the debugger:

dir(database)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'create_database', 'find_all_page', 'list_databases', 'properties_list', 'query_database', 'request', 'result', 'retrieve_database', 'run_query_database', 'update_database', 'url']

Should I query the database with the IDs from the properties? I am looking into the code itself, and it doesn’t look like a promising approach:

    def query_database(self):
        # Not Implemented
        pass

Am I missing something? I’ve tried different methods, and they return basically nothing of any value.

I look into the documentation, and then I see it:

from notion_database.database import Database

database = Database(
    integrations_token="secret_token"
)
database.find_all_page(database_id="DATABASE_ID")
print(database.result)

This seems to return some kind of JSON / dict with all the rows. Finally!

But at this point, I could construct the query myself in about ten lines of code. The library appears mostly useless, because all I get is JSON and not Python objects. I mean, I could have done the same thing with the code straight from the documentation, and it would be only slightly longer and functionally the same:

import requests

url = "https://api.notion.com/v1/databases/DATABASE_ID/query"
payload = {"page_size": 100}
headers = {
    "Accept": "application/json",
    "Notion-Version": "2022-02-22",
    "Content-Type": "application/json",
    "Authorization": "Bearer secret_token"
}
response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
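One caveat: a single call like the one above returns at most one batch of rows. The Notion API paginates query results; each response carries `has_more` and `next_cursor`, and the next request passes that cursor as `start_cursor`. Below is a minimal sketch of a generator that follows the cursors. The function names `post_json` and `iter_pages` are invented for this example, and the sketch uses the standard library's `urllib` instead of `requests` only to stay dependency-free.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator

NOTION_VERSION = "2022-02-22"


def post_json(url: str, payload: Dict[str, Any], token: str) -> Dict[str, Any]:
    """POST a JSON payload to the Notion API and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_pages(
    database_id: str,
    token: str,
    post: Callable[[str, Dict[str, Any], str], Dict[str, Any]] = post_json,
) -> Iterator[Dict[str, Any]]:
    """Yield every row (page object) of a database, following the cursors."""
    url = f"https://api.notion.com/v1/databases/{database_id}/query"
    payload: Dict[str, Any] = {"page_size": 100}
    while True:
        data = post(url, payload, token)
        yield from data["results"]
        if not data.get("has_more"):
            break
        # Ask for the next batch, starting where the previous one ended.
        payload["start_cursor"] = data["next_cursor"]
```

Calling `list(iter_pages("DATABASE_ID", "secret_token"))` would then collect all rows, still as raw JSON dicts. The `post` parameter exists only so that the HTTP call can be swapped out in tests.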

.next()

notion-sdk-py

This seems like an actively developed project, but it has the same problem as the previous one: it provides only a thin wrapper over the API. You can query the values using a Python API, but you’ll get JSON / dict back.

.next()

Other libraries

Uh oh. It seems we’ve reached the end. When I search GitHub, I can see several other projects, but they are mostly in worse shape than the ones already discussed.

Conclusion

It seems to me that the situation is not improving. I did the same “research” approximately a year ago, concluded that there wasn’t any good library, and hoped that something would appear in the meantime. It didn’t.

Now I have to think about what to do:

The problem is that I don’t really want to do any of this. *Sigh*.

I mean, I can probably create something, but I don’t actually want to maintain it in the long term. Code generators could partially fix that, but that seems fragile, and it would break all the time. I’ve looked, and there doesn’t seem to be any kind of formal specification, such as a Swagger definition.

Sure, I can use the low-level API, but this has its own issues. For this use case it is fine, but when I build something more complicated, the result will be messy code that is hard to maintain.

Edit

Eventually, I used raw requests calls to get the job done, but it wasn’t a pleasant experience. Having to dig the data out of JSON arrays and objects nested four or five levels deep is especially annoying and unreadable.
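To illustrate the nesting: in the query response, each row keeps its values under `properties`, keyed by column name, and every value is wrapped in an object whose `type` field names the key holding the actual payload. A minimal sketch of flattening this follows, assuming only the `title`, `rich_text` and `date` property types are of interest; the helper names are invented for this example, the sample row is made up, and this is not the code I ended up with.

```python
from typing import Any, Dict, List


def plain_text(fragments: List[Dict[str, Any]]) -> str:
    """Join the plain_text parts of a rich-text array into one string."""
    return "".join(fragment.get("plain_text", "") for fragment in fragments)


def property_value(prop: Dict[str, Any]) -> Any:
    """Unwrap one property object into a plain Python value."""
    kind = prop["type"]
    value = prop[kind]  # the payload lives under a key named after the type
    if kind in ("title", "rich_text"):
        return plain_text(value)
    if kind == "date":
        # An empty date cell is null in the JSON.
        return value["start"] if value else None
    return value  # other property types are passed through untouched


def flatten_page(page: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one row of the query result into {column name: value}."""
    return {name: property_value(prop) for name, prop in page["properties"].items()}


# Made-up row, trimmed to the fields the helpers actually read:
sample_page = {
    "properties": {
        "Updated": {"type": "date", "date": {"start": "2022-04-01", "end": None}},
        "Title": {
            "type": "rich_text",
            "rich_text": [{"plain_text": "Hello "}, {"plain_text": "world"}],
        },
    }
}
print(flatten_page(sample_page))
```

Even this small helper has to know about the type-specific shapes, which is exactly the kind of knowledge I would expect a library to hide.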

At the moment, I am trying to create a proof-of-concept experimental library for the Notion API, but I am not sure whether I’ll continue to develop it. Mostly, I wanted to see how hard it would be to create one with some auto-generation of types and so on.
