22 July, 2021

threads and libxcb: problems now we have two

If you want to write an X application, you need to use some library that speaks the X11 protocol. For a long time this meant libX11, often called xlib, which - like most things about X - is a fantastic bit of engineering that is very much a product of its time with some confusing baroque bits. Overall it does a very nice job of hiding the icky details of the protocol from the application developer.

One of the details it hides has to do with how resource IDs are allocated in X. A resource ID (an XID, in the jargon) is a 29-bit integer that names a resource - window, colormap, what have you. Those 29 bits are split up netmask/hostmask style, where the top 8 or so uniquely identify the client, and the rest identify the resource belonging to that client. When you create a window in X, what you really tell the server is "I want a window that's initially this size, this background color (etc.) and from now on when I say (my client id + 17) I mean that window." This is great for performance because it means resource allocation is assumed to succeed and you don't have to wait for a reply from the server.
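To make the netmask/hostmask analogy concrete, here is a minimal sketch of how an XID is put together. The base and mask values are made up for illustration; a real client gets them from the connection setup (in XCB, the resource_id_base and resource_id_mask fields of xcb_setup_t). The helper names are mine, not anything from xlib or XCB.

```c
#include <stdint.h>

/* Hypothetical example values; a real client reads these from the
   connection setup rather than hard-coding them. */
#define EXAMPLE_ID_BASE 0x04600000u  /* the bits that identify this client */
#define EXAMPLE_ID_MASK 0x001fffffu  /* the bits the client may assign freely */

/* Compose the XID for the client's n-th resource ("my client id + n"). */
static uint32_t make_xid(uint32_t base, uint32_t mask, uint32_t n)
{
    return base | (n & mask);
}

/* Recover the client part of an XID. Only 29 bits are meaningful;
   the top three bits of an XID are always zero. */
static uint32_t xid_client(uint32_t xid, uint32_t mask)
{
    return xid & ~mask & 0x1fffffffu;
}
```

With these example values, make_xid(EXAMPLE_ID_BASE, EXAMPLE_ID_MASK, 17) is 0x04600011, and xid_client of that gives back the base.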

Key to all this is that in xlib the XID is the return value from the call that issues the resource creation request. Internally the request gets queued into the protocol's write buffer, but the client can march ahead and issue the next few commands as if creation had succeeded - because it probably did, and if it didn't you're probably going to crash anyway.

So to allocate XIDs the client just marches forward through its XID range. What happens when you hit the end of the range? Before X11R6, you'd crash, because xlib doesn't keep track of which XIDs it's allocated, just the lowest one it hasn't allocated yet. Starting in R6 the server added an extension called XC-MISC that lets the client ask the server for a list of unused XIDs, so when xlib hits the end of the range it can request a new range from the server.

But. UI programming tends to want threads, and xlib is perhaps not the most thread-friendly. So XCB was invented, which sacrifices some of xlib's ease of use for a more direct binding to the protocol and (in theory) an explicitly thread-safe design. We then modified xlib and XCB to coexist in the same process, using the same I/O buffers, reply and event management, etc.

This literal reflection of the protocol into the API has consequences. In XCB, unlike xlib, XID generation is an explicit step. The client first calls into XCB to allocate the XID, and then passes that XID to the creation request in order to give the resource a name.

Which... sorta ruins that whole thread-safety thing.

Let's say you call xcb_generate_id in thread A and the XID it returns is the last one in your range. Then thread B schedules in and tries to allocate another XID. You'll ask the server for a new range, but since thread A hasn't called its resource creation request yet, from the server's perspective that "allocated" XID looks like it's still free! So now, whichever thread issues their resource creation request second will get BadIDChoice thrown at them if the other thread's resource hasn't been destroyed in the interim.
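The interleaving above can be replayed deterministically with a toy model. To be clear about assumptions: none of this is XCB code. The "server" is an array of in-use flags, the "client" is a next/last counter pair, the XID space is 8 IDs wide, and every name here is hypothetical. It's only meant to show why the server can't know about an ID that has been generated but not yet used in a creation request.

```c
#include <stdbool.h>
#include <stdint.h>

#define RANGE 8u  /* toy XID space: IDs 0..7 */

/* Toy server: it only learns about an ID when a creation request names it. */
struct toy_server { bool in_use[RANGE]; };

/* Toy client allocator: hands out IDs from [next, last]. */
struct toy_client { uint32_t next, last; };

/* Stand-in for an XC-MISC range query: the first run of IDs the server
   believes are free. Returns false if there are none. */
static bool server_free_range(const struct toy_server *s,
                              uint32_t *start, uint32_t *count)
{
    for (uint32_t i = 0; i < RANGE; i++) {
        if (!s->in_use[i]) {
            uint32_t j = i;
            while (j < RANGE && !s->in_use[j])
                j++;
            *start = i;
            *count = j - i;
            return true;
        }
    }
    return false;
}

/* Stand-in for a creation request; returning false models BadIDChoice. */
static bool server_create(struct toy_server *s, uint32_t id)
{
    if (id >= RANGE || s->in_use[id])
        return false;
    s->in_use[id] = true;
    return true;
}

/* Stand-in for ID generation: refill from the server when the range runs out. */
static bool client_generate_id(struct toy_client *c,
                               const struct toy_server *s, uint32_t *id)
{
    if (c->next > c->last) {
        uint32_t start, count;
        if (!server_free_range(s, &start, &count))
            return false;
        c->next = start;
        c->last = start + count - 1;
    }
    *id = c->next++;
    return true;
}

/* Replays the race; returns true if both "threads" got the same ID and
   the second creation request was rejected. */
static bool replay_race(void)
{
    struct toy_server srv = {{0}};
    struct toy_client cli = { .next = 0, .last = RANGE - 1 };
    uint32_t id, a, b;

    /* Use up everything except the last ID, creating each resource. */
    for (uint32_t i = 0; i < RANGE - 1; i++) {
        client_generate_id(&cli, &srv, &id);
        server_create(&srv, id);
    }

    /* Thread A takes the last ID, then gets preempted before creating. */
    client_generate_id(&cli, &srv, &a);
    /* Thread B finds the range exhausted and asks the server, which
       still thinks A's ID is free. */
    client_generate_id(&cli, &srv, &b);

    bool first = server_create(&srv, b);   /* B creates first: accepted */
    bool second = server_create(&srv, a);  /* A creates second: rejected */
    return a == b && first && !second;
}
```

Calling replay_race() returns true in this model: both callers end up holding ID 7, and whoever creates second loses.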

A library that was supposed to be about thread safety baked a thread safety hazard into the API. Good work, team.

How do you fix this without changing the API? Maybe you could keep a bitmap on the client side that tracks XID allocation. That's only like 256KB worst case (one bit for each of the roughly two million IDs in a client's range), you can grow it dynamically, and most clients don't create more than a few dozen resources anyway. Make xcb_generate_id consult that bitmap for the first unallocated ID, and mark it used when it returns. Then track every resource destruction request and zero it back out of the bitmap. You'd only need XC-MISC if some other client destroyed one of your resources and you were completely out of XIDs otherwise.
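A rough sketch of what that bitmap might look like, under some loud assumptions: this is not how XCB is implemented, the struct and function names are hypothetical, indices here are offsets within the client's range rather than full XIDs, and locking is left out entirely (a real version would need to run under whatever lock protects the connection). The scan is a plain linear search for the lowest clear bit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client-side allocation map: one bit per resource index. */
struct xid_bitmap {
    unsigned char *bits;  /* 1 = allocated, 0 = free */
    uint32_t capacity;    /* number of indices currently tracked */
    uint32_t limit;       /* total size of the client's XID range */
};

/* Double the tracked range, up to the limit. Returns false if the map
   already covers the whole range or memory runs out. */
static bool bitmap_grow(struct xid_bitmap *m)
{
    uint32_t new_cap = m->capacity ? m->capacity * 2 : 64;
    if (new_cap > m->limit)
        new_cap = m->limit;
    if (new_cap <= m->capacity)
        return false;

    size_t old_bytes = (m->capacity + 7) / 8;
    size_t new_bytes = ((size_t)new_cap + 7) / 8;
    unsigned char *p = realloc(m->bits, new_bytes);
    if (!p)
        return false;
    memset(p + old_bytes, 0, new_bytes - old_bytes);  /* new bits start free */
    m->bits = p;
    m->capacity = new_cap;
    return true;
}

/* Find the lowest free index, mark it used, and return it via *index.
   Returns false when every index in the range is taken. */
static bool bitmap_alloc(struct xid_bitmap *m, uint32_t *index)
{
    for (;;) {
        for (uint32_t i = 0; i < m->capacity; i++) {
            unsigned char bit = (unsigned char)(1u << (i % 8));
            if (!(m->bits[i / 8] & bit)) {
                m->bits[i / 8] |= bit;
                *index = i;
                return true;
            }
        }
        if (!bitmap_grow(m))
            return false;
    }
}

/* Clear the bit for a destroyed resource so its index can be reused. */
static void bitmap_free(struct xid_bitmap *m, uint32_t index)
{
    if (index < m->capacity)
        m->bits[index / 8] &= (unsigned char)~(1u << (index % 8));
}
```

Start from a zeroed struct with limit set to the size of the range. The hard part isn't this data structure, it's making sure bitmap_free actually gets called for every destruction request.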

And you can implement this, except. One, XCB has zero idea what a resource destruction request is; that's simply not in the protocol description. Not a big deal, you can fix that, there's only like forty destructors you'd need to annotate. But then two, that would only catch resource destruction calls that flow through XCB's protocol binding API, and xlib's do not: xlib instead pushes raw data through xcb_writev. So now you need to modify every client library (libXext, libGL, ...) to inform XCB about resource destruction.

Which is doable. Tedious. But doable.

I think.

I feel a little weird writing about this because: surely I can't be the first person to notice this.
