WebSocket (Python server, JavaScript/RxJS client) missing messages

Introduction

I've tried to provide as much information up front as possible; if I've missed something, please let me know. Downvoters/closers: please explain why so I can fix the question.

Motivation: the server does the work (asyncio with multiprocessing); the client just requests and presents data. WebSockets (hereafter WS) are used primarily to minimize the latency of requests and replies (the connection handshake is done up front, and the TLS connection is established only once). WS runs over TCP, which has so far given sufficiently transparent transactions, so there's no need for UDP or anything else for greater speed.

Setup

  • WS server: Python/Daphne/Django Channels (updated)
  • WS client: JavaScript/Angular/RxJS (updated)
  • Both running on localhost (no physical throughput contention)

Applications

  • Client: Angular, primarily requesting and formatting data for presentation
  • Server 1: Django Channels async JSON consumer (asyncio framework... may be important later).
  • Server 2: Python using the websockets asyncio package from PyPI (currently the legacy implementation, which I'll address later; confirmed not the issue).

The server 1 instance connects to server 2 over a TLS WS connection.

Problem

I implemented the servers a few years ago. Long story short: I was seeing latency in places I didn't expect when using await with the Django ORM. The short answer? I switched the DB calls from database_sync_to_async to wrapping them in asyncio.to_thread. Concurrency has improved by easily a factor of 10, because I make many DB calls at the start and end of each transaction to send updates to the JavaScript interface. Moving these calls from asyncio/await to threads resolved the latency.
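For anyone unfamiliar with the change described above, here is a minimal sketch of the pattern, assuming Python 3.9+ (where asyncio.to_thread exists). The blocking call is a hypothetical fetch_rows() that just sleeps; it stands in for a Django ORM query and is not real Django code. One plausible reason for the large speed-up is that Channels' database_sync_to_async is built on asgiref's SyncToAsync with thread_sensitive=True by default, which funnels calls through a single shared thread, whereas asyncio.to_thread uses the event loop's default thread pool so several calls can overlap. A caveat worth checking: database_sync_to_async also closes stale DB connections around each call, which asyncio.to_thread does not do.

```python
import asyncio
import time


def fetch_rows(query_id: int) -> dict:
    """Hypothetical stand-in for a blocking Django ORM call."""
    time.sleep(0.2)  # simulate a slow, blocking database round trip
    return {"query_id": query_id, "rows": []}


async def run_queries(count: int) -> list:
    # Each blocking call runs in the loop's default thread pool, so the
    # event loop stays free and the calls overlap in time.
    tasks = [asyncio.to_thread(fetch_rows, i) for i in range(count)]
    # gather() returns results in the same order as the tasks were given.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_queries(5))
    elapsed = time.perf_counter() - start
    print(len(results), f"{elapsed:.2f}s")  # roughly 0.2s rather than 1.0s
```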

Until this point I wasn't seeing any data anomalies, and it never occurred to me that there could be any. I've now verified that the throughput of my entire server architecture is solid: I tracked all messages through the system and can see that a string of requests gets processed, producing datasets a few megabytes in size, which are sent back over the last WS (RxJS) connection to the JS interface. I can see that all messages have been "sent" by the Daphne/Django Channels async consumer, that this happens promptly, and that all data is present.

However, I've noticed that about 85% of the time the JS UI is missing between 1 and 12 replies from the end of the sequence transmitted by server 1, and they never show up. This is out of about 42 relevant requests initiated by the JS UI and submitted over WS to server 1. The Python servers all queue the data appropriately, and, as I mentioned, I have tracked each message as far as I can; they disappear after being sent by server 1.
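One way to pin down exactly which replies vanish, rather than just counting them, is to stamp every outgoing message on server 1 with an increasing sequence number and have the receiver report the gaps. The sketch below assumes a hypothetical seq field added to each JSON payload; it is not part of Channels or RxJS, and the same check would be easy to port to the client side.

```python
import itertools
import json


class SequencedEncoder:
    """Attach an increasing `seq` field to each outgoing JSON message (hypothetical scheme)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def encode(self, payload: dict) -> str:
        # Copy the payload and add the next sequence number.
        message = dict(payload, seq=next(self._counter))
        return json.dumps(message)


def find_missing(received_seqs: list, last_sent: int) -> list:
    """Return the sequence numbers in 1..last_sent that never arrived."""
    seen = set(received_seqs)
    return [seq for seq in range(1, last_sent + 1) if seq not in seen]


if __name__ == "__main__":
    encoder = SequencedEncoder()
    sent = [encoder.encode({"reply": n}) for n in range(6)]
    # Simulate the client losing the last two replies.
    received = [json.loads(raw)["seq"] for raw in sent[:4]]
    print(find_missing(received, last_sent=6))  # [5, 6]
```

Because the losses described here happen at the end of the sequence, the server would also need to tell the client the final seq (for example in a closing summary message); otherwise trailing losses are indistinguishable from replies that were never sent.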

Solutions

None so far. Reverting the code to the "slow" async/await database calls obviously slows things down, and seems to resolve the problem. It seems clear to me that it has something to do with queuing and buffering in the WS connection between server 1 and the JS UI. A series of 10-15 small (kB) and large (MB) messages sent within about 1.5 seconds seems to cause the most grief. However, it's not consistent: different messages go missing, and occasionally all of them are received. I retest just by refreshing the page (new browser session, new WS connection, new worker processes on the server, etc.), and the behaviour differs from run to run.

I'm trying to dig closer to server 1's WS/TCP layer to validate that the packets are actually handled at the TCP level, but I haven't finished that validation yet. I also need to find out how to inspect the TCP sessions of the JS UI in the browser.

Other attempts

I thought my RxJS WS handler in the browser might be taking too long to handle messages, so I tried adding a "buffer": an RxJS Subject that lets my WS handler hand the data over for parsing/formatting just by nexting it into the observable, with a subscription elsewhere in the WS service. That didn't seem to help. My next attempt will be a send queue with a delay (using asyncio.sleep) in the server 1 application, to see whether artificially slowing it down improves things.
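The server-side experiment mentioned above could look roughly like the following: a single consumer task drains an asyncio.Queue and awaits each send in order, optionally sleeping between sends. The send argument is a hypothetical coroutine standing in for whatever actually writes to the socket (for example the consumer's send_json); paced_sender, demo, and the delay value are illustrative names and numbers, not taken from the original code.

```python
import asyncio
from typing import Awaitable, Callable

Sender = Callable[[dict], Awaitable[None]]


async def paced_sender(queue: asyncio.Queue, send: Sender, delay: float = 0.05) -> None:
    """Drain `queue`, sending one message at a time with an optional pause."""
    while True:
        message = await queue.get()
        try:
            if message is None:  # sentinel: stop the sender
                return
            await send(message)  # wait for each send to finish before the next
            if delay:
                await asyncio.sleep(delay)  # artificial pacing for the experiment
        finally:
            queue.task_done()  # runs even when returning on the sentinel


async def demo() -> list:
    delivered = []

    async def fake_send(message: dict) -> None:
        # Hypothetical stand-in for the consumer's real send call.
        delivered.append(message)

    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(paced_sender(queue, fake_send, delay=0.01))
    for n in range(5):
        queue.put_nowait({"reply": n})
    queue.put_nowait(None)  # tell the sender to stop after the last reply
    await worker
    return delivered


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Since only one task ever writes to the socket, sends cannot interleave and their order is preserved. If the symptom disappears only with a non-zero delay, that would suggest a buffering or back-pressure problem rather than data lost inside the Python layers.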

Ideas

Does anyone know how to dig further into the RxJS WebSocket internals so I can find out whether buffers are filling up and messages are being dropped? TCP should guarantee delivery, so I suspect the issue is on the browser side. My application has to run in the browser, so that part isn't going to change.
