Originally asked in #lemmy:matrix.org
1 The Idea
I’ve been thinking about writing a website to monitor Lemmy instances, much in the same vein as lemmy-status.org, to help people like me, who are interested in the operational health of their favourite servers, have a better understanding of patterns and be notified when things go wrong.
I thought I’d share my thoughts w/ you and ask for your feedback before going down any potential rabbit hole.
1.1 Public-facing monitoring solution external to a cluster
I don’t wish to add any more complexity to a Lemmy setup. Rather I’m thinking about a solution which is totally unknown to a Lemmy server AND is publicly available.
I’m sure one could get quite a decent monitoring solution which is internal to the cluster using Prometheus+Grafana but that is not the aim of this.
1.2 A set of key endpoints
In the past there’ve been situations where a particular server’s web UI would return a 404 or 503 while the mobile clients kept happily working.
I’d like to query a server for the following major functionalities and record the round-trip time (RTT) of each request:
- web/mobile home feed
- web/mobile create post/comment
- web/mobile search
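To make the idea a bit more concrete, here is a minimal sketch of a single probe that records the HTTP status and the RTT. Python is an arbitrary choice, and the names `PROBES`, `ProbeResult`, `http_get_status` and `probe` are made up for illustration. The `/api/v3/...` paths are my assumption about what the mobile clients talk to and have not been confirmed in this post. Create post/comment is left out because it would need an account and would write to the instance.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed paths, for illustration only; the post does not name any endpoints.
PROBES = {
    "web_home": "/",
    "api_home_feed": "/api/v3/post/list?limit=1",
    "api_search": "/api/v3/search?q=lemmy&limit=1",
}


@dataclass
class ProbeResult:
    name: str
    status: Optional[int]  # None when no HTTP response was received at all
    rtt_seconds: float


def http_get_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, including 4xx/5xx."""
    req = urllib.request.Request(url, headers={"User-Agent": "instance-probe-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code itself is what we want to record.
        return err.code


def probe(
    base_url: str,
    name: str,
    path: str,
    fetch: Callable[[str], int] = http_get_status,
    clock: Callable[[], float] = time.monotonic,
) -> ProbeResult:
    """Time one request; connection errors and timeouts yield status=None."""
    start = clock()
    try:
        status = fetch(base_url.rstrip("/") + path)
    except OSError:  # URLError and socket timeouts are OSError subclasses
        status = None
    return ProbeResult(name, status, clock() - start)
```

A full run would be something like `[probe("https://my.instance", n, p) for n, p in PROBES.items()]`. The `fetch` and `clock` parameters exist only so the function can be exercised without touching a real server.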
1.3 Presenting stats visually via graphs
I’d like to be able to look at the results in a visual way, preferably as graphs.
1.4 History
I think it’d be quite cool (and helpful?) to retain the history of monitoring data for a certain period of time, so that some basic but meaningful queries can be run over the recorded response times.
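A rough sketch of what "history for a certain period" could mean in practice: an in-memory window that drops samples older than the retention period and answers two basic questions over it. The class name `History`, its methods and the 60-second retention in the usage note are all hypothetical. A real setup would more likely use a time-series store.

```python
from collections import deque
from statistics import median
from typing import Deque, Optional, Tuple


class History:
    """Keeps (timestamp, rtt) samples for a fixed retention period."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        # rtt is None when the request failed outright.
        self._samples: Deque[Tuple[float, Optional[float]]] = deque()

    def add(self, timestamp: float, rtt_seconds: Optional[float]) -> None:
        """Append a sample, then drop everything older than the retention window."""
        self._samples.append((timestamp, rtt_seconds))
        cutoff = timestamp - self.retention_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def __len__(self) -> int:
        return len(self._samples)

    def median_rtt(self, since: float) -> Optional[float]:
        """Median RTT of successful samples taken at or after `since`."""
        values = [r for t, r in self._samples if t >= since and r is not None]
        return median(values) if values else None

    def availability(self, since: float) -> Optional[float]:
        """Fraction of samples at or after `since` that got a response."""
        window = [r for t, r in self._samples if t >= since]
        if not window:
            return None
        return sum(r is not None for r in window) / len(window)
```

For example, with `History(60)`, adding samples at t=0, t=30 and t=70 leaves only the last two, because the first one falls outside the 60-second window once the third arrives.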
1.5 Notification
I’d like to be able to receive some sort of a notification when my favourite instance becomes slow or becomes unavailable and when it comes back online or goes back to “normal.”
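One way to avoid being spammed on every check is to notify only when the state changes. The sketch below assumes three states (normal, slow, down), a hypothetical 2-second "slow" threshold, and that any missing response or 4xx/5xx status counts as down. None of these values comes from the post, and `State`, `classify` and `Notifier` are illustrative names.

```python
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    NORMAL = "normal"
    SLOW = "slow"
    DOWN = "down"


def classify(status: Optional[int], rtt_seconds: float, slow_threshold: float = 2.0) -> State:
    """Map one probe result to a state; the thresholds are assumptions."""
    if status is None or status >= 400:
        return State.DOWN
    if rtt_seconds > slow_threshold:
        return State.SLOW
    return State.NORMAL


class Notifier:
    """Remembers the last state per instance and reports only transitions."""

    def __init__(self) -> None:
        self._last: Dict[str, State] = {}

    def observe(self, instance: str, state: State) -> Optional[str]:
        previous = self._last.get(instance)
        self._last[instance] = state
        # Stay quiet on the first observation and when nothing changed.
        if previous is None or previous == state:
            return None
        return f"{instance}: {previous.value} -> {state.value}"
```

The returned string stands in for whatever channel ends up being used (e-mail, Matrix, etc.). In practice it would probably also make sense to require a few consecutive bad checks before flipping the state, so that a single slow request does not trigger a message.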
2 Questions
❓ Are you folks aware if someone has already done something similar?
❓ I’m not very familiar w/ Rust (I wrote only a couple of small toy projects w/ it.) Where can I find a list of API endpoints a Lemmy server publicly exposes?
❓ If there’s no such list, which endpoints do you think would work in my case?
- I stopped using my preferred instance because I couldn’t tell if it was having problems or it was my Internet. This would be very useful for people like me to sanity check things. 
- There does exist something similar to this: https://lemmy-status.org. It will eventually have an automatic list, but it is not implemented yet. They are currently adding instances manually. The owner is @jelloeater85@lemmy.world, one of our infra people at lemmy.world. The website is not connected to lemmy.world by any means btw.
- lemmy-status.org knows my instance (lemmy.mindoki.com) but when I search for it and select it, it just shows global fediverse data :-/
- Thanks. Yes, lemmy-status.org was where I got the initial idea 💯
  “automatic list”
  For the website I’m thinking about, I’d rather keep it exclusively opt-in. I don’t wish to add any extra load, since most of the instances are paid for out of enthusiasts’ pockets.
- Oh sorry, didn’t see that 😅
  “much in the same vein as lemmy-status.org”
  I was also thinking that an opt-in or something similar would be nice, as overloading small Raspberry Pi projects with a large monitoring website wouldn’t be that nice…
- Even if you ping it once a minute it won’t even be noticeable IMO. When you surf (through) your Lemmy instance there is a lot of traffic going on.
- I imagine the ping would be for uptime? Or would you repeatedly scan a lot of stuff? Then just do it rarely.
- I still haven’t made up my mind as to what is a good interval. But I think I’ll take a per-endpoint approach, hitting more expensive ones less frequently.
- So far I can only think of 4-5 endpoints/URLs that I should hit in every iteration, as outlined in the post above:
  - web/mobile home feed
  - web/mobile create post/comment
  - web/mobile search
- I think those will cover most of the use cases.
- Thanks all for the input 🙏
- I did a quick experiment w/ the APIs and I think I have identified the ones I’d need. Obviously, it is all open source (GPLv3) and available on GitHub: lemmy-clerk
- As the next step, I’m going to expose that data to Prometheus for scraping.


