Hi!
Kagi had a rough couple of months on the PR side, and a comment from another Lemmy user arguing that they aren’t using Google’s index set me off… because I had read just a couple of weeks ago on their own website that they primarily use Google’s search index.
Lo and behold, that user was “right”: No mention of Google whatsoever on Kagi’s Search Sources page. If that’s all you had to go off of, you’d be excused for thinking they are only using their internal index to power their web search since that’s what they now strongly imply. The only “reference” to external indexes is this nebulous sentence:
Our search results also include anonymized API calls to all major search result providers worldwide, specialized search engines like Marginalia, and sources of vertical information […]
… Unless one goes to check that pesky Wayback Machine. Here is the same page from March 2024, which I will copy/paste here for posterity:
Search Sources
You can think of Kagi as a “search client,” working like an email client that connects to various indexes and sources, including ours, to find relevant results and package them into a superior, secure, and privacy-respecting search experience, all happening automatically and in a split-second for you.
External
Our data includes anonymized API calls to traditional search indexes like Google, Yandex, Mojeek and Brave, specialized search engines like Marginalia, and sources of vertical information like Wolfram Alpha, Apple, Wikipedia, Open Meteo, Yelp, TripAdvisor and other APIs. Typically every search query on Kagi will call a number of different sources at the same time, all with the purpose of bringing the best possible search results to the user.
For example, when you search for images in Kagi, we use 7 different sources of information (including non-typical sources such as Flickr and Wikipedia Commons), trying to surface the very best image results for your query. The same is also the case for Kagi’s Video/News/Podcasts results.
Internal
But most importantly, we are known for our unique results, coming from our web index (internal name - Teclis) and news index (internal name - TinyGem). Kagi’s indexes provide unique results that help you discover non-commercial websites and “small web” discussions surrounding a particular topic. Kagi’s Teclis and TinyGem indexes are both available as an API.
We do not stop there and we are always trying new things to surface relevant, high-quality results. For example, we recently launched the Kagi Small Web initiative which platforms content from personal blogs and discussions around the web. Discovering high quality content written without the motive of financial gain, gives Kagi’s search results a unique flavor and makes it feel more humane to use.
Of course, running an index is crazy expensive. By their own admission, Teclis is narrowly focused on “non-commercial websites and ‘small web’ discussions”. Mojeek indexes nowhere near enough pages to meaningfully compete with Google, and Yandex specializes in the Russosphere. Bing (Google’s only meaningful direct indexing competitor) is not named, so I assume they don’t use it. So it’s not a leap to say that Google powers most of Kagi’s English-language web searches, just like Bing powers almost all search alternatives such as DDG.
I don’t personally mind that they use Google as an index (it makes the most sense and it’s still the highest-quality one out there IMO, and Kagi can’t compete with Google’s sheer capital on the indexing front). But I do mind a lot that they aren’t being transparent about it anymore. This is very shady and misleading, which is a shame because Kagi otherwise provides a valuable and higher quality service than Google’s free search does.
This will never stop being relevant about the Kagi CEO:
https://hackers.town/@lori/112255132348604770
Memory-holing Google tracks…
He offered to start a conversation about the blog post and give his perspective. The only thing I see here is the author refusing to stand on their post.
I didn’t read every little bit either, but that was my takeaway as well. I saw an emotionally invested CEO who could not bear seeing his baby dragged through the mud, and so he wanted to provide a counterpoint to what he saw as misinformation and accusations, but in a polite, professional manner. My first instinct would be that he was wasting his time with that, but seeing as his comments got posted and they make a more convincing, level-headed argument than the accusations, maybe it was worth it.
I agree with this perspective. The CEO felt like the more reasonable guy here who wanted to respectfully and professionally clear his name through polite conversation. And the writer here seemed very aggressive. The line of questioning was outwardly hostile and accusatory with literally nothing for good evidence.
The author probably wasn’t aware that their blog post had huge engagement on Hacker News just the day before and the CEO got roasted there, so the CEO probably felt the need to contact the author to “correct” their post.
The author was aware. They made a post about it getting posted to Hacker News, stating “I specifically requested for this not to happen.”
This still makes the CEO seem like an unhinged fucking freak who does not respect personal boundaries, it literally makes him look no better, no matter how he came across it.
… Contacting someone makes you an “unhinged fucking freak who does not respect personal boundaries”?
More people need to go touch grass, this is insane.
lol they asked that their public post wasn’t posted somewhere else on the internet?
Are they new here or something? The fuck?
Yes. How many times did she ask him to stop contacting her?
Yet he kept coming at her, all like, “Just debate me!”
No. Take a hint, dude!
If someone posts an angry rant about your company and you email them to say “you’re wrong and I’m sorry you feel that way” that makes you an “unhinged … freak?” This is not the president sending the secret service to your college dorm room lol.
No, he started being an unhinged freak when it was a private email exchange.
Look, if it was a random kid on tiktok that’s one thing, but slinging (potentially) slanderous information around (and publishing it, technically) is a serious matter with real-world consequences. If someone made a blog post about how you torture animals and have a horrible taste in music, you’d probably want to do something about it.
It wasn’t slanderous. It was her opinion about a couple items. The bad part was him hounding her after she repeatedly said to leave her alone.
I don’t see how this is relevant to this at all.
Eh, yeah ya do. It clearly speaks to the question of how honest and forthcoming the CEO, and by extension the company culture, is about which sources they use. The CEO has a history of the kind of interactions they had with the poster.
How many CEOs do you know who don’t act like they’re made of solid gold?
Oh interesting, I never knew about that side of Kagi. The fact the company is focused so hard on AI is a red flag. I don’t think I’ll renew my subscription when it comes up later this year, given how erratic their plans seem.
What service will you use instead?
-a Kagi subscriber
Genuine question, why is AI bad/concerning?
AI bad now?
Ironically going to use Kagi to summarize the blog post:
I did actually read the post (and I think this is actually my second time seeing this). I’m majorly unconvinced by the author … and yes, the criticized AI summarizer is that good. I regularly use it after reading something to share the details with friends (or get a rough idea of what’s being discussed and decide if I want to read something non-trivially long).
It also works on YouTube videos (presumably using the transcript) which can be a HUGE time saver.
Ultimately the search is good; it’s better than what Google offers me, and I’ve found their AI tools fairly useful (despite my distrust of GPT-style chat bots/BS-generating AI, I think summarization of some specified source is something they might actually do well – the major concern of piecing together random pieces of random sources of varying integrity is largely mitigated).
Whether the company will stay private / whether it lives on beyond Vlad is the biggest concern I have with using it. However, “what’s the other (practical) option to invest in?” I find myself in a similar position with Steam and Proton (at least the latter open sources much of their work). For now anyways, the weather is fair, so I’ll stay on board.