Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user-generated content exclusive to the internet’s already dominant search engine.
If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week. DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that “We would like to show you a description here but the site won’t allow us.” Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.
The news shows how Google’s near monopoly on search is now actively hindering other companies’ ability to compete at a time when Google is facing increasing criticism over the quality of its search results. And while neither Reddit nor Google responded to a request for comment, it appears that the exclusion of other search engines is the result of a multi-million dollar deal that gives Google the right to scrape Reddit for data to train its AI products.
“They’re [Reddit] killing everything for search but Google,” Colin Hayhurst, CEO of the search engine Mojeek told me on a call. Hayhurst tried contacting Reddit via email when Mojeek noticed it was blocked from crawling the site in early June, but said he has not heard back.
“It’s never happened to us before,” he said. “Because this happens to us, we get blocked, usually because of ignorance or stupidity or whatever, and when we contact the site you certainly can get that resolved, but we’ve never had no reply from anybody before.”
As Jason wrote yesterday, there’s been a huge increase in the number of websites that are trying to block bots that AI companies use to scrape them for training data by updating their robots.txt file. Robots.txt is a text file which instructs bots whether they are or are not allowed to access a website. Googlebot, for example, is the crawler or “spider” that Google uses to index the web for search results. Websites with a robots.txt file can make an exception to give Googlebot access, and not other bots, so they can appear in search results that can generate a lot of traffic. Recently Google also introduced Google-Extended, a bot which crawls the web specifically to improve its Gemini apps, so websites can allow Googlebot to crawl but block the crawler Google uses to power its generative AI products.
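The Googlebot/Google-Extended split described above can be sketched as a hypothetical robots.txt. This is an illustrative example only, not Reddit’s or any real site’s actual file:

```text
# Allow Google's search crawler to index the site
User-agent: Googlebot
Allow: /

# Block the crawler Google uses for its generative AI products
User-agent: Google-Extended
Disallow: /

# Block every other bot by default
User-agent: *
Disallow: /
```

A bot looks for the most specific User-agent group matching its own name: Googlebot matches its own group and is allowed in, while every other crawler falls through to the wildcard rule and is told to stay out.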
Robots.txt files are just instructions, which crawlers can and have ignored, but according to Hayhurst Reddit is also actively blocking its crawler.
Reddit has been upset about AI companies scraping the site to train large language models, and has taken public and aggressive steps to stop them from continuing to do so. Last year, Reddit broke a lot of third-party apps beloved by the Reddit community when it started charging to access its API, making many of those third-party apps too expensive to operate. Earlier this year, Reddit announced that it had signed a $60 million deal with Google, licensing Reddit content to train Google’s AI products.
Reddit’s robots.txt used to include a bunch of jokes, like forbidding the robot Bender from Futurama from scraping it (User-Agent: bender, Disallow: /my_shiny_metal_ass) and specific pages that search engines are and are not allowed to access. “/r*.rss/” was allowed, while “/login” was not allowed.
Today, Reddit’s robots.txt is much simpler and more strict. In addition to a few links to Reddit’s new “public content policies,” the file simply includes the following instruction:
User-agent: *
Disallow: /
Which basically means: no user-agent (bot) should scrape any part of the site. “Reddit believes in an open internet, but not the misuse of public content,” the updated robots.txt file says.
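A minimal sketch of how a well-behaved crawler evaluates those two directives, using Python’s standard-library robots.txt parser (the bot names here are just examples):

```python
from urllib import robotparser

# The two directives quoted above from Reddit's current robots.txt
rules = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A blanket "Disallow: /" refuses every bot that honors the file
print(rp.can_fetch("MojeekBot", "https://www.reddit.com/r/AskReddit/"))  # False
print(rp.can_fetch("bingbot", "https://www.reddit.com/"))                # False
```

As the article notes, the file is purely advisory: the parser only tells a crawler what the site asks for, and nothing in the protocol stops a bot from ignoring the answer — which is why Reddit is reportedly also blocking crawlers directly.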
“Unfortunately, we’ve seen an uptick in obviously commercial entities who scrape Reddit and argue that they are not bound by our terms or policies,” Reddit said in June. “Worse, they hide behind robots.txt and say that they can use Reddit content for any use case they want. While we will continue to do what we can to find and proactively block these bad actors, we need to do more to protect Redditors’ contributions. In the next few weeks, we’ll be updating our robots.txt instructions to be as clear as possible: if you are using an automated agent to access Reddit, you need to abide by our terms and policies, and you need to talk to us.”
Reddit appears to have updated its robots.txt file around June 25, after Mojeek’s Hayhurst noticed its crawler was getting blocked. That announcement said that “good faith actors – like researchers and organizations such as the Internet Archive – will continue to have access to Reddit content for non-commercial use,” and that “We are selective about who we work with and trust with large-scale access to Reddit content.” It also links to a guide on accessing Reddit data which plainly states Reddit considers “Search or website ads” as a “commercial purpose” and that no one can use Reddit data without permission or paying a fee.
Google did not respond to a request for comment, but its announcement of the company’s deal with Reddit points out not only how valuable Reddit is for training AI, but what many of us already know: As Google Search gets increasingly worse in turning up relevant search results, one of the best ways to still get them is to add “Reddit” to your search queries, directing Google to a site where real humans have been writing advice and recommendations for almost two decades. There are a lot of ways to illustrate how useful Reddit can be, but I’m not going to do better than this video:
https://www.youtube.com/watch?v=tcJcw55zIcc
The fact that Google is the only search engine that leads users to that information now, and that it is apparently the result of a $60 million deal around AI training data, is another example of the unintended consequences of the indiscriminate scraping of the entire internet in order to power generative AI tools.
“We’ve always crawled respectfully and we’ve done it for 20 years. We’re verified on Cloudflare, we don’t train AI, we’re like genuine, traditional genuine searching, we don’t do ‘answer engine’ stuff,” Hayhurst said. “Answer engine” is Perplexity’s name for its AI-powered search engine. “The whole point about Mojeek, our proposition is that we don’t do any tracking. But people also use us because we provide a completely different set of results.”
Reddit’s deal with Google, Hayhurst said, makes it harder to offer these alternative ways of searching the web.
“It’s part of a wider trend, isn’t it?” he said. “It concerns us greatly. The web has been gradually killed and eroded. I don’t want to make too much of a generalization, but this didn’t help the small guys.”
Crazy that google paid for rights after reddit’s gone to shit. 3 years ago it would have been great, but these days I can’t remember the last time I clicked on a reddit link for an answer.
It’s because the Google algo is broken, and a good percentage of Internet users search “thing you want to know reddit” for everything now.
Would be neat if lemmy could form similar niche communities where people talk about their hobbies and scientific interests.
pog poop balls lemmy
hit search
It’s a little unnerving how many Hexbear posts/users show up using Yandex.
Yandex is legitimately great, especially for image searching; it simply breaks Pinterest and the like so you can actually find high-res copies of what you’re looking for
And it’s only like that because of the CIA. Around 2016 google started prioritizing MSM
I’m starting to wonder where people even get information anymore. Are libraries making a comeback?
Discord, and I’m not happy about it.
Wtf how do you search discord?
You search for the topic you’re interested in, find out there’s a discord about it, then open their discord, then check all their pins and try discord’s search function.
As I said, I’m not happy about it.
Public discord servers should obviously not be a thing. It can be nice to have a place for people who are just getting started to casually be able to talk with more experienced people in some domain, I’ve definitely found it useful, but that’s something that can easily be achieved with a web forum. So public Discord servers just come with all the downsides of a forum: no expectation of privacy, power tripping moderators, bad search function, etc; then with none of the upsides: searchable by an external search engine, decentralized, familiar UI, configurability. And that’s without getting into all the issues with Discord as a company.
I’m starting to wonder where people even get information anymore.
Facebook, Twitter, and other cesspools.
i guess we can all start using our individual imaginations again
Facebook, Twitter, and other cesspools.
i guess we can all start using our individual imaginations again
You’re both correct!
The search engine was the imaginations we made along the way
Youtube is a primary source now. A lot of people take information from some [insert cracker specialist on random topic here], especially about social/economic topics, but really about everything.
And to be fair, there are also some actually decent and informative channels too, which makes it even worse, because it’s not like everything on the internet/YT is false or wrong; it’s just you picking idiot grifters as your source.
we are really in need of a viable FLOSS search engine that can do its own indexing instead of repackaging google results like searx does. Maybe the spidering could be distributed somehow so the small self-hosters could benefit from it while also being able to apply their own standards, priorities, sorting etc.
unfortunately search is expensive in a way that FLOSS does not solve, it requires a lot of hosting infrastructure and boring volunteer labor to fine-tune results to combat spam (and spam might even benefit from looking at the FLOSS rules that filter it)
(and spam might even benefit from looking at the FLOSS rules that filter it)
idk i feel like that might be a problem which is created or at least greatly exaggerated by monopolies. if there was a diversity of search engines it would be much more difficult to do shitty SEO on all of them at the same time. You’d need a whole team combing through repo hosting sites and mailing lists to figure it out.
Crawling could be distributed and shared, but indexing is a bigger problem.
All the things you’d want to be different on some sort of federated search platform (standards, priorities, and sorting, as you say) are things that require different indexing. But the index is the big expensive part that would most need to be shared.
https://search.marginalia.nu/ is open source, extremely good, and insanely resource efficient.
Thanks for this recommendation, I hadn’t heard of this. Looks promising.
Oh that looks cool. Do you run an install of it or are you using the install on their main page? Are there other instances?
The hw requirements aren’t prohibitive. I mean it’s not nothing, maybe a few hundred upfront and then the connection. Well within reach especially with support of an existing organization who’d be willing to physically house it. I guess SSDs would be the largest part of the cost.
an x86-64 machine, have at least 16GB of RAM, and at least 4 cores. It is designed to run on physical hardware, and will likely be very expensive to run in the cloud.
Crawling requires a decent network connection, ideally at least 1 Gbps. 100 Mbps will work, but will be slower.
Storage requirements are highly dependent on the size of the index, and the number of documents being indexed. For 100,000 documents, you can probably get away with 2 TB of SSD storage, and 4 TB of mechanical storage for the crawl data.
I don’t know how far 100k documents gets you. It doesn’t sound like much if you are going for the whole internet but if you are curating a more narrow subset it could be enough.
This page has their philosophy and towards the bottom of the page, links to similar projects.
I just use their search page, not trying to run it on my own. But yeah it’s actually like possible for you to run, as opposed to something like Google.
death to google
death to reddit
death to america
a century of humiliation upon the first world
Couldn’t a crawler just add a bypass for reddit’s robots.txt file?
Yeah, can’t they just ignore it?
If you’re only making a few requests, yes, but it’s very easy to detect something like an indexing crawler.
Or just do what bing did when it started and copy Google’s results directly
Or maybe they could make a browser extension like Bring Back YouTube Dislikes that will send the relevant metadata of the page back for indexing.
Anyone notice that Google Reddit searches got way worse like in the past few months? Used to be able to get away with “search query” but now it often gives whatever it considers synonyms even when in quotes.
Often the first few results are just completely unrelated. Like recently it considered the proper name of a city a synonym of the word district for me lol
@yogthos@lemmygrad.ml this looks like something you would post.
Google is now the only search engine that can surface results from Reddit…
I’m terrible at proofreading so if I can nearly immediately find a mistake - the website isn’t even trying.
What’s the mistake? Is it surface/service? Because I’ve seen surface used increasingly to mean ‘reveal/show up/present’ (i.e. 'bring to the surface)
What is the mistake?
Lol terrible
All my homies hate reddit.
Ah putain…
Searx is an alternative meta-search engine that can include google results. If you don’t want to host an instance you can find one here: https://searx.space/
I like https://searx.work/ personally.
A thing I found annoying about searx is that it’s instances sometimes stop working, so I have to switch to other ones.
Internet slowly bundling together.