I had been having trouble getting meaningful results from the fediverse on Google, and after seeing this post, it seems I’m not the only one. So, I created a site that helps search the fediverse in your search engine of choice (it currently supports Google, Bing, Yahoo, DuckDuckGo, and Dogpile).

Due to query limitations with most search engines, it currently only searches the top 15 Lemmy/kbin instances, but I’ve tested it and it seems to cover a good chunk of fediverse content. Google is the exception: it should be far more reliable overall and can also search Mastodon and PeerTube.

If you have contributions or ideas for improvement, feel free to check out the project here or shoot me a message. Hope this helps people! :)

https://fedi-search.com/

Edit: Update in progress including improved search queries and support for Mastodon/PeerTube (Google only, unfortunately)

Edit 2: Update is live, along with a dedicated domain name. If the website doesn’t look any different for you, try Ctrl+F5 or clearing site data - it seems some browsers are caching the old page.

  • tal@kbin.social
    18 upvotes · edited · 1 year ago

    In all seriousness, Google needs to get on providing an easier way to specify that a search should hit the Fediverse. site:reddit.com works for Reddit, but there is presently no analogous operator on Google’s search for a distributed system that spans many domains.

    I mean, it’s great that you’ve made this, don’t get me wrong, but they really should do that as well.

    • 0x1C3B00DA@kbin.social
      3 upvotes · 1 year ago

      > but there is presently no analogous operator on Google’s search for a distributed system that spans many domains.

      Because that’s just a basic search. A search engine searches across multiple domains by default. If you’re specifically looking for only results from ActivityPub-enabled services, that’s pretty much an impossibility, since there’s no way to know (from a web crawl) if a page is served by a server that supports ActivityPub. Another problem is that a lot of fediverse instances purposefully block search engine crawlers because they don’t want to appear in search results.
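For readers unfamiliar with how an instance opts out of crawling, here is a minimal sketch of the usual mechanism, a `robots.txt` file, checked with Python's standard `urllib.robotparser`. The robots.txt body, the `ExampleBot` user agent, and the `example.social` URL are all made up for illustration; no real instance's configuration is being quoted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as an instance that blocks all crawlers might serve it.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that allows everything (an empty Disallow means "no restriction").
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL only; not a real post.
URL = "https://example.social/m/some-post"
blocked = may_crawl(BLOCK_ALL, "ExampleBot", URL)
allowed = may_crawl(ALLOW_ALL, "ExampleBot", URL)
```

A well-behaved crawler would skip the page in the first case, which is why content from such instances never shows up in search results.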

    • Polymath@lemmy.dbzer0.com
      1 upvote · 1 year ago

      I like the idea of scrapping Google altogether, and just having “better” search engines here that account for federated decouplings/distributions

      Not entirely the same, but I switched over to Presearch a year or so ago, just to get away from Google and the “big tech” corporations

  • TGRush@forum.fail
    9 upvotes · 1 year ago

    Hm, I find it somewhat annoying that right now, this is not really searching the Fediverse, but rather what we’ve come to call “the Threadiverse”, which is all about Reddit-like content aggregators.

    In other words, I’d love an option to search other kinds of content: for example, the most popular Mastodon, Misskey, or Pleroma instances instead of Threadiverse content.

    • Kichae@kbin.social
      6 upvotes · 1 year ago

      Searching Mastodon is a bit of a… contentious issue. A lot of smaller Mastodon-based sites are full of traumatized, vulnerable people who really just want to do their own thing, and they’ll rattle cages if they find out someone’s indexing their sites or posts. If anyone’s making third-party search tools, it’s best to be careful to respect discoverability and indexing flags.

      • stochasticity@lemmy.world
        1 upvote · 1 year ago

        I find this to be incredibly fair, but it also makes it much harder to dive into the fediverse. Where is the middle ground, do you think?

        • Kichae@kbin.social
          3 upvotes · 1 year ago

          Mastodon has flags for opting in to discoverability features (being featured in the profile directory, and having posts be searchable via Mastodon’s search bar) and for search engine indexing (for Google, Bing, etc.).

          Just don’t return posts from users that have opted out of those, and things should be mostly ok.

          • 0x1C3B00DA@kbin.social
            1 upvote · 1 year ago

            > Just don’t return posts from users that have opted out of those, and things should be mostly ok.

            This is the main problem I see. User settings are part of the Mastodon API. If you’re building a general-purpose search engine, you use a crawler to index pages and your crawler has no idea those flags even exist.
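For a search tool that does talk to the API rather than crawl pages, the opt-out filtering discussed in this subthread could look roughly like the sketch below. It assumes each status arrives as a dict with an embedded `account` object carrying `discoverable` and `noindex` booleans, which is my understanding of Mastodon's account data; treat the field names and their exact semantics as assumptions and check them against the API documentation. The sample statuses are invented.

```python
from typing import Any


def is_searchable(account: dict[str, Any]) -> bool:
    """Conservative check: a missing or null flag counts as 'opted out'."""
    # Assumed field: `noindex` is True when the user opted out of search engine indexing.
    if account.get("noindex") is True:
        return False
    # Assumed field: `discoverable` is True only when the user opted in to discovery.
    return account.get("discoverable") is True


def filter_statuses(statuses: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only statuses whose author explicitly allows discovery and indexing."""
    return [s for s in statuses if is_searchable(s.get("account", {}))]


# Invented sample data for illustration.
SAMPLE = [
    {"id": "1", "account": {"discoverable": True, "noindex": False}},
    {"id": "2", "account": {"discoverable": True, "noindex": True}},
    {"id": "3", "account": {"discoverable": None}},
    {"id": "4", "account": {"discoverable": False, "noindex": False}},
]

visible_ids = [s["id"] for s in filter_statuses(SAMPLE)]
```

Treating unknown values as opt-outs errs on the side of the users described earlier in the thread, at the cost of returning fewer results.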

  • babelspace@kbin.social
    5 upvotes · edited · 1 year ago

    Awesome. Though I notice very little shows up from kbin.social; content I know is there is missing when I search for it. That may have more to do with the recency of the site growth or the Cloudflare protection that was up a few days ago.

    • TenorTheHusky@kbin.social (OP)
      5 upvotes · 1 year ago

      I would guess that it is the Cloudflare protection, since that will have prevented crawlers from indexing the site while it was enabled.

    • TenorTheHusky@kbin.social (OP)
      3 upvotes · edited · 1 year ago

      Will do o7

      Edit: It seems Brave doesn’t support chaining site specifiers, so my current method won’t work with their search

  • LetThereBeDwight@lemmy.world
    3 upvotes · 1 year ago

    Seems like you could probably use this strategy and get rid of the limits by turning this into an extension that would tack the site list onto the search directly (though I’m unsure if there are such limits directly via the search box on Google or whoever).

    I’d also, just from a code quality perspective, bust the list out into its own property (which could later become smarter), and build the query string out at runtime.
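To make the refactoring suggestion concrete, here is a rough sketch of keeping the instance list as data and building the query at runtime. This is not the site's actual code: the instance list, the `ENGINES` table, the function names, and the parenthesised `site:` / `OR` query shape are all assumptions for illustration, and the `max_sites` cap simply mirrors the 15-instance figure from the post.

```python
from urllib.parse import urlencode

# Hypothetical instance list; the real site's list is not shown in this thread.
INSTANCES = ["lemmy.world", "kbin.social", "lemmy.ml"]

# Assumed search endpoints, each taking the query in a "q" parameter.
ENGINES = {
    "google": "https://www.google.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def build_query(terms: str, instances: list[str]) -> str:
    """Append a chained site: filter to the user's search terms."""
    site_filter = " OR ".join(f"site:{host}" for host in instances)
    return f"{terms} ({site_filter})"


def build_search_url(engine: str, terms: str,
                     instances: list[str] = INSTANCES,
                     max_sites: int = 15) -> str:
    """Build the full search URL, capping the number of site: filters."""
    query = build_query(terms, instances[:max_sites])
    return f"{ENGINES[engine]}?{urlencode({'q': query})}"


example_query = build_query("fediverse search", ["lemmy.world", "kbin.social"])
example_url = build_search_url("google", "fediverse search",
                               ["lemmy.world", "kbin.social"])
```

With the list separated out, it could later be loaded from a file or an instance directory without touching the query-building code.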