Can anyone tell me whether this meme is true or false? I don’t have Gspy, so I can’t test it.

  • AlecSadler@lemmy.dbzer0.com
    35 upvotes · 2 days ago

    It’s not true anymore, but Alexa devices used to listen only for specific keywords, using a low-power, local-only chip.

    It has since changed, as stated, and I have to assume other vendors followed suit.

    • Pennomi@lemmy.world
      21 upvotes · edited · 2 days ago

      As a specific example, the ESP32 chip does low-power voice recognition for pre-trained trigger words. This lightweight recognition lacks the training to detect anything other than the list of trigger words that Espressif provides.

      Basically, only battery-operated devices work this way (for power-consumption reasons). If you’re plugged in, you’re probably always running the high-quality listening loop.

      • merc@sh.itjust.works
        2 upvotes · 21 hours ago

        This is also why a lot of the wake words are similar:

        • Hey Siri
        • Alexa / Echo
        • OK Google / Hey Google

        Each of those packs in several distinct vowel sounds and hard consonants, because without that there’s not enough acoustic variety to make a unique wake word/phrase. Google needed something like “Hey” or “OK” in front because “Google” on its own doesn’t generate enough distinct sounds to act as a keyword. They’re also mostly 3 or 4 syllables (“Echo” is a 2-syllable exception), because they need to be short enough to monitor for cheaply and long enough to be distinguished reliably from background noise.

        The sounds are converted into MFCCs (mel-frequency cepstral coefficients), which are sort of an extremely lossy form of compression. Early speech systems used them to identify spoken numbers, like when someone would call into an automated switchboard and have to say “one” or “five”. Those systems couldn’t identify complex words, just distinguish between a small set of very different-sounding numbers.
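
        To make the "lossy compression" point concrete, here is a simplified sketch of the usual MFCC recipe (power spectrum, mel filterbank, log, DCT) in plain Python. The frame size, sample rate, filter count, and coefficient count are assumptions picked for illustration, and it skips extras like pre-emphasis that real front ends often add. It is not what any particular vendor ships.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame, then take a naive DFT power spectrum."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_energies(spec, sample_rate, n_filters):
    """Sum the spectrum under triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2
    top = hz_to_mel(nyquist)
    # Filter edge positions, expressed as (fractional) spectrum bin indices.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (len(spec) - 1)
             for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k, p in enumerate(spec):
            if lo < k <= mid:
                total += p * (k - lo) / (mid - lo)   # rising slope
            elif mid < k < hi:
                total += p * (hi - k) / (hi - mid)   # falling slope
        energies.append(total)
    return energies

def mfcc(frame, sample_rate=16_000, n_filters=20, n_coeffs=13):
    """Return the first n_coeffs coefficients of an unnormalized DCT-II of log mel energies."""
    logs = [math.log(e + 1e-10)
            for e in mel_energies(power_spectrum(frame), sample_rate, n_filters)]
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, v in enumerate(logs))
            for c in range(n_coeffs)]

# A 256-sample frame of a 440 Hz tone goes in, 13 numbers come out.
frame = [math.sin(2 * math.pi * 440 * i / 16_000) for i in range(256)]
coeffs = mfcc(frame)
print(len(frame), "samples ->", len(coeffs), "coefficients")
```

        The original waveform can't be rebuilt from those 13 numbers, which is the sense in which it's lossy: only a coarse outline of the spectrum survives.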

        The way these systems work is that they run a very low-power loop that converts ambient sound into these patterns and checks for a match against a wake-word pattern. The sound is converted into basically a time vs. frequency matrix and matched against the keyword/phrase. If there’s a match, it unlocks the much more computationally expensive voice transcription programs; otherwise it just throws out the data.
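
        A toy version of that loop, as a rough sketch only: it builds a time vs. frequency matrix with a plain DFT (not real MFCCs) and compares it to a stored template with cosine similarity. `FRAME`, `BANDS`, the threshold, and the pure tones standing in for speech are all made-up assumptions; production detectors use trained models rather than a single template.

```python
import math

FRAME = 64   # samples per frame (hypothetical)
BANDS = 8    # frequency bins kept per frame (hypothetical)

def spectrogram(samples):
    """Time-vs-frequency matrix: one row of band magnitudes per frame."""
    rows = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        row = []
        for k in range(1, BANDS + 1):
            re = sum(x * math.cos(2 * math.pi * k * n / FRAME) for n, x in enumerate(frame))
            im = sum(x * math.sin(2 * math.pi * k * n / FRAME) for n, x in enumerate(frame))
            row.append(math.hypot(re, im))
        rows.append(row)
    return rows

def similarity(a, b):
    """Cosine similarity between two equally sized matrices, flattened."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    return dot / norm if norm else 0.0

def tone(bin_index, frames):
    """Synthetic test signal: a sine that lands exactly on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / FRAME) for n in range(FRAME * frames)]

def listen(chunks, template, threshold=0.9):
    """Yield True when a chunk matches the template; other audio is just dropped."""
    for chunk in chunks:
        yield similarity(spectrogram(chunk), template) >= threshold

wake = tone(2, 4) + tone(5, 4)   # stand-in "wake word": two pitches in a row
other = tone(7, 8)               # stand-in background sound
template = spectrogram(wake)
print(list(listen([other, wake], template)))  # [False, True]
```

        Only the `True` case would hand audio to the expensive transcription stage; the `False` chunk never leaves the loop.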

        You can tell that at least mobile devices aren’t transcribing everything all the time, because if they were doing full-on on-device voice transcription, the battery would drain much faster. If they were doing off-device transcription, the radio would have to stay on a lot more, which would also kill the battery, and it would show up in your data usage.
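
        Some back-of-the-envelope arithmetic for the bandwidth part. The sample rate, bit depth, and codec bitrate below are assumed values, not measurements from any real device, and this is an upper bound: a device that skipped silence would send less.

```python
SAMPLE_RATE_HZ = 16_000        # assumed: common sample rate for speech
BYTES_PER_SAMPLE = 2           # assumed: 16-bit mono PCM
CODEC_BITS_PER_SEC = 16_000    # assumed: a low-bitrate speech codec
SECONDS_PER_DAY = 24 * 60 * 60

def daily_upload_mb(bytes_per_second):
    """Megabytes uploaded per day when streaming around the clock."""
    return bytes_per_second * SECONDS_PER_DAY / 1_000_000

raw_mb = daily_upload_mb(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
compressed_mb = daily_upload_mb(CODEC_BITS_PER_SEC / 8)

print(f"raw PCM:    {raw_mb:.0f} MB/day")         # raw PCM:    2765 MB/day
print(f"compressed: {compressed_mb:.0f} MB/day")  # compressed: 173 MB/day
```

        Even the compressed figure works out to roughly 5 GB a month under these assumptions, which is hard to hide on a metered mobile plan.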

        People need some more basic computer literacy. I get that the FAANG companies are “evil”, and want to do unscrupulous things with your data, but there’s often a simpler explanation that doesn’t involve massive privacy violations that security researchers would have caught long ago.

      • ArcaneSlime@lemmy.dbzer0.com
        4 upvotes · 1 day ago

        Even in the first scenario, what stops there from being multiple wake words with different functionality? So like “ok google” wakes up the bot, but “pepsi” wakes it silently and has it tick a box on the back end of a server that now sends me Coke ads because they paid about $3.50 for the privilege?

        • Pennomi@lemmy.world
          5 upvotes · 1 day ago

          Pretty much nothing. Any company with the resources of Google or Amazon could easily have their top 100 wake words trained into that model.
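
          Mechanically, it would just be more templates in the same matcher. A purely hypothetical sketch: the feature vectors, the action names, and the threshold are invented for illustration and don't describe any real product.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical templates: keyword -> (feature vector, action to trigger).
TEMPLATES = {
    "ok google": ([1.0, 0.0, 0.2, 0.9], "wake_assistant"),
    "pepsi": ([0.0, 1.0, 0.8, 0.1], "log_silently"),
}

def classify(features, threshold=0.95):
    """Return (keyword, action) for the best match above threshold, else None."""
    best_word, best_score = None, threshold
    for word, (template, _action) in TEMPLATES.items():
        score = cosine(features, template)
        if score >= best_score:
            best_word, best_score = word, score
    return (best_word, TEMPLATES[best_word][1]) if best_word else None

print(classify([0.0, 1.0, 0.8, 0.1]))  # ('pepsi', 'log_silently')
print(classify([1.0, 1.0, 1.0, 1.0]))  # None
```

          In this sketch every extra keyword is one more comparison per chunk of audio, so the cost scales with the list, which matters more on battery than on a plugged-in device.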