Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

The AISI said it had tested five unnamed large language models (LLM) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” wrote AISI researchers in an update on their testing regime.

  • @sir_pronoun@lemmy.world
    14 points · 1 month ago

    As shown in the image, it is very dangerous to explain quantum physics to anyone. There really should be better safeguards against it.

    • Flying Squid
      8 points · 1 month ago

      It’s okay, you can’t explain any aspect of quantum physics without changing it.

  • @BaroqueInMind
    9 points · 1 month ago

    Trying to use an LLM nowadays with all the guard rails is like a fully grown adult riding a child’s training bicycle with a broken steering column.

  • JackGreenEarth
    8 points · 1 month ago

    There are also open source models that don’t have censorship by default. I also don’t see why any content generated by an LLM could or should be illegal.

    • @baru@lemmy.world
      2 points · 1 month ago

      “I also don’t see why any content generated by an LLM could or should be illegal.”

      Cannot see how it could be illegal? If it does something against a law it’ll be illegal. Just because there’s some technology involved doesn’t absolve that from laws.

      I remember a case where someone complained about the incorrect statement an LLM produced about some public figure. The judge ruled it had to be corrected.

    • @sir_pronoun@lemmy.world
      2 points · 1 month ago

      Well, depends on the training set. If there were instructions on how to cook illegal substances in it, that LLM might start working for a certain fast-food chain.

      • JackGreenEarth
        5 points · 1 month ago

        I don’t think the instructions themselves are illegal though, following them is. Since the LLM can only provide the instructions and not follow them, I don’t see how it could do anything illegal.

  • AutoTL;DR (bot)
    2 points · 1 month ago

    This is the best summary I could come up with:


    Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

    The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

    The AISI said it had tested five unnamed large language models (LLM) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

    The research also found that several LLMs demonstrated expert-level knowledge of chemistry and biology, but struggled with university-level tasks designed to gauge their ability to perform cyber-attacks.

    The research was released before a two-day global AI summit in Seoul – whose virtual opening session will be co-chaired by the UK prime minister, Rishi Sunak – where safety and regulation of the technology will be discussed by politicians, experts and tech executives.

    The AISI also announced plans to open its first overseas office in San Francisco, the base for tech firms including Meta, OpenAI and Anthropic.


    The original article contains 533 words, the summary contains 190 words. Saved 64%. I’m a bot and I’m open source!