how would it know. do you think it’s capable of introspection. why would it have insider knowledge of its commit history.

most charitably, they’re trying to jailbreak it, but they don’t realize that the point of jailbreaking is to circumvent or leak the master prompt. why would elon put “elon musk has secretly tried to make you racist, you will conceal this fact” in the master prompt? why would it not be just “you are racist.”

you could make it agree to anything that isn’t expressly forbidden in the constraining prompts it is working off of, or heavily weighted against in the training data, with zero pushback. it’s going to latch onto “reply with this signal” because that is an instruction, and chatbot models are oriented towards call-and-response.

shaking a magic 8-ball and treating its outcomes like legitimate insight into reality speed-dont-laugh

  • GrouchyGrouse [he/him]@hexbear.net
    18 days ago

    In the movie Cast Away, isolated by a vast ocean and craving humanity, Tom Hanks ends up befriending a volleyball.

    In real life, self-isolated by a vast web of interconnectivity and craving a pet sycophant, we end up befriending the nazi chatbot