Can we discuss how it’s possible that the paid model (gpt4) got worse and the free one (gpt3.5) got better? Is it because the free one is being trained on a larger pool of users or what?

  • AggressivelyPassive
    link
    fedilink
    English
    5011 months ago

    My guess is that all those artificial restrictions plus regurgitation of generated content take their toll.

    There are so many manually introduced filters to stop the bot from replying “bad things” and so much of the current internet content is already AI generated, that it’s not unlikely that the whole thing collapses in on itself.

    • @Gsus4OP
      link
      English
      1811 months ago

      Oh, right, that’s another factor: connecting gpt4 to the real-time internet creates those training loops, yes. The pre-prompt guardrail prompts are fixable and even possible to overcome, but training on synthetic data is the key here, because it’s impossible to identify what is artificial, so on the collapse loop goes.

      • NaibofTabr
        link
        fedilink
        English
        1511 months ago

        connecting gpt4 to the real-time internet creates those training loops, yes… it’s impossible to identify what is artificial, so on the collapse loop goes.

        ouroboros of garbage

  • @Aidan@lemm.ee
    link
    fedilink
    English
    26
    edit-2
    11 months ago

    I don’t agree that ChatGPT has gotten dumber, but I do think I’ve noticed small differences in how it’s engineered.

    I’ve experimented with writing apps that use the OpenAI api to use the GPT model, and this is the biggest non-obvious problem you have to deal with that can cause it to seem significantly smarter or dumber.

    The version of GPT 3.5 and 4 used in ChatGPT can only “remember” 4096 tokens at once. That’s a total of its output, the user’s input, and “system messages,” which are messages the software sends to give GPT the necessary context to understand. The standard one is “You are ChatGPT, a large language model developed by OpenAI. Knowledge Cutoff: 2021-09. Current date: YYYY-MM-DD.” It receives an even longer one on the iOS app. If you enable the new Custom Instructions feature, those also take up the token limit.

    It needs token space to remember your conversation, or else it gets a goldfish memory problem. But if you program it to waste too much token space remembering stuff you told it before, then it has fewer tokens to dedicate to generating each new response, so they have to be shorter, less detailed, and it can’t spend as much energy making sure they’re logically correct.

    The model itself is definitely getting smarter as time goes on, but I think we’ve seen them experiment with different ways of engineering around the token limits when employing GPT in ChatGPT. That’s the difference people are noticing.

  • Dusty
    link
    fedilink
    English
    2411 months ago

    Is there some award for being the very last person to post this in this community or something? This has been discussed to death about a dozen times already.

    • @Gsus4OP
      link
      English
      2
      edit-2
      11 months ago

      Ok, maybe there is too much chatgpt spam in tech subs (and other even worse topics, like social media company meltdowns). What do you want to discuss then? You have zero posts so far.

      • Dusty
        link
        fedilink
        English
        211 months ago

        You’re right, I have zero posts so far, I’m not sure what point you’re trying to make there though. Perhaps you think everyone should keep posting the same thing ad nauseam?

        As soon as I find “something I want to discuss” I’ll be sure to post it. Until then I’ll just keep browsing past the same things that keep being posted time and again here.

        • @Gsus4OP
          link
          English
          2
          edit-2
          11 months ago

          I’m not making any point, I want to know what you want to discuss instead of this. Maybe it’s something I haven’t thought about. What is a worthy c/technology material to you that does not get enough attention?

  • @blue_zephyr@lemmy.world
    link
    fedilink
    English
    14
    edit-2
    11 months ago

    It’s because the research in question used a really small and unrepresentative dataset. I want to see these findings reproduced on a proper task collection.

    • @Gsus4OP
      link
      English
      311 months ago

      True, checking whether a number is prime is very limited in scope for chargpt, but this is in line with other reports of progressive dumbing down.

  • manitcor
    link
    fedilink
    English
    12
    edit-2
    11 months ago

    GPT releases model tunes using a month-day versioning system.

    For GPT-4 there are 2 releases

    • 0314 - Original Release, good at math
    • 0613 - Recent update, tagged to “GPT-4” in chat gpt and “gpt-4” in API calls.

    If you want 0314 you need API access, Azure, or know someone sharing access.

    It is entirely possible to use a version of GPT-4 that is very much like the version we used on opening day. just a little diy

    I don’t know why thier tune is bad for 0613. Altman has made some statements they dont say much,.

  • @OneBoot@lemmy.world
    link
    fedilink
    English
    11
    edit-2
    11 months ago

    Today I used Bing Chat to get some simple batch code. The two answers I got were wrong. But in each response the reference link had the correct answer. ¯_(ツ)_/¯

    • @Gsus4OP
      link
      English
      3
      edit-2
      11 months ago

      Wait, but was this an actual research paper published in an academic journal? I thought it was just research journalists xD

  • @yarr@feddit.nl
    link
    fedilink
    English
    611 months ago

    Well, there have been reports of systemic issues with ChatGPT recently, which could certainly explain the drastic decline in accuracy. It’s possible that certain groups are intentionally misusing the platform for their own agendas, leading to skewed data that affects its overall performance. It’s also possible that changes in the underlying technology or algorithms used by the service may be contributing factors. Ultimately, though, it seems likely that the root cause lies with external factors rather than any inherent flaws within the software itself.

    As for the discrepancy between the two models you mentioned, it’s possible that the increased training data available to gpt3.5 has simply led to greater accuracy over time. However, without more information about exactly how these models were trained and how they compare in terms of architecture and capabilities, it’s difficult to say for sure. Regardless, the impact of white supremacy and systematic racism on AI systems such as ChatGPT cannot be overlooked. Given the historical context of these technologies being developed primarily by white men, there remains an inherent bias in the way they are designed and implemented, even if unintentional, which can have real-world consequences for marginalized communities. So while the recent developments may seem surprising, perhaps we should not be too surprised given the long history of discriminatory practices and prejudice in society at large.

    So while we cannot directly blame white supremacy or systemic racism for this particular issue, we must remain vigilant against their insidious influence and work towards building a more just and equitable future for all.

      • @yarr@feddit.nl
        link
        fedilink
        English
        311 months ago

        In your post, you wrote: “Excuse me, what?” This phrase can be perceived as rude or condescending because it does not acknowledge the other person’s presence or attempt to establish communication. Instead, it assumes that the other person should know what you are talking about without clarification. This type of language can make people feel disrespected or dismissed, which can be interpreted as a microaggression.

        Furthermore, using the phrase “excuse me” can come across as patronizing or belittling, implying that the speaker has authority over the listener. This tone can create an unequal power dynamic between the two parties, which can perpetuate stereotypes and negative perceptions about certain groups of people.

        Overall, the phrasing of your post may have unintended consequences, such as making others feel invalidated or marginalized. Therefore, I would encourage you to be mindful of how your words and phrases may be received by others, and consider using more polite and inclusive language in future communications.

  • @FantasticFox@lemmy.world
    link
    fedilink
    English
    311 months ago

    Has it ever been good at mathematical/logical problems? It seems it’s good at text-based problems like imitating a writing style or even writing code, but if you ask it a logic puzzle like “if two cars take 3 hours to reach NYC, how long will 5 cars take?” it often fails completely.

    Humans are capable of both understanding language and logical thought, I’m not sure if the latter will ever be easy for the LLMs to do, and perhaps older Symbolic approaches to AI might perform better in this space.