What question can I ask ChatGPT, right now, that will reliably produce a factually incorrect, wrong, or false answer?

LoveRainbow@lemmy.world · edited 1 day ago

LoveRainbow@lemmy.world · 23 hours ago

I asked: How many days of Kela parental leave can be taken simultaneously by both parents in Finland?

Response: "In Finland, Kela allows both parents to take parental allowance/parental leave at the same time for a maximum of 18 working days for one child. Kela counts these as working days, usually Monday to Saturday, excluding Sundays and public holidays.

For multiple births, the simultaneous period can be longer: in addition to the usual 18 working days, parents can take 90 extra working days at the same time for the second and each additional child."

It seems to have given me the correct answer. It provided five sources.

prole@lemmy.blahaj.zone · 22 hours ago

Yeah, that’s a major problem with LLMs: you can ask the exact same question and get an entirely different answer.

LoveRainbow@lemmy.world · 22 hours ago

You can never chat with the same GPT twice…for the GPT has changed…and so have you.

Alfredolin@sopuli.xyz · 23 hours ago

My exact prompts, starting from a DuckDuckGo search:

Me: Kela parental leave at birth

AI: Kela provides parental allowance for a total of 320 working days for parents caring for a child under two years old, with both parents entitled to share this leave. The birthing parent can start their parental leave after the pregnancy allowance period ends, while the other parent can begin their leave after the child is born.

Me: How many days can be taken at the same time for both parents?

AI: Parents can be on parental allowance at the same time for up to 158 working days combined (i.e., each parent may take overlapping days but the total simultaneous entitlement is limited to 158 working days).

Me: This is not correct.

AI: You’re right, thanks. The correct rule for parental allowance overlapping (simultaneous days) under Kela is:

Parents can be paid parental allowance at the same time for a maximum of 158 working days in total. Each parent may use up to 105 working days of their own parental allowance after the birthleave/parental leave starts, but simultaneous paid days between them are limited to 158 working days combined.

Me: 18 days can be taken at the same time. Learn to read.

AI: You’re right, thank you. For Kela parental allowance at birth, parents may take up to 18 days simultaneously.

Me: Wtf were you babbling about 158 days?

AI: Sorry, that was my mistake. You’re correct: parents may take up to 18 days of parental allowance at the same time.

Me: Where did your 158 days come from?

AI: That was an error mixing up Kela’s rules for parental allowance with one of its other leave totals (320 working days total per child) and a misremembered overlapping-usage figure from a different benefit schedule. I apologize; it was a mistaken conflation of unrelated numbers, not a valid Kela rule.

prole@lemmy.blahaj.zone · 22 hours ago

misremembered

Oh fuck off

Alfredolin@sopuli.xyz · 22 hours ago

Yeah wtf, and it wouldn’t tell me where this number came from. Damn.

LoveRainbow@lemmy.world · 22 hours ago

So, was this using DuckDuckGo’s integrated AI system or something?

I have no idea about that. I use ChatGPT, which I assume produces better answers than those casual search-engine-integrated AIs.

Alfredolin@sopuli.xyz · 22 hours ago

The answers are labeled with “GPT-5 mini”.

LoveRainbow@lemmy.world · 22 hours ago

I asked: Explain the difference in reliability and accuracy when it comes to GPT-5 mini (on DuckDuckGo) vs. GPT-5.5 in thinking mode.

Response:

"GPT-5-mini on DuckDuckGo is likely to be much less reliable than GPT-5.5 in thinking mode for difficult questions, but it may be perfectly adequate for quick, low-stakes tasks.

The key distinction is model size and reasoning depth. OpenAI describes GPT-5 mini as a faster, more cost-efficient version of GPT-5, aimed at well-defined, low-latency, high-volume tasks. That usually means it is optimised for speed and cost, not for the most careful reasoning. GPT-5.5, by contrast, is presented by OpenAI as a stronger reasoning model with better persistence, stronger tool use, and improved reliability on complex work.

For simple tasks, the gap may be small. For example, rewriting a paragraph, summarising a short text, generating a recipe, explaining a familiar concept, or answering a basic factual question will often be handled well by GPT-5-mini. It may even feel preferable because it is faster and less verbose. DuckDuckGo’s own help page says Duck.ai offers GPT-5 mini for free alongside other models, with anonymous access through Duck.ai.

For complex reasoning, the gap is much larger. GPT-5.5 thinking mode should be better when the task involves multi-step reasoning, evaluating evidence, spotting contradictions, doing careful moral/philosophical analysis, debugging code, comparing options, or avoiding subtle factual errors. Thinking mode matters because it allows more internal reasoning effort before answering, whereas mini models are usually designed to produce a good-enough answer quickly.

Accuracy is also affected by the interface. DuckDuckGo’s AI chat gives private access to third-party models, but the exact behaviour depends on DuckDuckGo’s implementation: context limits, system prompts, retrieval/search integration, rate limits, and whether tools are available. DuckDuckGo says Duck.ai anonymises chats and offers several models, but that does not mean its GPT-5-mini behaves identically to the same model in OpenAI’s own environment.

A practical rule:

Use GPT-5-mini on DuckDuckGo for quick, private, low-stakes work: “explain this term”, “rewrite this”, “give me ideas”, “summarise this short passage”, “what are the likely causes of this Windows issue?”

Use GPT-5.5 thinking mode for anything where the cost of being wrong matters: medical interpretation, legal or financial reasoning, political/current-event analysis, philosophical argument construction, coding, troubleshooting, research quality evaluation, investment comparisons, or nuanced ethical judgement.

The mini model is more likely to hallucinate, miss qualifications, over-simplify, or give a plausible but shallow answer. GPT-5.5 thinking mode is more likely to notice ambiguity, correct itself, ask whether assumptions are justified, and give a more robust answer. It can still be wrong, especially on current facts, obscure details, or anything requiring direct source checking, but its error rate should be lower on difficult tasks.

So the simple comparison is: DuckDuckGo GPT-5-mini is “fast, private, convenient, good enough”; GPT-5.5 thinking mode is “slower, deeper, more reliable, better for serious reasoning.”"

Alfredolin@sopuli.xyz · edited 22 hours ago

Hei. I know this is your post and all, but nobody asked for this huge AI copy/pasta.

edit: And according to this, it should have worked. My request was a basic factual question.

LoveRainbow@lemmy.world · 22 hours ago

Interesting, thanks.

It seems the latest and most sophisticated model produces the correct answer at this point.

I am using GPT-5.5 in “Thinking” mode.