• V0ldek@awful.systems
    9 points · 1 year ago

    I was thinking about this after reading the P(Dumb) post.

    All normal ML applications have a notion of evaluation, e.g. the 2x2 table of {false,true}x{positive,negative}, or, for clustering algorithms, some metric of “goodness of fit”. If you have that, you can run an experiment with quantifiable results, and then you can do actual science.
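
    For the binary-classification case that means counting the four cells of that 2x2 table and deriving metrics like precision and recall from them; a minimal sketch in Python, with made-up toy labels and predictions standing in for real model output:

    ```python
    # Toy ground-truth labels and model predictions (hypothetical data,
    # just to illustrate the {false,true} x {positive,negative} bookkeeping).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # The four cells of the 2x2 table.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Quantifiable, comparable numbers you can hang a hypothesis on.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
    print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
    ```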

    I don’t even know what the equivalent for LLMs is. I don’t really have time to spare to dig through the papers, but like, how do they do this? What’s their experimental evaluation? I don’t see an easy way to classify LLM outputs into anything, really.

    The only way to do science is hypothesis->experiment->analysis. So how the fuck do the LLM people do this?

    • o7___o7@awful.systems
      8 points · 1 year ago

      Right? “AI” is great if you want to sort a few million images of galaxies into their various morphological classifications and have it done before the end of the decade. A++, good job, no notes.

      You can’t grift off of that very easily, though.

    • self@awful.systems
      7 points · 1 year ago

      I’d really like to know too, especially given how many times we’ve already seen LLMs misused in scientific settings. It’s starting to feel like the LLM people don’t have that notion, but that’s crazy, right?