TL;DR: Once trained, AI will be able to generate new data comparable in quality to its training data, thereby rendering that training data worthless. The time to sell data at a reasonable price is now, and those locking their data behind huge financial barriers (such as Twitter and Reddit) are stupidly HODLing a rapidly depreciating asset.

  • RA2lover@burggit.moe

    You don’t need to use all of the output for training if you can separate out the good parts. “OpenAI” reportedly used paid (and is now using free) RLHF for this; Anthropic is trying to develop RLAIF to achieve the same.
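
    A minimal sketch of what “separate the good parts” could look like in practice: generate many candidate outputs, score each with a judge (human raters in RLHF, another model in RLAIF), and keep only the high-scoring ones as new training data. The `toy_judge` below is a hypothetical stand-in, not a real reward model or any actual OpenAI/Anthropic pipeline.

    ```python
    from typing import Callable, List, Tuple

    def filter_candidates(
        candidates: List[str],
        judge: Callable[[str], float],
        threshold: float = 0.5,
    ) -> List[Tuple[str, float]]:
        """Keep only candidates whose judge score clears the threshold."""
        scored = [(text, judge(text)) for text in candidates]
        return [(text, score) for text, score in scored if score >= threshold]

    def toy_judge(text: str) -> float:
        # Placeholder heuristic: a real pipeline would query a reward model
        # (RLHF) or an LLM rater (RLAIF) instead of counting words.
        return min(len(text.split()) / 20.0, 1.0)

    if __name__ == "__main__":
        generated = [
            "Short, low-effort answer.",
            "A longer, more detailed answer that actually explains the "
            "reasoning step by step and would be worth keeping as data.",
        ]
        for text, score in filter_candidates(generated, toy_judge):
            print(f"{score:.2f}  {text[:60]}...")
    ```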