bot@lemmy.smeargle.fans to Hacker News@lemmy.smeargle.fans · 2 years ago

RedPajama v2 Open Dataset with 30T Tokens for Training LLMs

together.ai

  • cross-posted to:
  • hackernews@derp.foo
  • localllama@sh.itjust.works
RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models — Together AI
Releasing a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting.

HN Discussion
