Otter@lemmy.ca to Opensource@programming.dev · English · 2 days ago

Mozilla Common Voice 22nd dataset is now available

commonvoice.mozilla.org

cross-posted to: opensource@lemmy.ml

From their newsletter:

We’re so excited to share that the 22nd dataset release for Common Voice is now available for download.

Common Voice 22.0 has an additional 281 hours of speech data, bringing the total number of hours to 33,815. This release has also seen a jump in 296 newly validated hours, with a total of 22,640 validated hours of clips. This release welcomes the addition of Aromanian (rup), Tajik (tg), and Venda/Tshivenda (ve) languages.

Aromanian is spoken by around 210,000 people in the Balkans, while Tajik is a language closely related to Persian spoken in Tajikistan and Uzbekistan by over 10 million people. Venda / Tshivenda is spoken by over 2 million people as a first or other language in South Africa and Zimbabwe.

This brings the total number of languages available in this Scripted Speech release to 137.
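As a rough sanity check on the figures quoted above, here is a minimal Python sketch that works out the validated share and the totals implied for the previous release. The variable names are only illustrative, and the "previous release" values are simple subtractions from the newsletter's numbers, not figures published by Mozilla.

```python
# Figures as quoted in the newsletter text above.
total_hours = 33_815          # all recorded hours in Common Voice 22.0
validated_hours = 22_640      # validated hours in Common Voice 22.0
new_hours = 281               # hours added in this release
new_validated_hours = 296     # hours newly validated in this release

# Share of the corpus that has been validated.
validated_share = validated_hours / total_hours

# Totals implied for the previous release (assumption: plain subtraction).
previous_total = total_hours - new_hours
previous_validated = validated_hours - new_validated_hours

print(f"Validated share: {validated_share:.1%}")
print(f"Implied previous totals: {previous_total:,} recorded, "
      f"{previous_validated:,} validated hours")
```

Note that more hours were newly validated (296) than newly recorded (281), which suggests reviewers also worked through part of the backlog of older clips.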

For those unfamiliar:

Common Voice is a crowdsourcing project started by Mozilla to create a free and open speech corpus. The project is supported by volunteers who record sample sentences with a microphone and review recordings of other users. The transcribed sentences are collected in a voice database available under the public domain license CC0.[1] This license ensures that developers can use the database for voice-to-text and text-to-voice applications without restrictions or costs.

Kissaki@programming.dev · 2 days ago

44% Male/Masculine

39% No information

18% Female/Feminine

Tech bias even on public domain open contribution datasets. Apparently could use more female contributors.
- FundMECFS@quokk.au · 1 day ago
  Yeah, that’s pretty bad. I wonder what other kinds of biases there are as well: class, dialect, etc.
