This repository has been archived by the owner on Mar 16, 2021. It is now read-only.

🌿 [WIP] Fast and easy lemmatization for any inflecting language

lemontheme/trefwurd

Status: incomplete. Don't use yet.

You probably don't need a lemmatizer, but if you do, trefwurd's got you covered.

Trefwurd is...

  • fast (20k unique tokens/s)
  • lightweight (pure Python, zero dependencies)
  • light on memory
  • robust
  • overridable, with custom exception lists
  • easy to train

What's a lemmatizer?
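
A lemmatizer maps an inflected word form to its dictionary form (its lemma). The toy sketch below illustrates the general idea only: it is not trefwurd's implementation or API. The function name, the suffix rules, and the exception list are all hypothetical and hand-written for this example; it checks an exception list first and then falls back to simple suffix stripping.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written rules for illustration only; trefwurd's
# trained models and internals are not shown here.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("en", ""),  # honden -> hond
    ("s", ""),   # tafels -> tafel
]

# Irregular forms that the suffix rules would get wrong.
EXCEPTIONS: Dict[str, str] = {
    "kinderen": "kind",
}


def toy_lemmatize(token: str) -> str:
    """Return a lemma: exceptions first, then the first matching suffix rule."""
    word = token.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem longer than two characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) > 2:
            return word[: len(word) - len(suffix)] + replacement
    # No rule applied: treat the token as its own lemma.
    return word


print(toy_lemmatize("honden"))    # hond
print(toy_lemmatize("kinderen"))  # kind
print(toy_lemmatize("hond"))      # hond
```

Checking the exception list before any rule is what lets irregular forms override the general pattern, which is the role a custom exception list would play in a design like this.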

Installing

Trefwurd is compatible with Python 3.6 and up, because type annotations and f-strings are beautiful.

$ pip install trefwurd

Download a pretrained lemmatization model by passing the language's ISO code (for example, nl for Dutch):

$ python3 -m trefwurd download {iso-lang-code}

Simple example

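import trefwurd

# Load the Dutch model ("nl").
lemmatizer = trefwurd.load("nl")
# A single token with its part-of-speech tag.
lemmatizer.lemmatize("honden", "NOUN")
# A list of (token, POS tag) pairs.
lemmatizer.lemmatize([("honden", "NOUN"), ("eten", "VERB"), ("alles", "NOUN")])
# A list of bare tokens, without POS tags.
lemmatizer.lemmatize(["honden", "eten", "alles"])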

Documentation

TODO: make table.

Contributing

Tests

TODO: Um... Add tests.
