Bringing NLP to Go

The Go ecosystem used to have a very good library for Natural Language Processing. I'm bringing it back, with new features.

A couple of years ago, there was a good Go module for Natural Language Processing projects: github.com/jdkato/prose. It handled text processing, including tokenization, part-of-speech tagging, and named-entity extraction. It worked very well, and it was extremely fast. Unfortunately, the maintainer archived the repository in May 2023. I recently needed a library just like this for a project, so I decided to fork the repository, bring it up to date, and add some new functionality. You can see the new version here: github.com/tsawler/prose

The new version is 100% backwards compatible with the last tagged version of the original, and it also includes some new functionality, with more planned:

  • Multilingual Support: Automatic language detection and processing for English, Spanish, French, German, and Japanese
  • Position Tracking: All tokens, entities, and sentences include precise position information in the original text
  • Confidence Scores: ML predictions include confidence levels for reliability assessment
  • Enhanced NER: Named-entity recognition expanded from 2 to 16 entity types
  • Training API: Complete model training system with cross-validation and performance metrics
  • Memory Optimization: Token pooling and efficient data structures reduce memory usage by 20-30%
  • Context Support: All operations support context.Context for cancellation, timeouts, and progress tracking
  • Rich Metadata: Documents include processing statistics, language detection, and performance metrics

Additional planned features:

  • Sentiment analysis
  • TF-IDF scoring
  • Keyword extraction
  • Topic modeling

If you need NLP functionality in Go, you might want to give this a look.

Categories: Natural Language Processing (NLP)