Semantic Web Company

  • Sep 12, 2012

State-of-the-art Text Mining: PoolParty Extractor 2.1.1 released

  • Text Mining, Tools & Software, Uncategorized

PoolParty Extractor (PPX) is part of the PoolParty product family and forms the basis for state-of-the-art text mining applications.

The idea behind PPX is to underpin automatic text mining algorithms with domain-specific knowledge from thesauri and linked data sources. This domain knowledge is the precondition for extracting meaning from unstructured information more precisely and with higher performance. PoolParty Extractor supports the following application scenarios:

  • automatic document categorisation
  • named entity extraction based on concepts from thesauri or other knowledge models
  • text analysis to improve semantic indexing
  • automatic transformation of unstructured text into an RDF-based linked data source
  • linking and enrichment of text with structured data from databases or XML documents
  • extended indexing by using inflected forms of words and by splitting compound words
  • generation and continuous improvement of thesauri by text corpus analysis
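
To make the thesaurus-driven scenarios above more concrete, here is a minimal sketch of concept extraction by label matching. It is an illustration only and not PPX's implementation: the `THESAURUS` dictionary, the `example.org` URIs and the `extract_concepts` function are all hypothetical, and a real thesaurus would normally come from a SKOS or RDF source.

```python
import re
from collections import defaultdict

# Hypothetical mini-thesaurus: concept URI -> labels (preferred label first).
# Illustrative data only, not taken from any PoolParty project.
THESAURUS = {
    "http://example.org/concept/text-mining": ["text mining", "text analytics"],
    "http://example.org/concept/linked-data": ["linked data", "LOD"],
}

def extract_concepts(text, thesaurus):
    """Return {concept_uri: [matched surface forms]} for labels found in text."""
    hits = defaultdict(list)
    for uri, labels in thesaurus.items():
        for label in labels:
            # Whole-word, case-insensitive match of the escaped label.
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits[uri].append(match.group(0))
    return dict(hits)

text = "Text mining benefits from Linked Data; LOD sources enrich text analytics."
result = extract_concepts(text, THESAURUS)
```

Because synonyms map to the same concept URI, "LOD" and "Linked Data" are indexed as one concept. That is the kind of gain in precision that domain knowledge is meant to provide over plain keyword matching.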

PoolParty Extractor can be integrated smoothly with third-party systems such as CMS, DMS, communication platforms and wikis. PPX is fully based on Java and provides an HTTP API. Integrations with SharePoint, Confluence, WordPress and others exist; please tell us about your use case!

The latest release 2.1.1 of PPX further extends the capabilities to extract meaning from text with high precision and high performance:

  • Use of tf-idf (term frequency-inverse document frequency)
    • Creation of a text corpus for tf-idf
    • Use of the tf-idf calculation during extraction
    • Corpus / thesaurus alignment
      • Show missing concepts
      • Show unused concepts
  • Use of regular expressions to match specific patterns in texts
  • Use of parts of the thesaurus as dynamic components for regular expressions
  • Calculation of inflected forms (at the moment for German)
    • Word forms are added to the extraction model and used during extraction
    • The list of inflected forms can be imported into the thesaurus
  • Splitting of compound words (at the moment for German)
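
Some of the features listed above can be sketched in a few lines. The following is a rough illustration under stated assumptions, not the algorithm PPX actually uses. The smoothed idf formula is one common variant. "Missing" is interpreted here as corpus terms without a matching thesaurus label, and "unused" as labels that never occur in the corpus. All function names, the `<LABELS>` placeholder and the sample data are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def document_frequencies(corpus):
    """Count in how many corpus documents each term occurs."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df

def tf_idf(text, df, n_docs):
    """Score each term of `text` as tf * idf, using a smoothed idf (assumed variant)."""
    tokens = tokenize(text)
    scores = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        idf = math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1
        scores[term] = tf * idf
    return scores

def align(corpus_terms, thesaurus_labels):
    """Corpus / thesaurus alignment on single-word labels (assumed interpretation)."""
    missing = corpus_terms - thesaurus_labels  # in the corpus, not in the thesaurus
    unused = thesaurus_labels - corpus_terms   # in the thesaurus, never in the corpus
    return missing, unused

def thesaurus_pattern(template, labels):
    """Fill the <LABELS> slot of a regex template with an alternation of labels."""
    ordered = sorted(labels, key=len, reverse=True)  # prefer longer labels first
    alternation = "|".join(re.escape(label) for label in ordered)
    return re.compile(template.replace("<LABELS>", alternation), re.IGNORECASE)

# Hypothetical sample data.
corpus = [
    "the thesaurus links concepts",
    "the extractor reads the text",
    "the thesaurus improves the extractor",
]
df = document_frequencies(corpus)
scores = tf_idf("the extractor uses the thesaurus", df, len(corpus))
missing, unused = align(set(df), {"thesaurus", "extractor", "ontology"})

# Unit labels taken from a (hypothetical) thesaurus branch drive the pattern.
unit_pattern = thesaurus_pattern(r"(\d+(?:\.\d+)?)\s*(<LABELS>)\b", {"mg", "ml", "kg"})
quantities = unit_pattern.findall("Take 250 mg twice, with 0.5 ml of water")
```

In this sketch a term that is rare in the corpus ("uses") scores higher than one found in several documents ("extractor"). The alignment step points to candidate concepts for the thesaurus as well as labels that the corpus never mentions. The regular expression is regenerated from the thesaurus, so adding a unit concept extends the pattern without editing the expression by hand.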

PPX can be tested online as a web service. Please send us a short note describing your interest and we will provide further details.


2025 © Semantic Web Company