NNLP-IL Blog - a national initiative for the creation of infrastructure, research and development of advanced capabilities for the advancement of the field of NLP in Hebrew and Arabic.

Dataset Explorer

It is important for our community to be able to easily explore the variety of datasets. There is nothing more frustrating than downloading a dataset, only to discover after a few hours it isn’t clean enough or that the topics are different from what you expected. Using our newly developed dataset explorer, you can easily review all of the data available on the Hebrew NLP Resources index and select the one that best suits your needs.

The Explorer offers three views. Datasets Overview compares all the corpora by size. Comparative Analysis provides comparative information about any two corpora. Focused Look at a Dataset lets you zoom in on a selected corpus.

Intro to YAP: The Building Block for Hebrew NLP

Credits and Full Disclosure

YAP, short for Yet Another (natural language) Parser, is an automatic natural language processing tool tailored for Modern Hebrew texts. YAP can automatically annotate Hebrew texts with several kinds of information: lemmas, part-of-speech tags (verb, noun, etc.), morphological features (gender, number, etc.), and syntactic relations (subject, object, modifier). YAP was developed at the ONLP research lab, led by Prof. Reut Tsarfaty. The original development was done by Amir More, and later development was undertaken by Amit Seker. An overview of YAP, its best practices, and the main ways it is used can be found in this document. Texts and images on this page are adopted and adapted from the full documentation provided by the ONLP lab website and its associated GitHub page here. A YAP online demo by the ONLP lab can be found here. While YAP is open-source, any use of YAP in an academic publication or otherwise should cite this article.
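To make the four annotation layers listed above concrete, here is a minimal Python sketch of how one annotated token could be represented and read. It assumes the annotations arrive as tab-separated, CoNLL-style rows (index, form, lemma, coarse tag, tag, features, head, relation, plus two unused columns). That column layout, the `Token` class, the helper names, and the two sample rows are assumptions made for illustration; they are hand-written and are not taken from YAP's documentation or from actual YAP output.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int       # position of the token in the sentence (1-based)
    form: str        # surface form as it appears in the text
    lemma: str       # dictionary form
    pos: str         # part-of-speech tag
    features: dict   # morphological features, e.g. gender and number
    head: int        # index of the syntactic head (0 = root)
    relation: str    # syntactic relation to the head


def parse_features(field):
    """Turn 'gen=M|num=S' into {'gen': 'M', 'num': 'S'}; '_' means none."""
    if field in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|") if "=" in pair)


def parse_row(line):
    """Read one tab-separated row (assumed CoNLL-style layout) into a Token."""
    cols = line.rstrip("\n").split("\t")
    return Token(
        index=int(cols[0]),
        form=cols[1],
        lemma=cols[2],
        pos=cols[3],
        features=parse_features(cols[5]),
        head=int(cols[6]),
        relation=cols[7],
    )


# Hand-written illustrative rows (hypothetical, not real YAP output).
SAMPLE = "\n".join([
    "1\tילד\tילד\tNN\tNN\tgen=M|num=S\t2\tsubj\t_\t_",
    "2\tרץ\tרץ\tVB\tVB\tgen=M|num=S\t0\tROOT\t_\t_",
])

tokens = [parse_row(line) for line in SAMPLE.splitlines()]
for t in tokens:
    print(t.index, t.form, t.lemma, t.pos, t.features, t.head, t.relation)
```

Keeping the features as a dictionary rather than a raw string makes it easy to filter tokens, for example by gender or number, once the rows are loaded.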
