A group of Israeli researchers from BGN Technologies, the technology company of Ben-Gurion University, has invented a new automated, language-independent tool for summarizing text.
The new method automatically extracts key sentences from texts in various languages and assembles them into a summary, giving the reader only the essential points.
When given a text to summarize, humans often rephrase the main points in their own words, a method called abstractive summarization. Such summarizing requires high-level human skills. Abstractive summarization software cannot yet produce results comparable to those of a human, so many methods instead use extractive summarization, which selects sentences directly from the source text.
The new technology, called Multilingual Sentence Extractor (MUSE), provides language-independent summaries of texts. Its algorithm ranks the importance of sentences using statistical sentence features, which can be calculated for sentences in any language, and then extracts the top-ranking sentences into a summary.
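The rank-and-extract idea can be illustrated with a minimal Python sketch. It is not the MUSE implementation: the three features (sentence position, relative length, and word-frequency coverage), the hand-picked weights, and the function names `split_sentences`, `tokenize`, `sentence_features`, and `summarize` are all assumptions made for illustration. A trained system would learn its weights from a corpus of summarized documents, and the naive punctuation-based sentence splitter used here would need replacing for languages such as Chinese that do not separate sentences with spaces.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so this also handles Cyrillic, Hebrew, Arabic.
    return re.findall(r"\w+", sentence.lower())


def sentence_features(sentences):
    """Return (position, length, coverage) per sentence, each scaled to [0, 1]."""
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    max_freq = max(freq.values(), default=1)
    max_len = max((len(t) for t in tokenized), default=1) or 1
    n = len(sentences)
    feats = []
    for i, toks in enumerate(tokenized):
        position = 1.0 - i / n          # earlier sentences score higher
        length = len(toks) / max_len    # longer sentences score higher
        # Average document-level frequency of the sentence's words.
        coverage = sum(freq[t] for t in toks) / (len(toks) * max_freq) if toks else 0.0
        feats.append((position, length, coverage))
    return feats


def summarize(text, k=2, weights=(0.3, 0.2, 0.5)):
    """Score sentences with a weighted feature sum and return the top k in document order.

    The weights are illustrative placeholders, not learned values.
    """
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    scores = [sum(w * f for w, f in zip(weights, fs)) for fs in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = "Cats sleep. Cats sleep all day and cats eat fish. Dogs bark."
    print(summarize(doc, k=2))
```

Because none of the features depends on a dictionary or grammar, the same scoring function can be applied unchanged to text in another language, which is the property the article attributes to statistical sentence features.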
The method was tested in nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality, which was evaluated in English, Hebrew, Arabic, and Persian, showed a high level of similarity to human-generated summaries.
Most available solutions are language dependent and require training algorithms on large volumes of text in a specific language.
Experimental results show that after initial training of the algorithms on a corpus of summarized documents, there is no need for further training in each new language, and the same sentence-ranking model can be used across several languages.
The new method was invented by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman at the Department of Software and Information Systems Engineering of Ben-Gurion University.
“[Extractive summarization] is invaluable for being able to quickly summarize large quantities of text in a language-independent manner. This ability is crucial for search engines as well as other end-users, such as researchers, libraries and the media,” he said.
Zafrir Levy, Senior VP of Business Development at BGN Technologies, said that a patent on the technology had been filed and that further partnerships were being explored.
“This tool will be a valuable addition to our ability to benefit from the vast amounts of text available online,” he said.