Integrating Semantic Knowledge to Tackle Zero-shot Text Classification

Published in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)

Jingqing Zhang and Piyawat Lertvittayakumjorn contributed equally to this project.

Paper link: arXiv:1903.12626 or NAACL19

Code and more: GitHub

Abstract

Insufficient or even unavailable training data of emerging classes is a big challenge of many classification tasks, including text classification. Recognising text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult and only limited previous works tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each and the combination of the two phases clearly outperform baseline and recent approaches in classifying real-world texts under the zero-shot scenario.
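
To make the zero-shot setting concrete, the following is a minimal sketch of how one of the four knowledge sources, word embeddings, can let a classifier assign a document to a class it has no training examples for. It is not the two-phase framework proposed in the paper. The 3-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, and the helper names (`mean_vector`, `cosine`, `zero_shot_predict`) are hypothetical; a real system would load pretrained embeddings instead.

```python
import math

# Toy 3-dimensional "word embeddings". The values are invented for this
# illustration and do not come from the paper or any pretrained model.
TOY_EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "goal": [0.9, 0.0, 0.1],
    "election": [0.1, 0.9, 0.0],
    "vote": [0.0, 0.9, 0.1],
    "minister": [0.1, 0.8, 0.2],
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
}


def mean_vector(words):
    """Average the embeddings of the known words; unknown words are skipped."""
    vectors = [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_predict(document, class_labels):
    """Pick the class whose label embedding is closest to the document.

    No labelled documents are needed for any class, so a class never seen
    during training can still be predicted, provided its label has an
    embedding.
    """
    doc_vec = mean_vector(document.lower().split())
    scores = {
        label: cosine(doc_vec, mean_vector([label])) for label in class_labels
    }
    return max(scores, key=scores.get)


prediction = zero_shot_predict("vote election minister", ["sports", "politics"])
```

Here `prediction` is the label whose embedding lies closest to the averaged document vector. Nearest-label matching of this kind is only a simple baseline for the zero-shot setting; the paper additionally draws on class descriptions, the class hierarchy, and a general knowledge graph.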

Citation

@inproceedings{zhangkumjornZeroShot,
    title = "Integrating Semantic Knowledge to Tackle Zero-shot Text Classification",
    author = "Zhang, Jingqing and
      Lertvittayakumjorn, Piyawat and
      Guo, Yike",
    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Long Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, USA",
    publisher = "Association for Computational Linguistics",
    pages = "1031--1040",
}