DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter

Description

DAICT is a new Arabic irony-detection corpus extracted from Twitter. The dataset includes 5,588 tweets, written in both Modern Standard Arabic (MSA) and dialectal Arabic, manually annotated by two professional linguists from HBKU. Tweets were collected using four irony-related hashtags.

This new dataset fills a gap, since very few Arabic corpora are currently annotated for irony.

This corpus is a valuable resource for work in the fields of irony detection, Arabic dialects, Arabic social media, and sentiment analysis.

This project was supported by the generous grant NPRP 09-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation).

Team

Wajdi Zaghouani
Ines Abbes
Omaima El-Hardlo
Faten Ashour

Publications

Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, and Faten Ashour. 2020. DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC 2020), Marseille, France. [PDF] [BIB]

Download

By downloading the corpus from HERE, you agree to the terms and conditions.