DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter
Hamad Bin Khalifa University

Description

DAICT is a new Arabic irony-detection corpus extracted from Twitter. The dataset includes 5,588 tweets -- written in both MSA and dialectal Arabic -- manually annotated by two professional linguists from HBKU. Tweets were collected using four irony-related hashtags.

This new dataset fills a gap, since very few Arabic corpora are currently annotated for irony.

This corpus is a valuable resource for work in the fields of irony detection, Arabic dialects, Arabic social media, and sentiment analysis.

This project was supported by the generous grant NPRP 09-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). 

Team

  • Wajdi Zaghouani
  • Ines Abbes 
  • Omaima El-Hardlo
  • Faten Ashour 

Publications

  • Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, and Faten Ashour. 2020. DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC 2020), Marseille, France. [PDF] [BIB]

Download

By downloading the corpus from HERE, you agree to the terms and conditions.