[CFP] The 4th Workshop on Arabic Corpus Linguistics (WACL-4)

WACL-4 at COLING 2025, with a focus on Arabic dialects

The workshop will be held online on 20 January 2025, in conjunction with the 31st edition of COLING in Abu Dhabi (UAE).

The field of Arabic language research using corpora and corpus methods has experienced significant growth and development in recent years. What once were isolated efforts have now transformed into a vibrant and expansive area of study, advancing rapidly across multiple dimensions in both corpus and computational linguistics. Building upon the success of previous editions—WACL-1 in 2011, WACL-2 in 2013 in conjunction with the Corpus Linguistics Conference at Lancaster University, and WACL-3 in 2019 at the Corpus Linguistics 2019 conference at Cardiff University—we are excited to announce the fourth edition of the Workshop on Arabic Corpus Linguistics (WACL-4).

The primary objectives of WACL-4 are to highlight the latest developments in the creation, annotation, and application of Arabic corpora, including the introduction of new corpora and advancements in annotation techniques, while fostering collaboration among researchers from diverse institutions and regions to stimulate joint research projects and interdisciplinary initiatives. This edition will place a special emphasis on the study of Arabic dialects, including non-standard and regional varieties, to broaden the understanding of Arabic in its various manifestations and support research on under-resourced linguistic varieties. Additionally, WACL-4 aims to encourage the development and refinement of Natural Language Processing (NLP) systems and tools tailored for Arabic, integrating corpora into NLP workflows, creating new computational tools, and evaluating existing systems to improve their efficacy in processing Arabic text.

There are 22 Arabic-speaking countries in the Arab League: Algeria, Bahrain, Comoros, Djibouti, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Palestine, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tunisia, UAE, and Yemen. Each of these countries has its own specific Arabic dialects. Arabic is also spoken in countries outside the Arab League, contributing to a global total of over 400 million Arabic speakers. This number highlights the urgent need for dedicated research efforts to study and document these diverse dialects thoroughly.

Arabic is a rich and diverse language, characterised by a wide range of dialects. Despite the significant efforts made in developing tools and corpora for Modern Standard Arabic (MSA), many Arabic dialects remain under-studied, primarily due to limited resources such as research funding and available datasets. This lack of comprehensive study leaves significant gaps in our understanding and documentation of these dialects. WACL-4 aims to address this issue by providing a platform for scholars to share resources, methodologies, and findings, thereby advancing the study of Arabic in its various forms.

The fourth edition of the Workshop on Arabic Corpus Linguistics (WACL-4) is motivated by this pressing need to bridge the research gap, and its focus on Arabic dialects reflects that aim. With the increasing importance of language models in computational linguistics, WACL-4 will also highlight their role in analysing and understanding Arabic dialects. These models are central to processing and generating natural language, offering new insights and tools for researchers. The workshop aims to foster collaboration and innovation, ultimately contributing to a more comprehensive understanding of the Arabic language and its dialectal richness.

Topics of interest:

  • Development and Utilisation of Arabic Dialectal Corpora
  • Advancements in Natural Language Processing Techniques for Arabic Dialects
  • Applications and Challenges of Large Language Models in Understanding and Generating Arabic Dialects
  • Morphological and Syntactic Challenges in Arabic Dialects
  • Dialect Identification and Classification
  • Speech Recognition and Synthesis for Arabic Dialects
  • Machine Translation involving Arabic Dialects
  • Sentiment Analysis and Opinion Mining in Arabic Dialects
  • Named Entity Recognition and Information Extraction for Arabic Dialects
  • Development of Open Access Resources for Arabic Dialects
  • Text Processing and Transliteration Challenges for Arabic Dialects
  • Cultural and Sociolinguistic Considerations in NLP Applications for Arabic Dialects
  • Resources and Tools for Computational Analysis of Arabic Dialects
  • Applications of Arabic Dialect NLP in Real-World Scenarios

Important dates:

  • 1st Call for Papers Announcement: 25 July 2024
  • 2nd Call for Papers Announcement: 31 August 2024
  • Paper Submission Deadline: 10 October 2024
  • Notification of Paper Acceptance: 1 November 2024
  • Camera-ready Paper Deadline: 15 November 2024
  • Workshop Date: 20 January 2025

For more details, please visit: https://wp.lancs.ac.uk/wacl4/