I was an Apple AIML Resident, where my work examined how LLMs perform on morphologically rich languages.
Prior to that, I spent one year as an Applied Researcher at AMEX GBT.
In my free time, I try to stay away from my computer. You can find me playing squash, recording drum covers of songs I like, or discovering something new in Paris.
Large Language Models (LLMs) have shown significant progress on various multilingual benchmarks and are increasingly used to generate and evaluate text in non-English languages. However, while they may produce fluent outputs, it remains unclear to what extent these models truly grasp the underlying linguistic complexity of those languages, particularly in morphology. To investigate this, we introduce IMPACT, a synthetically generated evaluation framework focused on inflectional morphology, designed to evaluate LLM performance across five morphologically rich languages: Arabic, Russian, Finnish, Turkish, and Hebrew. IMPACT includes unit-test-style cases covering both shared and language-specific phenomena, from basic verb inflections (e.g., tense, number, gender) to unique features like Arabic's reverse gender agreement and vowel harmony in Finnish and Turkish. We assess eight multilingual LLMs that, despite strong English performance, struggle with other languages and uncommon morphological patterns, especially when judging ungrammatical examples. We also show that Chain of Thought and Thinking Models can degrade performance. Our work exposes gaps in LLMs' handling of linguistic complexity, pointing to clear room for improvement. To support further research, we publicly release the IMPACT framework.
@misc{saeed2025impactinflectionalmorphologyprobes,title={IMPACT: Inflectional Morphology Probes Across Complex Typologies},author={Saeed, Mohammed and Vehvilainen, Tommi and Fedoseev, Evgeny and Caliskan, Sevil and Vodolazova, Tatiana},year={2025},eprint={2506.23929},archiveprefix={arXiv},primaryclass={cs.CL},url={https://arxiv.org/abs/2506.23929},}
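To give a flavor of the unit-test framing, here is a minimal sketch of how a single grammaticality probe could be represented and scored. The `MorphologyProbe` fields, the prompt wording, and the `answer_fn` interface are illustrative assumptions of this sketch, not the released IMPACT format.

```python
# Illustrative sketch only: the fields, prompt wording, and scoring below are
# assumptions for exposition, not the actual IMPACT schema or prompts.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class MorphologyProbe:
    language: str         # e.g., "Turkish"
    phenomenon: str       # e.g., "vowel harmony"
    sentence: str         # candidate inflected sentence to judge
    is_grammatical: bool  # gold label for this unit test

def to_prompt(probe: MorphologyProbe) -> str:
    """One probe = one unit test: a yes/no grammaticality judgment."""
    return (f"Is the following {probe.language} sentence grammatically correct? "
            f"Answer Yes or No.\n\n{probe.sentence}")

def accuracy(probes: Iterable[MorphologyProbe], answer_fn: Callable[[str], str]) -> float:
    """Score any model wrapped as a prompt -> reply callable against the gold labels."""
    probes = list(probes)
    correct = sum(
        int(answer_fn(to_prompt(p)).strip().lower().startswith("yes") == p.is_grammatical)
        for p in probes
    )
    return correct / len(probes)
```

Any of the evaluated LLMs can then be plugged in as `answer_fn` without changing the probes themselves.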
EMNLP
You Are My Type! Type Embeddings for Pre-trained Language Models
Mohammed Saeed, and Paolo Papotti
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
One reason for the positive impact of Pre-trained Language Models (PLMs) in NLP tasks is their ability to encode semantic types, such as "European City" or "Woman". While previous work has analyzed such information in the context of interpretability, it is not clear how to use types to steer the PLM output. For example, in a cloze statement, it is desirable to steer the model to generate a token that satisfies a user-specified type, e.g., predict a "date" rather than a "location". In this work, we introduce Type Embeddings (TEs), an input embedding that promotes desired types in a PLM. Our proposal is to define a type by a small set of word examples. We empirically study the ability of TEs both in representing types and in steering masked predictions without changes to the prompt text in BERT. Finally, using the LAMA datasets, we show how TEs substantially improve the precision in extracting facts from PLMs.
@inproceedings{saeed-etal-2022-TE,title={You Are My Type! Type Embeddings for Pre-trained Language Models},author={Saeed, Mohammed and Papotti, Paolo},booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},month=dec,year={2022},address={Online and Abu Dhabi, UAE},publisher={Association for Computational Linguistics},}
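As a rough illustration of the idea, the sketch below builds a type embedding by averaging the input embeddings of a few example words and adds it at the [MASK] position before decoding. The example words, the scaling factor, and the injection point are assumptions made here for illustration; the paper's exact formulation may differ.

```python
# Rough sketch, not the paper's exact method: define a type ("European city") by a few
# example words, average their input embeddings, and inject the result at the [MASK]
# position to bias the masked prediction toward that type.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
embeddings = model.get_input_embeddings()

example_words = ["paris", "london", "rome", "berlin"]          # defines the target type
example_ids = torch.tensor(tokenizer.convert_tokens_to_ids(example_words))
type_embedding = embeddings(example_ids).mean(dim=0).detach()

inputs = tokenizer("Miles Davis was born in [MASK].", return_tensors="pt")
input_embeds = embeddings(inputs["input_ids"]).detach().clone()

# Add the type embedding at the mask position; the scale (1.0) is an arbitrary choice here.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
input_embeds[0, mask_pos] += 1.0 * type_embedding

with torch.no_grad():
    logits = model(inputs_embeds=input_embeds,
                   attention_mask=inputs["attention_mask"]).logits
print(tokenizer.convert_ids_to_tokens(int(logits[0, mask_pos].argmax())))
```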
EMNLP
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models
Mohammed Saeed, Naser Ahmadi, Preslav Nakov, and 1 more author
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021
While pre-trained language models (PLMs) are the go-to solution to tackle many natural language processing problems, they are still very limited in their ability to capture and to use common-sense knowledge. In fact, even if information is available in the form of approximate (soft) logical rules, it is not clear how to transfer it to a PLM in order to improve its performance for deductive reasoning tasks. Here, we aim to bridge this gap by teaching PLMs how to reason with soft Horn rules. We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis. We release the first dataset for this task, and we propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task. Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training. Moreover, we demonstrate that logical notions expressed by the rules are transferred to the fine-tuned model, yielding state-of-the-art results on external datasets.
@inproceedings{saeed-etal-2021-rulebert,title={{R}ule{BERT}: Teaching Soft Rules to Pre-Trained Language Models},author={Saeed, Mohammed and Ahmadi, Naser and Nakov, Preslav and Papotti, Paolo},booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},month=nov,year={2021},address={Online and Punta Cana, Dominican Republic},publisher={Association for Computational Linguistics},doi={10.18653/v1/2021.emnlp-main.110},pages={1460--1476},}
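The sketch below shows one way such a training instance could be encoded: the rule, the facts, and the hypothesis are concatenated into a single input, and the loss compares the model's output logit to a probabilistic target rather than a hard 0/1 label. The textual template and this particular soft-label cross entropy are illustrative choices, not necessarily the exact formulation in the paper.

```python
# Illustrative encoding of one RuleBERT-style instance; the template and loss are
# simplifications for exposition, not the paper's exact setup.
import torch
import torch.nn.functional as F

rule = "If someone is the spouse of a second person, then the second person is likely their spouse."
facts = "Anne is the spouse of John."
hypothesis = "John is the spouse of Anne."
target = torch.tensor([0.9])   # soft label: probability the hypothesis holds given rule + facts

encoder_input = f"{rule} {facts} Hypothesis: {hypothesis}"  # fed to a PLM with a single-logit head

def soft_label_loss(logit: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Binary cross entropy against a probabilistic target instead of a hard 0/1 label,
    which pushes the fine-tuned PLM to output calibrated probabilities."""
    return F.binary_cross_entropy_with_logits(logit, target)

print(soft_label_loss(torch.tensor([0.4]), target))  # toy check with a dummy logit
```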
VLDB
Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification
Georgios Karagiannis*, Mohammed Saeed*, Paolo Papotti, and 1 more author
In Proceedings of the VLDB Endowment, Jul 2020
Organizations spend significant amounts of time and money to manually fact check text documents summarizing data. The goal of the Scrutinizer system is to reduce verification overheads by supporting human fact checkers in translating text claims into SQL queries on a database. Scrutinizer coordinates teams of human fact checkers. It reduces verification time by proposing queries or query fragments to the users. Those proposals are based on claim text classifiers that gradually improve during the verification of a large document. In addition, Scrutinizer uses tentative execution of query candidates to narrow down the set of alternatives. The verification process is controlled by a cost-based optimizer. It optimizes the interaction with users and prioritizes claim verifications. For the latter, it considers expected verification overheads as well as the expected claim utility as training samples for the classifiers. We evaluate the Scrutinizer system using simulations and a user study with professional fact checkers, based on actual claims and data. Our experiments consistently demonstrate significant savings in verification time, without reducing result accuracy.
@article{10.14778/3407790.3407841,author={Karagiannis, Georgios and Saeed, Mohammed and Papotti, Paolo and Trummer, Immanuel},title={Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification},year={2020},issue_date={August 2020},publisher={VLDB Endowment},volume={13},number={12},issn={2150-8097},url={https://doi.org/10.14778/3407790.3407841},doi={10.14778/3407790.3407841},journal={Proc. VLDB Endow.},month=jul,pages={2508--2521},numpages={14},}
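The toy example below illustrates the tentative-execution idea on a tiny in-memory database: candidate queries proposed for a claim are executed, and only those whose result matches the claimed value are kept. The table, its contents, and the candidate queries are invented for this illustration.

```python
# Toy illustration of "tentative execution": run candidate query fragments proposed
# for a claim and keep only those whose result matches the claimed value.
# Table name, columns, and candidate queries are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("EU", 2019, 120.0), ("EU", 2020, 150.0), ("US", 2020, 90.0)])

claimed_value = 150.0   # value asserted in the text claim being verified
candidate_queries = [
    "SELECT SUM(revenue) FROM sales WHERE region = 'EU' AND year = 2020",
    "SELECT AVG(revenue) FROM sales WHERE year = 2020",
    "SELECT MAX(revenue) FROM sales WHERE region = 'US'",
]

consistent = [q for q in candidate_queries
              if abs(conn.execute(q).fetchone()[0] - claimed_value) < 1e-9]
print(consistent)  # queries whose tentative execution supports the claim
```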
TTO
Explainable Fact Checking with Probabilistic Answer Set Programming
Naser Ahmadi, Joohyung Lee, Paolo Papotti, and 1 more author
In Truth and Trust Online, 2019
One challenge in fact checking is the ability to improve the transparency of the decision. We present a fact checking method that uses reference information in knowledge graphs (KGs) to assess claims and explain its decisions. KGs contain a formal representation of knowledge with semantic descriptions of entities and their relationships. We exploit such rich semantics to produce interpretable explanations for the fact checking output. As information in a KG is inevitably incomplete, we rely on logical rule discovery and on Web text mining to gather the evidence to assess a given claim. Uncertain rules and facts are turned into logical programs and the checking task is modeled as an inference problem in a probabilistic extension of answer set programs. Experiments show that the probabilistic inference enables the efficient labeling of claims with interpretable explanations, and the quality of the results is higher than state-of-the-art baselines.
@article{Ahmadi2019ExplainableFC,title={Explainable Fact Checking with Probabilistic Answer Set Programming},author={Ahmadi, Naser and Lee, Joohyung and Papotti, Paolo and Saeed, Mohammed},journal={Truth and Trust Online},year={2019},volume={abs/1906.09198},}
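As a didactic toy (not the answer set programming machinery used in the paper), the snippet below shows the kind of possible-worlds computation involved: each piece of evidence supports the claim with some probability, and the claim's probability is the total weight of the worlds in which it is derivable. The evidence names and probabilities are invented for this example.

```python
# Toy possible-worlds illustration of probabilistic reasoning over uncertain facts:
# each piece of evidence holds independently with some probability, and the claim is
# derivable whenever at least one supporting fact holds. This is a didactic
# simplification, not the probabilistic answer set programming used in the paper.
from itertools import product

# (evidence name, probability it is true); values are invented for the example
uncertain_facts = [("kg_rule_supports_claim", 0.7), ("web_text_supports_claim", 0.6)]

def claim_probability(facts) -> float:
    """P(claim) when the claim holds iff at least one supporting fact holds."""
    total = 0.0
    for world in product([True, False], repeat=len(facts)):
        weight = 1.0
        for (_, p), holds in zip(facts, world):
            weight *= p if holds else 1.0 - p
        if any(world):            # claim derivable in this possible world
            total += weight
    return total

print(round(claim_probability(uncertain_facts), 3))  # 0.88 = 1 - (0.3 * 0.4)
```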
I am always up for a discussion about my work or anything related. Feel free to shoot me an e-mail!