Safety and Security of LLMs
Description, Goals, and Focus
This line of research advances the development of language models that proactively reduce unintended harm by minimizing hallucinations, addressing systemic biases, and aligning outputs with broadly shared societal values. To safeguard against intentional misuse and exploitation, we focus on strengthening models’ robustness against adversarial attacks and establishing external guardrails that continuously monitor, assess, and guide model behavior. This work centers on the Fanar family of models developed by QCRI and is supported by a collaborative effort across the CS, QCAI, and ALT groups, combining expertise from several areas.