🐢 Open-Source Evaluation & Testing for LLMs and ML models
-
Updated
Jun 13, 2024 - Python
🐢 Open-Source Evaluation & Testing for LLMs and ML models
A curated list of awesome responsible machine learning resources.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Aligning AI With Shared Human Values (ICLR 2021)
RuLES: a benchmark for evaluating rule-following in language models
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Code accompanying the paper Pretraining Language Models with Human Preferences
📚 A curated list of papers & technical articles on AI Quality & Safety
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Attack to induce LLMs within hallucinations
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)
Reading list for adversarial perspective and robustness in deep reinforcement learning.
A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference time) and detection without classification accuracy drop, hyperparameter tuning, or collecting additional data.
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Add a description, image, and links to the ai-safety topic page so that developers can more easily learn about it.
To associate your repository with the ai-safety topic, visit your repo's landing page and select "manage topics."