We believe that powerful artificial intelligence must be safe. We invest significant resources in safety research, risk assessment, and the development of safeguards to ensure AI benefits all humanity.
We actively identify and mitigate potential risks, including misuse, unintended consequences, and systemic risks. Before releasing new models, we conduct rigorous safety assessments.
We apply reinforcement learning from human feedback (RLHF) during training to align model behavior with human intentions and values, reducing bias and harmful outputs.
We monitor system usage after deployment, respond quickly to emerging threats, and regularly update models to improve safety.
We embed safety practices at every stage of the AI development cycle.
We invite external experts ("red teams") to probe our models, attempting to induce them to produce harmful content, bias, or misinformation. This helps us discover and fix vulnerabilities before release. For example, before releasing GPT-4, we engaged more than 50 experts in AI safety, cybersecurity, biorisk, and other fields to test the model.
We use reinforcement learning from human feedback to fine-tune our models. Human trainers rank and score candidate model responses, guiding the model toward replies that are more helpful, truthful, and harmless. This is key to ChatGPT's ability to converse fluently while remaining safe.
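To make the preference-learning step concrete, here is a toy sketch of how a reward model can be trained from ranked response pairs. This is not our production code: the bag-of-words featurizer, the vocabulary, and the example preference pairs are all invented for illustration, and a real reward model scores full transformer outputs rather than word counts.

```python
# Toy sketch of the reward-modeling step in RLHF (illustrative only).
# Assumptions: `featurize`, VOCAB, and `preference_pairs` are hypothetical
# stand-ins; real systems use a neural reward model over full text.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB = ["helpful", "answer", "rude", "refuse", "sorry", "explain"]

def featurize(text: str) -> torch.Tensor:
    """Toy bag-of-words featurizer over a fixed vocabulary."""
    words = text.lower().split()
    return torch.tensor([float(words.count(w)) for w in VOCAB])

# Human trainers compare two model replies to the same prompt and pick
# the better one; each comparison yields one (chosen, rejected) example.
preference_pairs = [
    ("here is a helpful answer with an explain step", "rude refuse"),
    ("sorry, let me explain the answer", "rude rude refuse"),
]

reward_model = torch.nn.Linear(len(VOCAB), 1)  # r(x): features -> scalar reward
optimizer = torch.optim.Adam(reward_model.parameters(), lr=0.1)

for epoch in range(50):
    for chosen, rejected in preference_pairs:
        r_chosen = reward_model(featurize(chosen))
        r_rejected = reward_model(featurize(rejected))
        # Bradley-Terry pairwise loss: push r(chosen) above r(rejected).
        loss = -F.logsigmoid(r_chosen - r_rejected).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print("reward('helpful answer'):", reward_model(featurize("helpful answer")).item())
print("reward('rude refuse')   :", reward_model(featurize("rude refuse")).item())
```

In a full RLHF pipeline, the trained reward model then scores candidate replies while a policy-optimization algorithm such as PPO fine-tunes the language model to maximize that reward.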
We developed a free content moderation API that helps developers identify and filter inappropriate content, including hate speech, self-harm, and violence. The same system is integrated into ChatGPT to prevent it from generating content that violates our usage policies.
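As an illustration, a developer might place the moderation endpoint in front of user input like this, using the official `openai` Python SDK. The helper name `is_allowed` is our own, and exact response field names can vary between SDK versions, so treat this as a sketch rather than canonical usage.

```python
# Minimal sketch of calling the moderation endpoint with the `openai`
# Python SDK (v1-style client). Set the OPENAI_API_KEY environment
# variable before running; field names may differ across SDK versions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_allowed(text: str) -> bool:
    """Return False when the moderation endpoint flags the text."""
    response = client.moderations.create(input=text)
    result = response.results[0]
    if result.flagged:
        # Report which policy categories (hate, self-harm, violence, ...)
        # were triggered, so the calling app can explain the refusal.
        flagged = [name for name, hit in result.categories.model_dump().items() if hit]
        print(f"blocked; categories: {flagged}")
        return False
    return True

if is_allowed("How do I bake bread?"):
    print("safe to pass along to the model")
```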
We collaborate with governments, civil society, and other AI labs to develop AI safety standards and best practices. We support regulation of high-risk AI systems and are committed to improving transparency in AI development.
We understand the importance of data privacy. When you use ChatGPT, you have control over your data: for example, you can delete your conversation history and opt out of having your conversations used to train our models.