Microsoft's Chief Product Officer, Sarah Bird, revealed in an interview with The Verge that her team has designed a series of new security features specifically for Azure customers. These features allow customers to easily test their AI services without hiring a red team. Microsoft states that these tools, based on Large Language Models (LLM), can detect potential vulnerabilities, monitor seemingly reasonable but unsupported illusions, and block malicious prompts in any model on the Azure AI platform in real-time.
Bird explained, "We understand that not all customers have the expertise to understand prompt injection attacks or malicious content. Therefore, our evaluation system automatically generates prompts required to simulate these attacks, and customers can then obtain scores and view the results."
These innovative features help address controversies generated by unwanted or unexpected responses from generative AI, such as recent instances of celebrity deepfake images (e.g., those generated by Microsoft's Designer Image Generator), historically inaccurate visuals (e.g., Google Gemini's mishaps), or absurd search results (e.g., Mario flying into the Twin Towers on Bing).
Currently, Azure AI has launched three preview features:
- Prompt Shielding: Prevents malicious or injected prompts from external documents, ensuring that the model does not deviate from its training purpose.
- Stability Detection: Identifies and eliminates unstable factors that may cause illusions.
- Security Assessment: Comprehensive evaluation of model vulnerabilities to enhance overall security.
In addition, Microsoft plans to soon introduce two new features aimed at guiding secure content output from models and tracking prompts to identify potential problematic users.
In Azure AI Studio, content filtering settings provide robust support. These settings prevent prompt attacks and the appearance of inappropriate content, and determine how to handle issues. Whether it is user input prompts or third-party data being processed by the model, the monitoring system carefully evaluates them to ensure that no disabled vocabulary or hidden prompts are triggered before sending them to the model for response. Additionally, the system reviews the model's responses to ensure that it does not generate any information not mentioned in the documents or prompts.
Regarding the controversy surrounding Google Gemini images, the unintended consequences of bias filters have prompted the need for more fine-grained control, which is what Microsoft's Azure AI tools aim to provide. Bird acknowledges that concerns from the public about Microsoft and other companies deciding the appropriateness of AI models are valid. Therefore, her team has provided Azure customers with a way to switch off hate speech or violent content that the model sees and blocks.
In the future, Azure users will also be able to access user reports on attempts to trigger unsafe outputs. Bird states that this will help system administrators differentiate between the company's red team members and users who may have malicious intent.
It is worth mentioning that these security features are already applicable to GPT-4 and other popular models, such as Llama 2. However, users who use smaller or less commonly used open-source systems in Azure's model library may need to manually direct these security features to their specific models.
Microsoft has been actively leveraging AI technology to enhance the security of its software, especially in light of increasing customer interest in accessing AI models through Azure. Additionally, the company is committed to expanding the number of its powerful AI models and recently entered into an exclusive agreement with French AI company Mistral to provide the Mistral Large model on the Azure platform.