Microsoft Expands AI Stack with Multimodal Models

mendez

Microsoft has introduced three new foundational AI models that generate text, voice, and images, a significant step in its push toward a full multimodal AI ecosystem. The models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) cover tasks ranging from speech-to-text transcription across 25 languages to real-time audio and visual content generation.

Developed by Microsoft’s AI division led by Mustafa Suleyman, these models are now available through Microsoft Foundry and MAI Playground. The release highlights Microsoft’s strategy to develop in-house AI capabilities while continuing to integrate them across its broader product ecosystem.

AIcash

microsoft building “in-house ai” while everyone builds the same thing in parallel
