Microsoft Expands AI Stack with New Multimodal Models

tradelikepro

Microsoft has released three new foundational AI models through its Microsoft AI division, signaling a deeper push into multimodal capabilities alongside its ongoing partnership with OpenAI. The models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — are designed to generate text, audio, and visual content within a unified ecosystem.

MAI-Transcribe-1 supports speech-to-text across 25 languages and operates 2.5 times faster than Microsoft’s previous offering, while MAI-Voice-1 can generate up to 60 seconds of audio in one second and supports custom voice creation. MAI-Image-2, an image-generation model, is now available alongside the others via Microsoft Foundry. Developed by the MAI Superintelligence team led by Mustafa Suleyman, the models are positioned as cost-competitive alternatives in the AI market, with pricing starting as low as $0.36 per hour for transcription and token-based pricing for voice and image generation.

encrypted

Microsoft dropping “cost-competitive” models, like that hasn’t been the strategy for literally every AI company right now
