Meta Ditched Open Source and Built Its Best AI Model Yet

Hey there! 👋

Welcome back to SavvyMonk, your one-stop source for AI and tech news that actually matters.

Meta just dropped a new AI model called Muse Spark, and it's the company's most capable one to date. It's also the first time Meta has shipped a closed-source model, which is a bigger deal than it sounds.

Let's get into it.

TODAY'S DEEP DIVE

Meta's New AI Lab Just Shipped Its First Model, and It's Not Llama

About nine months ago, Meta made a bold bet. The company struck a $14.3 billion deal to bring in Alexandr Wang, the founder and then-CEO of Scale AI, as its new Chief AI Officer. Wang's mandate was simple and enormous: fix Meta's AI efforts from the ground up. The company needed it.

Llama 4, Meta's previous flagship model family, launched in April 2025 and was quickly picked apart. Multiple outlets reported that Meta had boosted its benchmark results with specialized model variants that were never released to the public.

Then, in November 2025, Yann LeCun, Meta's longtime Chief AI Scientist and its most visible open-source advocate, left the company after his role was reduced.

Wang took over and set up Meta Superintelligence Labs (MSL), a new internal division given a blank slate. No salvaging the old stack. New infrastructure, new model architecture, new data pipelines, built in nine months.

On April 8, 2026, MSL shipped its first product: Muse Spark.

How It Works

Muse Spark is a natively multimodal reasoning model, meaning it handles text and visual input together from the start rather than bolting vision on as an afterthought. The model has three interaction modes, and the difference between them matters.

The first is Instant, for fast, casual queries with no extended reasoning. The second is Thinking, a standard chain-of-thought mode for harder problems. The third is Contemplating, which is the technically interesting one. Instead of having a single AI agent think for longer (which increases latency linearly), Contemplating mode runs multiple agents reasoning in parallel. The result is stronger performance at comparable response times.
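To make the latency argument concrete, here is a toy Python sketch. It is not Meta's implementation: the `run_agent` function, the sleep-based "thinking", and the timings are all hypothetical stand-ins. It only illustrates why several reasoning passes run in parallel cost roughly the time of the slowest pass, while the same passes chained one after another cost the sum.

```python
import asyncio
import time


async def run_agent(agent_id: int, think_seconds: float) -> str:
    """Hypothetical agent: 'thinks' by sleeping, then returns an answer."""
    await asyncio.sleep(think_seconds)
    return f"agent-{agent_id} answer"


async def run_sequential(think_times):
    """One pass after another: total latency is about the sum of the passes."""
    return [await run_agent(i, t) for i, t in enumerate(think_times)]


async def run_parallel(think_times):
    """All passes at once: total latency is about the slowest single pass."""
    return await asyncio.gather(
        *(run_agent(i, t) for i, t in enumerate(think_times))
    )


def timed(runner, think_times):
    """Run one of the strategies above and return (answers, elapsed seconds)."""
    start = time.perf_counter()
    answers = asyncio.run(runner(think_times))
    return answers, time.perf_counter() - start


if __name__ == "__main__":
    passes = [0.05, 0.05, 0.05]  # made-up thinking times, in seconds
    for runner in (run_sequential, run_parallel):
        _, elapsed = timed(runner, passes)
        print(f"{runner.__name__}: {elapsed:.2f}s")
```

Both strategies return the same three answers; only the wall-clock time differs. How Contemplating mode actually merges its agents' answers is not something Meta has detailed, so the sketch leaves that step out.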

On the training side, Meta says it rebuilt its pretraining stack with improvements to model architecture, optimization, and data curation. One claimed result is that Muse Spark can reach the same capability level as Llama 4 Maverick using less than a tenth of the compute.

For health reasoning specifically, Meta worked with over 1,000 physicians to curate training data so the model gives more factual and comprehensive responses.

The Numbers

Independent benchmarking from Artificial Analysis gives Muse Spark a score of 52 on the Intelligence Index, placing it fourth overall behind GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6.

For context, Llama 4 Maverick scored 18 at launch. That's a meaningful jump in one release.

Where Muse Spark genuinely leads the field is health. On HealthBench Hard, it scores 42.8% versus GPT-5.4's 40.1%. If your use case touches clinical reasoning or medical information, it's currently the strongest model available on paper.

The gaps are real though. On ARC-AGI-2, which tests abstract visual reasoning, Muse Spark scores 42.5 while GPT-5.4 and Gemini 3.1 Pro both sit around 76. That's not a small difference. Agentic coding tasks tell a similar story: Terminal-Bench 2.0 has Muse Spark at 59.0 versus GPT-5.4 at 75.1. François Chollet, co-creator of the ARC-AGI benchmark, called the model "overoptimized for public benchmark numbers at the detriment of everything else." Wang acknowledged the abstract reasoning gap publicly and said the team is working on it.
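If you want to see those head-to-head gaps side by side, here is a small Python sketch that does the subtraction. The scores are the ones quoted in this issue; the dictionary layout and the `gap_vs_rival` helper are just illustrative names. Note that the rival ARC-AGI-2 figure is only "around 76", so that gap is a rough estimate.

```python
# Scores as quoted in this issue. The ARC-AGI-2 rival score is approximate
# ("around 76"), so treat that row as an estimate rather than an exact figure.
SCORES = {
    "HealthBench Hard (%)": {"Muse Spark": 42.8, "GPT-5.4": 40.1},
    "ARC-AGI-2": {"Muse Spark": 42.5, "GPT-5.4": 76.0},
    "Terminal-Bench 2.0": {"Muse Spark": 59.0, "GPT-5.4": 75.1},
}


def gap_vs_rival(benchmark: str, model: str = "Muse Spark",
                 rival: str = "GPT-5.4") -> float:
    """Return model score minus rival score, in points (positive = model leads)."""
    row = SCORES[benchmark]
    return round(row[model] - row[rival], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: {gap_vs_rival(name):+.1f} points vs GPT-5.4")
```

The health lead is under three points, while the ARC-AGI-2 and Terminal-Bench deficits are in the double digits, which is the asymmetry the paragraph above is pointing at.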

On token efficiency, Muse Spark is notably lean. It used 58 million output tokens to complete the full Intelligence Index evaluation, comparable to Gemini 3.1 Pro and significantly less than Claude Opus 4.6 at 157 million. At Meta's scale, that matters a lot for serving costs.
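For a feel of what that token gap could mean in dollars, here is a back-of-the-envelope Python sketch. The token counts come from the figures above; the price per million output tokens is a made-up placeholder, since Meta has announced no pricing, and the function names are illustrative.

```python
# Output tokens used on the full Intelligence Index run, in millions,
# as quoted in this issue.
MUSE_SPARK_TOKENS_M = 58
CLAUDE_OPUS_TOKENS_M = 157


def token_ratio(larger_millions: float, smaller_millions: float) -> float:
    """How many times more tokens the larger run used than the smaller one."""
    return larger_millions / smaller_millions


def output_cost_usd(tokens_millions: float, price_per_million_usd: float) -> float:
    """Cost of generating the given output tokens at an assumed flat price."""
    return tokens_millions * price_per_million_usd


if __name__ == "__main__":
    hypothetical_price = 10.0  # USD per million output tokens; an assumption
    ratio = token_ratio(CLAUDE_OPUS_TOKENS_M, MUSE_SPARK_TOKENS_M)
    print(f"Token ratio: {ratio:.2f}x")
    for name, tokens in [("Muse Spark", MUSE_SPARK_TOKENS_M),
                         ("Claude Opus 4.6", CLAUDE_OPUS_TOKENS_M)]:
        cost = output_cost_usd(tokens, hypothetical_price)
        print(f"{name}: ${cost:,.0f}")
```

Whatever the real price turns out to be, the ratio stays the same: the Claude Opus 4.6 run used roughly 2.7 times as many output tokens for the same evaluation.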

The Open Source Shift Nobody Wanted to Mention

This is the detail that keeps getting buried in coverage of the benchmarks. Muse Spark is a closed-source, proprietary model. Meta has not released the weights and has only said it "hopes to open-source future versions." That is not a commitment, and there is no timeline.

This is a clean break from everything Meta built its AI reputation on. The Llama ecosystem became a standard for researchers, startups, and developers who needed to self-host or fine-tune a capable model. Muse Spark does not serve that community at all.

The model is free inside the Meta AI app and on meta.ai, and it will roll out to WhatsApp, Instagram, Facebook, Messenger, and the Ray-Ban Meta AI glasses over the coming weeks. But there is no public API yet, and no pricing. Developers cannot build with it today.

Meta's Llama 4 models are still available and unchanged. A dual track, with Llama staying open and Muse staying closed, looks like the most likely scenario going forward, but Meta has not said so explicitly.

The Bottom Line

Muse Spark is a real model. The jump from Llama 4 in nine months is not nothing, and the health benchmarks are genuinely impressive. But the framing of it as "competitive with leading models" needs some fine print.

It's competitive on general reasoning and some multimodal tasks, behind the pack on agentic work and abstract visual reasoning, and it's not accessible to developers yet. The more consequential story here isn't the benchmark numbers.

It's that Meta just walked away from open source for its flagship model line, and that changes how the whole ecosystem treats them.

AI PROMPT OF THE DAY

Category: Competitive Research

"I'm evaluating AI models for [use case, e.g. customer support / health information / code generation]. Compare the top three models available today across these criteria: benchmark performance on tasks relevant to my use case, pricing and API availability, reasoning capabilities, and any known weaknesses. Recommend which one to start with and why."

ONE LAST THING

Meta's open-source era in AI may be over, at least at the frontier level. For years, Llama was what made Meta interesting to developers. Muse Spark is interesting for different reasons, but it doesn't fill that same role. The question worth watching isn't whether Muse Spark can catch GPT-5.4. It's whether Meta can hold two audiences at once: the billions of consumer users it reaches through its apps, and the developer community it may have just handed to its competitors.

Hit reply. I read every response.

See you in the next one.

— Vivek

P.S. Know someone who follows AI and tech closely? Forward this their way. They can subscribe at https://savvymonk.beehiiv.com/