I've Been Sleeping On Chinese Models

I came across this Hacker News post today:

Kimi K2 1T model runs on 2 512GB M3 Ultras

While I can’t wait for the day that I can self-host my own LLM server, that’s not what caught my eye. In the comments, there was a lot of discussion about what a great writer Kimi K2 is. This got my attention, because I have been paying way more than I’d like for reasonable Opus limits in Claude Code. Of all the models from the top three US labs, I’ve found Opus to be consistently the best writer for my use cases. Specifically, I just finished my master’s thesis and relied heavily on the model to help me craft the document, hence the expensive Max plan that I had to shoulder to get me through.

Had I known Kimi K2 was around, I could have saved a ton of money, and maybe had a better document in the end. Live and learn, I suppose, but I’m glad to have learned about it now.

Some of the commenters also mentioned that Kimi K2 is much more blunt in its criticism and will let you know when your idea is actually bad. This really excited me, because I’m constantly worried that the models are just reinforcing the ideas that I bring to the conversation. I sometimes feel like I’m just talking to myself and creating my own echo chamber. Ironically, I have come to appreciate honest feedback from humans even more since working with LLMs.

I tested Kimi K2 on a number of recent pieces that I’ve written, including my thesis and my last blog post. I was disappointed to find it heaping on praise again, even when I prompted it to be critical. To be fair, I am proud of both pieces of work, and I showed it finished projects, so it makes sense that the feedback would be limited and minor.

I searched around the internet for examples of bad writing, and finding them was more difficult than I thought it would be. I eventually just asked Kimi K2 to generate an example from scratch. The first attempt got stuck in a loop adding “very” to a sentence over and over again before I exited out. I asked it to try again and it came up with a sales pitch that uses blockchain to sell cat food. Perfect.

Kimi K2’s Bad Writing

We are revolutionizing the paradigm of synergistic blockchain-enabled AI-driven cat food delivery through our proprietary IoT-enabled litter box sensors. Our unique value proposition leverages machine learning algorithms to predict feline hunger patterns with 97.3% accuracy (pending verification).

Our go-to-market strategy involves disrupting the $47 billion pet industry through gamified NFT treats that create viral engagement loops. Users earn “MeowCoins” by sharing their cat’s eating habits on social media, which can be redeemed for premium salmon-flavored tokens.

The market is huge because everyone has cats and cats need food. Our competitive advantage is that we’re first movers in the intersection of cryptocurrency and cat cuisine. We’re seeking $2.5M seed funding at a $50M pre-money valuation to scale our MVP (currently a Google form where people upload cat photos).

Our team consists of me, my roommate who once worked at Petco, and our advisor who owns three cats.

K2’s Critique

I fed that text to a new session of Kimi K2 and got this response:

This pitch is a masterclass in startup buzzword bingo, but it’s completely detached from reality. Here’s the brutal truth:

Core Problems:

  • 97.3% accuracy claim is meaningless without data, sample size, or methodology
  • “Blockchain-enabled cat food delivery” solves a problem that doesn’t exist
  • NFT treats and MeowCoins are gimmicks that add friction to a simple transaction
  • Google form MVP proves you haven’t validated anything

Market Reality:

  • Cat owners want convenient, reliable food delivery - not crypto gamification
  • The $47B pet industry isn’t waiting for blockchain solutions to feeding cats
  • “Everyone has cats” is false - only 25% of US households own cats

Team Credibility:

  • Zero technical expertise in blockchain, ML, or logistics
  • Petco employment ≠ relevant experience
  • Cat ownership ≠ industry expertise

Funding Ask:

  • $50M pre-money for a Google form is delusional
  • No revenue, no users, no product, no defensible tech

Real Talk: You’re building complex tech for a simple problem. Cats need food, owners want it delivered cheaply and reliably. Focus on logistics, pricing, and customer service instead of blockchain theater

So I guess it can be blunt! Glad to see some authenticity in the model, and that my ideas are not so terrible.

opencode

Another benefit of exploring Kimi K2 is that I needed to find a model-agnostic harness for it. I have always been curious about opencode, and this seemed like the perfect excuse to really try it out. Wow! What a nice experience. Leaps ahead of Claude Code in terms of UX. Why can’t Anthropic make such a polished terminal?

I know all the major labs are making more polished VS Code extensions, or full IDEs, but the most power seems to come from the CLI tools. In particular, I rely on --dangerously-skip-permissions with Claude Code, as it really unlocks the agentic experience. As far as I know, that’s not possible with Anthropic’s VS Code extension.

I’ll have to play with opencode a lot more, and my opinions will likely change, but I thought it was worth noting how delightful it is to use. I feel smarter when using it. Almost like a real hacker.