
Showing posts from 2025

Tool calling with Moonshot's Kimi K2 model

Moonshot's new Kimi K2 model is extremely inexpensive to use. Moonshot AI (月之暗面) is a prominent Chinese artificial intelligence startup focused on developing large language models with the long-term goal of achieving Artificial General Intelligence (AGI). The company gained significant recognition for its Kimi chatbot, which pioneered an exceptionally large context window capable of processing up to two million Chinese characters in a single prompt. Backed by major investors like Alibaba, Moonshot AI has quickly become one of China's most valuable AI unicorns and a key competitor in the global AI race. The release of Kimi K2 last week has been referred to as "another DeepSeek moment." You need a Moonshot API access key from https://platform.moonshot.ai/console/account. The direct API pricing from Moonshot AI is approximately $0.60 per 1 million input tokens and $2.50 per 1 million output tokens. Several inference providers in the USA are now supporting...
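The Moonshot API follows the OpenAI-compatible chat completions convention, so tool calling can be sketched with the standard OpenAI Python client pointed at Moonshot's endpoint. In the sketch below the base URL, the model id, and the get_weather tool are my illustrative assumptions; check the Moonshot platform docs for current values.

import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",        # key from platform.moonshot.ai
    base_url="https://api.moonshot.ai/v1",  # assumed OpenAI-compatible endpoint
)

# A hypothetical tool definition in standard OpenAI function-calling form.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2-0711-preview",  # assumed Kimi K2 model id
    messages=[{"role": "user", "content": "What is the weather in Sedona?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as JSON text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))

Your code would then run the named function locally and send the result back in a follow-up message with role "tool" so the model can finish its answer.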

Google Gemini Batch Mode API with a 50% cost reduction: a game changer?

I noticed on X this morning that Google dropped a new batch API with a 50% price cut. I use gemini-2.5-flash for speed and low cost, and being able to batch large numbers of requests in a JSONL file (a file where each line is a single valid JSON value) seems like a big deal to me. Gemini Batch API Docs. I have been a little negative on Hacker News and X recently about the energy costs vs. value of LLM use, and it seems like Google is striking a good middle ground between cost and environmental impact. Automating NLP and other workflows seems fairly simple: write pipeline requests to a JSONL file, submit them, periodically poll until the results are complete, then download and use the results in your workflows, as sketched below.
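Here is a minimal sketch of that workflow using the google-genai Python SDK; the method names and the JSONL request shape follow my reading of the Batch API docs, so verify them against the current documentation before relying on them.

import json
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# 1. Write pipeline requests to a JSONL file: one JSON object per line,
#    each with a unique key and a generateContent-style request body.
prompts = ["Summarize the following text: ...", "Extract entities from: ..."]
with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "key": f"request-{i}",
            "request": {"contents": [{"parts": [{"text": prompt}]}]},
        }) + "\n")

# 2. Upload the file and create the batch job at the discounted rate.
uploaded = client.files.upload(
    file="batch_requests.jsonl",
    config={"mime_type": "application/jsonl"},
)
job = client.batches.create(model="gemini-2.5-flash", src=uploaded.name)

# 3. Periodically poll until the job completes, then fetch the results.
while job.state.name not in ("JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED"):
    time.sleep(60)
    job = client.batches.get(name=job.name)
print("Batch finished with state:", job.state.name)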

So much fun: recreating 1970s text adventure games using LLMs, but better

In the late 1970s, I worked long hours on a text-based adventure game called Land of the Dwarf for the Apple II computer. My game was written in Apple BASIC and I gave it away for free. I wrote it using a huge sheet of paper, drawing a transition network diagram with bubbles for locations, action codes for things that could be done, and arcs between the bubbles representing the ability to move from one location to another. Yesterday I was really in the mood to do something fun. We didn't have anything going on for Fourth of July celebrations until an outdoor symphony in the early afternoon, so I sat down in the morning with Google's excellent Gemini CLI coding agent and described what I wanted: to input a story context as a text file and then use an LLM in a conversation loop to continuously provide the text-based adventure experience. I was really surprised how well it turned out. Fun! The generated adventure game code uses a local model running ...
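The heart of such a program is just a conversation loop: read the story context from a text file, install it as the system prompt, then alternate between player commands and model replies. A minimal sketch using the ollama Python package, with placeholder model and file names:

import ollama

with open("story_context.txt") as f:
    story_context = f.read()

# Seed the conversation with the story context as the game master's rules.
messages = [{
    "role": "system",
    "content": "You are a 1970s-style text adventure game engine. "
               "Run the game using this story context:\n" + story_context,
}]

while True:
    command = input("> ")
    if command.strip().lower() in ("quit", "exit"):
        break
    messages.append({"role": "user", "content": command})
    # Keeping the full message history gives the model the game state so far.
    response = ollama.chat(model="qwen2.5", messages=messages)
    reply = response["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    print(reply)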

AI dominance: US vs. China and the rest of the world

I would like the USA not to lose the ‘AI race’, but I don’t want other countries to lose either. Ensuring continued US leadership in AI without constraining global progress necessitates a dual-pronged strategy that couples strategic national R&D investments with adherence to open international standards. On the technical front, integrating edge computing, specialized AI silicon, and advanced hardware-software co-design with standardized protocols for interoperability enables US institutions to maintain a competitive edge. Simultaneously, promoting collaborative frameworks for data sharing, federated learning, and algorithmic auditing mitigates risks of siloed innovation, ensuring that breakthroughs in efficiency, low-latency inference, and energy-optimized training regimes benefit a global network of research and industry stakeholders. In the USA I would like to see major resources dedicated both to open models such as the ones from Meta and to the use of French Mistral open mo...

AI update: The new DeepSeek-R1 reasoning language model, ByteDance's Trae IDE, and my new book

I spent a few days experimenting with Cursor last week. ByteDance's Trae IDE is very similar and is currently free to use with Claude 3.5 Sonnet and GPT-4o: https://www.trae.ai/home  I would like to use Trae with my own API accounts, but currently ByteDance is paying for LLM costs. I have been experimenting with the qwen2.5 and qwen2.5-coder models that easily run on my M2 Pro 32GB Mac. For reasoning I have been going back to using OpenAI o1 and Claude Sonnet, but after my preliminary tests with DeepSeek-R1, I feel like I can do almost everything now on my personal computer. I am using:  ollama run deepseek-r1:32b  I recently published my new book “Ollama in Action: Building Safe, Private AI with LLMs, Function Calling and Agents” that can be read free online at https://leanpub.com/ollama/read