
Better tools for your AI agents, one open-source toolbox

Like humans, your AI agents are limited by the quality of their tools. We're building a toolbox that lets them do their best work.

$ bash <(curl https://include.tools/install/<claude-code|claude|cursor|windsurf|pi>)
 toolbox installed

$ toolbox install github.com/include-tools/google
🔐 Opening browser to authorize Google services…
 Connected as [email protected] (personal)

$ // how it'll look when you're in. Get notified of our public launch →
// section 1 — local code mode

Just one super tool.

After installing toolbox, your agent will have just one new super tool.

One super tool

In most AI platforms, every integration you set up adds tens of thousands of extra tokens to your agent's working memory.
For a human, it would be as if someone forced you to read the manual for every appliance in your household before letting you use the toaster.
If you've installed access to calendar, email, and chat, your agent probably has more than 40 large tool definitions cluttering up its working memory.
Toolbox is different. Its super tool is more like a magic wand: agents are taught just the right amount.

Compressed Tool Info

Toolbox tools are given to the agent in a very concise format that also lets you add little hints about when to use each one. If you set up your agent to browse tech news, manage your email, and pursue world domination, its tool info might look like this:
==
tool.name tool.count
--
hacker_news 6
gmail 12
sharks_with_laser_beams 3 // use in presence of world leaders
==
Agents use this compact index to figure out what they can do; only when they want to use a tool do they materialize the full knowledge.

Progressive Discovery (The Secret Behind Skills)

Progressive discovery simply means that an agent's working memory only gets filled with more information as it needs it.
Using our toaster metaphor, you probably don't need a manual to use the toaster – but there is that weird button you've never touched because what does it do? If you had our toolbox installed in your brain, the manual would appear, opened to the right page, the moment you wondered.
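The idea above can be sketched in a few lines. This is an illustrative sketch only: the registry shape and function names are hypothetical, not Toolbox's real API. The compact index stays in context at all times; full definitions materialize only on request.

```typescript
// Illustrative sketch only: registry shape and function names are hypothetical,
// not Toolbox's real API.
type ToolEntry = { count: number; hint?: string; doc: string };

const registry: Record<string, ToolEntry> = {
  hacker_news: { count: 6, doc: "hn.top(n: number): Promise<Story[]> ..." },
  gmail: { count: 12, doc: "gmail.send(args: { to: string }): Promise<Receipt> ..." },
  sharks_with_laser_beams: {
    count: 3,
    hint: "use in presence of world leaders",
    doc: "sharks.deploy(target: string): Promise<void> ...",
  },
};

// Always in context: a few tokens per tool.
function index(): string {
  return Object.entries(registry)
    .map(([name, t]) => `${name} ${t.count}${t.hint ? ` // ${t.hint}` : ""}`)
    .join("\n");
}

// Materialized on demand: the full definition, only when the agent asks.
function expand(name: string): string {
  const entry = registry[name];
  if (!entry) throw new Error(`unknown tool: ${name}`);
  return entry.doc;
}
```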
Agent skills were one of the early champions of this technique: improving agent abilities without flooding their memory (context) with information the model didn't need.
Here's how it works. Toolbox gives your AI agent a better way to access tools.

Let AIs access tools the way they were meant to

Toolbox lets your agents access tools by writing code. Why fight their natural love language? Agents are generally better at coding than at tool calling.
"We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API. The results are striking:
  • We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.
  • The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM's neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.
In short, LLMs are better at writing code to call MCP than at calling MCP directly." — Cloudflare Engineering Blog
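The pattern the quote describes can be sketched with locally-mocked tools. These mocks are hypothetical, not a real API; the point is the shape of the code. Intermediate results flow between steps inside the script, and only the final value needs to be read back by the model.

```typescript
// Hypothetical, locally-mocked tools: the point is the shape of the code,
// not a real API.
const contacts = {
  search: async (q: string) => [{ name: q, email: `${q}@example.com` }],
};
const mail = {
  send: async (to: string, body: string) => ({ sent: true, to, body }),
};

// One script, two chained calls, zero model round-trips in between.
async function replyTo(person: string, message: string): Promise<string> {
  const [match] = await contacts.search(person); // step 1: find the address
  const receipt = await mail.send(match.email, message); // step 2: reuse step 1's output directly
  return receipt.to; // only this small value goes back to the model
}
```

With traditional tool calling, the output of `contacts.search` would travel through the model's context just to be copied into `mail.send`; here it never leaves the script.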
Code is how your agent interacts with the world, and adding a new tool to its super tool is quick and easy.

Installing tools

Run toolbox install github.com/include-tools/google, follow the prompts, and you will have a working Google integration — Gmail, Calendar, Drive, Contacts. Authorized, installed, verified.

Tool package installation & verification

Toolbox automatically downloads packages from GitHub (or other Git repositories).

Automatic downloading

Packages are downloaded directly from GitHub Releases if they have been published as a versioned package file.

Small versioned packages

Most packages contain a small amount of TypeScript code and compress into a few kilobytes that can be downloaded in seconds.
The packages themselves are just compressed files with an accompanying manifest which describes what tools are in the package and how to use them.
If no versions have been released on GitHub, the package resolver will clone the repository and use the latest commit.
Once downloaded and cached, packages are verified to ensure they have not been tampered with before being loaded, and are then ready for your AI agent to use.
Package versions are locked so that only you choose when to upgrade.
Install a package and those integrations are available through the typed SDK.

The typed SDK.

Every installed tool is exposed to your agent as a TypeScript module with full type signatures. Your agent can see exactly which arguments each function takes, what types they should be, and what the return value looks like — without anyone hand-writing a JSON Schema.
TypeScript signatures are dramatically smaller than the equivalent JSON Schema, while being more precise.

The token math.

Most MCP servers expose tools as JSON Schema — verbose, repetitive, and tuned for machine parsing. A typical tool definition in JSON Schema is a few hundred tokens per tool.
TypeScript signatures carry the same information in a fraction of the characters. For a 30-tool stack, that's tens of thousands of tokens saved every single turn.
And those tokens have real costs: you pay them in dollars, in latency, and — worst of all — in context rot. The more tool definitions crowding the context, the worse your agent reasons about the task it's actually trying to do.
No config file editing, no restart, no MCP sysadmin work. Add your email, calendar, social networks, news sites, and more. All ready for your agent to use in seconds.
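The size difference is easy to see side by side. This is a rough illustration using character counts as a proxy for tokens; the schema and signature below are invented for the example, and the exact ratio varies by tool.

```typescript
// Rough illustration: the same (invented) tool described both ways.
// Character counts stand in for tokens; the real ratio varies by tool.
const jsonSchema = JSON.stringify({
  name: "gmail_send",
  description: "Send an email",
  parameters: {
    type: "object",
    properties: {
      to: { type: "string", description: "Recipient address" },
      subject: { type: "string", description: "Subject line" },
      body: { type: "string", description: "Message body" },
    },
    required: ["to", "subject", "body"],
  },
});

const tsSignature =
  "send(args: { to: string; subject: string; body: string }): Promise<{ id: string }>";

// The schema is several times larger for the same information.
const ratio = jsonSchema.length / tsSignature.length;
```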
The super tool lets your agent combine multiple steps together efficiently.

Combining steps saves tokens and mistakes

Real tasks are never one function call. "Find Alex's email about the demo, check my calendar for a free hour, reply with the top two slots" is four or five steps — and most agent setups make the model do each one as a separate round-trip through the LLM.
With include.tools, your agent writes a short script that does all of them in a single pass.

One pass vs. many round-trips.

Traditional function calling: agent emits a call → host executes → result goes back into the context → agent decides the next step → emits the next call → host executes → … Each step is a full round-trip through the model.
With a code-execution environment: your agent writes the whole multi-step script at once and it runs in a single pass. The model only sees the final result.
For tasks with five or more steps, this is usually much faster and vastly cheaper. The difference compounds with complexity.
And it can resume anytime.

It doesn't lose its place.

Your agent's working state — variables it computed, files it fetched, intermediate results — all lives in a durable workspace.

Durable workspace.

The workspace is an isolated TypeScript runtime with no access to your real filesystem. Its cell state is stored in a SQLite session store.

SQLite-backed session store.

The workspace's cell state lives in a SQLite database. The sandbox has no filesystem access by default, so your real files stay untouched. The store gives you:
  • Atomic cell writes. Each cell commits as a transaction; if the process dies mid-turn, partial writes roll back.
  • Crash-safe resume. Reboot your machine; the next session picks up exactly where the last one left off.
  • Deterministic replay · planned. Every action is a cell.

    The cells model.

    Under the hood, every agent action is a deterministic cell — internally, the unit of replay, rollback, and (post-launch) branching. This is how the preview is built.
    The frontend analogy is "Jupyter notebook + REPL." The backend is cells: each cell records inputs, outputs, and side effects in a way that lets us re-run it identically or branch from it to explore alternatives.
    Branchable REPL — where an agent can fork a cell chain for a subagent to explore an alternative path — is in flight and will be a post-launch bonus. Durability is shipped today.
    A user-facing replay surface is planned, not yet shipped.
The session store uses per-cell transactions: every action your agent takes is written atomically. If the process crashes mid-turn, the turn rolls back. If the machine reboots, the next session picks up exactly where the last one left off.
The workspace survives crashes, restarts, and long sessions. If something fails mid-task, your agent picks up where it left off instead of starting from scratch.
This matters more than it sounds. Agents that lose their place on every hiccup waste tokens, burn time, and frustrate their users. Persistence changes the math when something goes wrong.
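The commit-or-rollback behavior of a cell can be sketched in miniature. This is an in-memory stand-in, not the real SQLite-backed store: each cell runs against a copy of the state, and only a cell that finishes cleanly gets its writes committed.

```typescript
// Minimal sketch of atomic cell writes. In-memory stand-in; the real
// store is SQLite-backed. A cell that throws leaves no partial writes.
class CellStore {
  private state: Record<string, unknown> = {};

  runCell(cell: (s: Record<string, unknown>) => void): boolean {
    // Work on a copy so a mid-cell failure can't leave partial writes behind.
    const draft = structuredClone(this.state);
    try {
      cell(draft);
      this.state = draft; // commit: all of the cell's writes land at once
      return true;
    } catch {
      return false; // rollback: this.state is untouched
    }
  }

  get(key: string): unknown {
    return this.state[key];
  }
}
```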
--- cell 1
> // alex emailed about a thursday demo — read details
> const inbound = await gmail.read({ google_account: "personal", id: "..." })
< { from: "[email protected]", subject: "thursday demo", ... }
--- cell 2
> // get work-calendar busy blocks over the next 3 days
> const start = new Date(), end = new Date(Date.now() + 3 * 86400000)
> await calendar.freebusy({ google_account: "work", timeMin: start, timeMax: end, calendars: ["[email protected]"] })
! TypeError: timeMin must be an RFC3339 string. Got: object (Date).
--- cell 3
> // right — calendar.freebusy wants RFC3339 strings, not Date objects
> await calendar.freebusy({
    google_account: "work",
    timeMin: start.toISOString(),
    timeMax: end.toISOString(),
    calendars: ["[email protected]"],
  })
< { results: [{ calendarId: "[email protected]", busy: [...] }] }
--- cell 4
> // compute two free hour-long gaps from $last's busy blocks
> const slots = findFreeHours($last.results[0].busy, start, end).slice(0, 2)
> await gmail.send({
    google_account: "work",
    to: inbound.from,
    subject: `Re: ${inbound.subject}`,
    body: `I have ${slots[0]} or ${slots[1]} open — which works?`,
  })
< { sent: true, messageId: "..." }
// section 2 — multi-account built-in

Painless access to multiple accounts.

You don't need to launch multiple MCP servers or wire up multiple tools to reach multiple accounts. As soon as you've authorized a tool package with more than one account, an extra <provider>_account field is automatically added to the tool signatures — so agents can address them by name.

import { gmail } from "@toolbox/sdk";

const inbound = await gmail.read({ google_account: "personal", id: "..." });

await gmail.send({
  google_account: "work",
  to: inbound.from,
  subject: `Re: ${inbound.subject}`,
  body: "Got it. Will do.",
});

// No keys. No tokens. You authorized once with `toolbox auth google`.
Read from one inbox, send from another, in five lines. Multi-account is the plot, not a parameter.
// section 3 — the catalog

Every tool package declares what it can do and what it touches.

Each tool exposes a typed TypeScript manifest. Multi-account is part of the call signature, not a side channel. Network and data-access rules are declared per-tool in the manifest and enforced at the runtime layer — not by hoping the agent behaves.
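A manifest along these lines might look like the sketch below. The field names and shape are illustrative assumptions, not Toolbox's actual manifest spec; they mirror the declarations described above (network allowlist and trust flags).

```typescript
// Hypothetical manifest shape: field names are illustrative, not Toolbox's spec.
interface ToolManifest {
  name: string;
  tools: string[];
  allowedHosts: string[]; // deny-by-default network boundary (assumed field)
  flags: {                // "lethal trifecta" visibility (assumed field)
    untrustedInput: boolean;
    privateData: boolean;
    exfiltrationCapacity: boolean;
  };
}

const gmailManifest: ToolManifest = {
  name: "github.com/include-tools/google-workspace/gmail",
  tools: ["read", "send", "search", "label"],
  allowedHosts: ["gmail.googleapis.com", "oauth2.googleapis.com"],
  flags: { untrustedInput: true, privateData: true, exfiltrationCapacity: true },
};
```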

Packages are distributed as Git repositories under @include-tools/* and installable via toolbox install github.com/include-tools/<name>. Lockfile-pinned, reproducible, made to be checked into git.

In use today (many more coming) · github.com/include-tools/google-workspace
gmail
Read, send, search, label across multiple accounts
google calendar
Events, free slots, attendees, multi-calendar
drive
Search, read, write across multiple accounts
// section 4 — toolsets

toolset.json — like package.json, but for agents. (and better)

Every project has a package.json that declares its dependencies, pins versions, and makes the build reproducible. Your agent needs the same.

A toolset.json pins which packages your agent can use and at which exact version, and lists the specific tools it exposes — with a companion lockfile that verifies every install. Check it into your repo. Share it with your team. Reproduce environments. Scope per project. Updates only happen when you run the package manager — never silently behind your back.

The toolset.json also configures how your tools work.

Want to limit a tool to posting in just two Slack channels? Easy. Ask for approval when sending external emails, but allow free rein internally? Easy.

// package.json
{
  "name": "my-app",
  "dependencies": {
    "react":  "^18.2.0",
    "lodash": "^4.17.0",
    "zod":    "^3.22.0"
  }
}
// toolset.json
{
  "packages": {
    "github.com/include-tools/google-workspace": "1.2.0"
  },
  "tools": [
    { "tool": "github.com/include-tools/google-workspace/gmail" },
    { "tool": "github.com/include-tools/google-workspace/gcal" }
  ]
}
Left: your project's dependencies. Right: your agent's.

The bytes you installed are the bytes that run — every time. Each tool package ships with a manifest that lists every tool's SHA-256 hash; Toolbox re-hashes the cached bytes on every tool load — not just at install — and refuses to run on mismatch. (Cryptographic signing is planned. Today's guarantee is reproducibility of bytes, not identity of authors.)
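The load-time check described above can be sketched with Node's standard crypto module. The lockfile shape here is an assumption for illustration, not the real format; the mechanism is the one the text describes: re-hash the cached bytes and refuse to run on mismatch.

```typescript
import { createHash } from "node:crypto";

// Sketch of load-time verification. The lockfile shape is assumed, not the
// real format: tool path -> expected sha256 digest (hex), pinned at install.
type Lockfile = Record<string, string>;

function sha256(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Re-hash the cached bytes on every load, not just at install time.
function verifyBeforeLoad(tool: string, cachedBytes: Uint8Array, lock: Lockfile): void {
  const expected = lock[tool];
  if (!expected) throw new Error(`${tool} not in lockfile`);
  if (sha256(cachedBytes) !== expected) {
    // Tampered or corrupted cache: refuse to run.
    throw new Error(`refusing to load ${tool}: hash mismatch`);
  }
}
```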

// section 5 — why "better"

Trust, layer by layer.

Your agent touches real data — real email, real calendar, real accounts. We designed the trust layer before the catalog, so every tool you install arrives with the same boundaries in place.

None of these are magic. Each one does one specific thing, and together they mean a compromised prompt can't quietly drain your inbox.

@

Multi-account credentials

Authorize multiple accounts of the same service with toolbox auth. Your agent addresses them by name in code — gmail.send({ google_account: "work", … }) — so work and personal never get mixed up at the call site.

No tool tampering

Every tool package ships with a manifest of SHA-256 hashes. Those hashes land in your *.toolset.lock on install, and Toolbox re-hashes the cached bytes on every tool load — not just install, every invocation — refusing to run on mismatch. The code you reviewed once is the code that runs every time, byte-for-byte.

🔒

Locked by default

Your credentials live in an age-encrypted store the Toolbox daemon opens once per daemon start. Unlock with a password or a single OS-level gesture; lock it again when you step away. While locked, no tool can reach your accounts — not your own, and not a compromised one. While unlocked, the daemon hands credentials to the transport layer per-tool — your agent and the notebook never see the raw secret.

Isolated runtime

Tools run in an isolated JavaScript runtime with no filesystem, process, or OS access by default. Credentials are attached at the network layer from an encrypted store outside the runtime — your tool code (and your agent) never sees the raw secret.

×

Deny-by-default network

Every tool declares the hosts it's allowed to reach. Calls outside that allowlist are blocked at the runtime layer. This is the boundary that makes isolation meaningful — a rogue fetch to an attacker-controlled domain simply doesn't go through.
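The boundary check itself is small. This is a hypothetical sketch of the idea, not Toolbox internals: parse the outbound URL and require its hostname to appear in the hosts the tool declared.

```typescript
// Hypothetical sketch of a deny-by-default host check, not Toolbox internals.
function hostAllowed(url: string, allowedHosts: string[]): boolean {
  const { hostname } = new URL(url);
  return allowedHosts.includes(hostname);
}

// Called before every outbound request; anything not declared is blocked.
function checkOutbound(url: string, allowedHosts: string[]): void {
  if (!hostAllowed(url, allowedHosts)) {
    throw new Error(`blocked: ${new URL(url).hostname} is not in the tool's allowlist`);
  }
}
```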

!

Lethal trifecta visibility

Every tool manifest declares whether it exposes your agent to untrusted input, private data, or exfiltration capacity. You see these flags before you install — so you know exactly what kind of trust you're extending, and to which tool.

// section 6 — compatibility

No framework switch required.

include.tools plugs into the agent runtime you already use. One install command — no framework rewrite, no config file archaeology.

claude-code
Claude Desktop
Cursor
Windsurf
Pi
OpenClaw

Works with OpenClaw via a Pi extension; direct integration is coming. Works with any harness that can make MCP tool calls.

// section 7 — how a call actually runs

Runs on your machine. With your API key. We never see your data.

Toolbox is a local server that runs on your computer. Your API keys, OAuth credentials, logs, and sessions are all stored locally. We don't host execution. We don't proxy your tool calls remotely. There is no middleman.

Here's the path every agent super-tool call takes.

When your agent harness launches, a toolbox session process starts (MCP / Pi-extension over stdio). The toolbox session provides the super-tool to your agent harness. When your agent makes a tool call, a persistent sandboxed TypeScript notebook executes it as a cell. Any tool calls invoked in that cell run in a separate sandbox. This ensures agents and tools can only communicate via serialized JS arguments, and lets us adjust the network and filesystem surface for each sandbox separately.

A separate toolbox server process holds your encrypted secrets, unlocking on start with a passphrase and surviving toolbox session process restarts. It attaches credentials at the transport layer, so neither the notebook nor any tool sandbox ever sees a credential. Every outbound tool request is checked against a per-tool deny-by-default host allowlist declared in the tool's manifest.

How the pieces connect:

agent harness (your AI agent)
  │  tool.call() · MCP / Pi · stdio
  ▼
toolbox session process
  └─ super-tool · notebook (persistent, sandboxed TypeScript REPL)
      └─ per-tool sandbox (one per tool call · net/FS surface tunable per tool)
          └─ transport + per-tool allowlist (deny-by-default · declared hosts only)
              └─ external service (Gmail · Calendar · …) → response

toolbox server process
  └─ encrypted secrets (unlocked on start with a passphrase · survives session restarts)
      └─ per-tool credentials, attached at the transport layer

// waitlist

Get notified when the preview opens.

We're letting developers in as we're ready. Leave your email — we'll reach out when the public launch is live.