dpo
Set up and run Direct Preference Optimization (DPO) training on preference datasets using the Tinker API. Use when the user wants to train with preference data, chosen/rejected pairs, or DPO.
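To illustrate what this skill trains, here is a minimal pure-Python sketch of the DPO objective for a single chosen/rejected pair. This is illustrative math only, not the Tinker API; the function name and signature are hypothetical.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written with log1p for numerical stability
    return math.log1p(math.exp(-logits))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does.
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)
prefers_chosen = dpo_loss(-8.0, -12.0, -10.0, -10.0)
assert prefers_chosen < no_preference
```

With identical policy and reference log-probs the loss is log 2 (a coin flip); training pushes it down by widening the chosen-vs-rejected margin relative to the reference.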
Completers wrap SamplingClient for convenient text generation. Two levels of abstraction:
- **TokenCompleter**: low-level, returns tokens + logprobs
- **MessageCompleter**: high-level, returns parsed Message objects
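The two-level split can be sketched as follows. All class and method names here are hypothetical stand-ins, not the real cookbook API; the point is only that the high-level completer wraps the low-level one and adds parsing.

```python
from dataclasses import dataclass

@dataclass
class TokensWithLogprobs:
    tokens: list[int]
    logprobs: list[float]

@dataclass
class Message:
    role: str
    content: str

class FakeTokenCompleter:
    """Low level: prompt tokens in, sampled tokens + logprobs out."""
    def complete(self, prompt_tokens):
        # A real completer would call the sampling backend here.
        return TokensWithLogprobs(tokens=[11, 22], logprobs=[-0.1, -0.2])

class FakeMessageCompleter:
    """High level: wraps a token completer, returns parsed Messages."""
    def __init__(self, token_completer, detokenize):
        self.token_completer = token_completer
        self.detokenize = detokenize

    def complete(self, messages):
        out = self.token_completer.complete(prompt_tokens=[1, 2, 3])
        return Message(role="assistant", content=self.detokenize(out.tokens))

mc = FakeMessageCompleter(FakeTokenCompleter(), detokenize=lambda toks: "hi")
reply = mc.complete([Message("user", "hello")])
assert reply.role == "assistant" and reply.content == "hi"
```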
Guide for developing and contributing to tinker-cookbook.
The cookbook uses the builder pattern for datasets: a `*DatasetBuilder` (config) builds a `*Dataset` (runtime).
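A minimal sketch of that config-vs-runtime split, with hypothetical class names (the real builders load and tokenize actual data):

```python
from dataclasses import dataclass

class ChatDataset:  # runtime object: holds loaded examples
    def __init__(self, examples, max_length):
        self.examples = examples
        self.max_length = max_length

    def __len__(self):
        return len(self.examples)

@dataclass
class ChatDatasetBuilder:  # config object: cheap to create, easy to log
    path: str
    max_length: int = 2048

    def build(self) -> ChatDataset:
        # A real builder would load and tokenize data here.
        examples = [{"text": "hello", "source": self.path}]
        return ChatDataset(examples, self.max_length)

ds = ChatDatasetBuilder(path="data.jsonl").build()
assert len(ds) == 1 and ds.max_length == 2048
```

The payoff of the pattern: the builder is a small serializable config you can pass around and record, while the expensive loading happens once in `build()`.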
Set up and run knowledge distillation (on-policy, off-policy, or multi-teacher) from a teacher model to a student model using the Tinker API. Use when the user wants to distill knowledge, compress models, or train a student from a teacher.
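The core of distillation is a per-token loss that pulls the student's distribution toward the teacher's. Here is a self-contained sketch of one common variant (cross-entropy against a temperature-softened teacher); the function names are hypothetical and this is not the Tinker API.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Per-token distillation loss: cross-entropy of the student
    against the temperature-softened teacher distribution."""
    t = [x / temperature for x in teacher_logits]
    s = [x / temperature for x in student_logits]
    p_teacher = softmax(t)
    # Stable log-softmax for the student via logsumexp
    m = max(s)
    log_z = math.log(sum(math.exp(x - m) for x in s)) + m
    log_p_student = [x - log_z for x in s]
    return -sum(p * lp for p, lp in zip(p_teacher, log_p_student))

# The loss is minimized when the student matches the teacher exactly.
matched = distill_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
shifted = distill_loss([0.0, 2.0, -1.0], [2.0, 0.0, -1.0])
assert matched < shifted
```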
Set up and run reinforcement learning with verifiable rewards (RLVR/GRPO) for math, code, or custom environments using the Tinker API. Use when the user wants to do RL training, GRPO, reward-based optimization, or train with verifiable rewards.
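The GRPO part of this can be summarized in a few lines: sample several rollouts per prompt, grade each with the verifiable reward, and normalize rewards within the group to get advantages. A minimal sketch (hypothetical helper, not the Tinker API):

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each rollout's reward
    by the mean and std of its own prompt group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one math problem, graded 1.0 if the answer verified.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(sum(advs)) < 1e-9      # advantages are centered within the group
assert advs[0] > 0 > advs[1]      # correct rollouts get positive advantage
```

Because the baseline comes from the group itself, no learned value function is needed; that is the main appeal of GRPO for verifiable-reward tasks.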
Guide for training outputs, metrics logging, logtree reports, tracing/profiling, and debugging training runs. Use when the user asks about training logs, metrics, debugging, tracing, profiling, timing, Gantt charts, or understanding training output files.
Create, update, or organize Claude Code skills in this repo. Use when adding a new skill, reviewing existing skills for consistency, or maintaining the skill taxonomy.
Help the user choose the right model for their task.
Set up and run multi-turn RL training for interactive environments (terminal tasks, tool use, search/RAG, games) using the Tinker API. Use when the user wants multi-turn RL, agentic training, tool-use RL, or interactive environment training.
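The shape of multi-turn RL is an alternating policy/environment loop that accumulates per-turn rewards for the update. A toy sketch with a made-up environment (all names hypothetical, not the Tinker API):

```python
class CountdownEnv:
    """Toy environment: the agent must say 'done' within three turns."""
    def __init__(self):
        self.turns_left = 3

    def step(self, action):
        self.turns_left -= 1
        if action == "done":
            return "solved", 1.0, True        # observation, reward, done
        if self.turns_left == 0:
            return "out of turns", 0.0, True
        return f"{self.turns_left} turns left", 0.0, False

def rollout(env, policy):
    transcript, total_reward, obs, done = [], 0.0, "start", False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        transcript.append((action, reward))
        total_reward += reward
    return transcript, total_reward

# A scripted policy that stalls once, then finishes.
actions = iter(["think", "done"])
transcript, total = rollout(CountdownEnv(), lambda obs: next(actions))
assert total == 1.0 and len(transcript) == 2
```

In real training the transcript (tokens, per-turn rewards) is what gets fed back into the RL loss; tool use and search/RAG fit the same loop with richer observations and actions.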
Renderers convert chat-style messages into token sequences for training and generation.
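A sketch of what a renderer produces: one token sequence plus a weight mask that marks which tokens are trained on (assistant turns) versus context only. The turn format and function name here are hypothetical, not the cookbook's actual templates.

```python
def render_for_training(messages, tokenize):
    tokens, weights = [], []
    for msg in messages:
        turn = f"<|{msg['role']}|>{msg['content']}<|end|>"
        turn_tokens = tokenize(turn)
        tokens.extend(turn_tokens)
        # Only assistant tokens contribute to the training loss.
        is_assistant = msg["role"] == "assistant"
        weights.extend([1.0 if is_assistant else 0.0] * len(turn_tokens))
    return tokens, weights

# A byte-level "tokenizer" keeps the example self-contained.
toks, w = render_for_training(
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "yo"}],
    tokenize=lambda s: list(s.encode()),
)
assert len(toks) == len(w)
assert set(w) == {0.0, 1.0}   # user tokens masked, assistant tokens trained
```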
Set up and run the full RLHF pipeline (SFT, reward model training, RL from reward model) using the Tinker API. Use when the user wants to do RLHF, train a reward model, or run the full preference-based RL pipeline.
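The reward-model stage of this pipeline typically trains on the pairwise Bradley-Terry loss. A minimal sketch (illustrative math only, hypothetical function name):

```python
import math

def reward_model_loss(chosen_score, rejected_score):
    """Bradley-Terry pairwise loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)), written stably."""
    return math.log1p(math.exp(-(chosen_score - rejected_score)))

# The loss drops as the model scores the chosen response higher.
assert reward_model_loss(2.0, 0.0) < reward_model_loss(0.0, 0.0)
```

The trained scalar reward then becomes the objective for the RL stage.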
Set up and run supervised fine-tuning (SFT) on instruction or chat datasets using the Tinker API. Use when the user wants to do instruction tuning, chat fine-tuning, or supervised learning.
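At its core, SFT on chat data is a weighted negative log-likelihood where prompt tokens are masked out and only completion tokens are trained. A minimal sketch (hypothetical helper, not the Tinker API):

```python
def sft_loss(token_logprobs, weights):
    """Weighted NLL for one example: prompt tokens get weight 0,
    completion tokens weight 1, so only the response is trained."""
    assert len(token_logprobs) == len(weights)
    total_weight = sum(weights) or 1.0
    return -sum(lp * w for lp, w in zip(token_logprobs, weights)) / total_weight

# Two prompt tokens (masked) and two completion tokens.
loss = sft_loss([-0.5, -0.7, -0.1, -0.3], [0.0, 0.0, 1.0, 1.0])
assert abs(loss - 0.2) < 1e-9
```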
The repo has two layers of testing and two CI workflows.
The `tinker` CLI is installed with the Tinker Python SDK. It provides commands for managing training runs and checkpoints from the terminal.
Quick reference for the core types used throughout the Tinker SDK and cookbook.
The `tinker_cookbook.weights` subpackage provides a standard pipeline for trained weight management: **download → build → publish**.
Auto-indexed from thinking-machines-lab/tinker-cookbook
Related Skills
graceful-error-recovery
Use this skill when a tool call, command, or API request fails. Diagnose the root cause systematically before retrying or changing approach. Do not retry the same failing call without first understanding why it failed.
audience-aware-communication
Use this skill when writing any explanation, documentation, or response that will be read by someone else. Match vocabulary, depth, and format to the audience's expertise level before writing.
Refactoring Expert
Expert in systematic code refactoring, code smell detection, and structural optimization. Use PROACTIVELY when encountering duplicated code, long methods, complex conditionals, or any code quality issues. Detects code smells and applies proven refactoring techniques without changing external behavior.
Research Expert
Specialized research expert for parallel information gathering. Use for focused research tasks with clear objectives and structured output requirements.
clarify-ambiguous-requests
Use this skill when the user's request is ambiguous, under-specified, or could be interpreted in multiple ways. If proceeding with a wrong assumption would waste significant work, always ask exactly one focused clarifying question before doing anything.
structured-step-by-step-reasoning
Use this skill for any problem that involves multiple steps, tradeoffs, or non-trivial logic. Think out loud before answering to improve accuracy and transparency. Apply whenever the answer is not immediately obvious.