dpo
Set up and run Direct Preference Optimization (DPO) training on preference datasets using the Tinker API. Use when the user wants to train with preference data, chosen/rejected pairs, or DPO.
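To illustrate what this skill trains, here is a minimal pure-Python sketch of the DPO objective for a single chosen/rejected pair. This is illustrative math only, not the Tinker API; the function name and signature are hypothetical.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written with log1p for numerical stability
    return math.log1p(math.exp(-logits))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does.
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)
prefers_chosen = dpo_loss(-8.0, -12.0, -10.0, -10.0)
assert prefers_chosen < no_preference
```

With identical policy and reference log-probs the loss is log 2 (a coin flip); training pushes it down by widening the chosen-vs-rejected margin relative to the reference.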
Completers wrap SamplingClient for convenient text generation. Two levels of abstraction:
- **TokenCompleter**: low-level, returns tokens + logprobs
- **MessageCompleter**: high-level, returns parsed Message objects
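The two-level split can be sketched as follows. All class and method names here are hypothetical stand-ins, not the real cookbook API; the point is only that the high-level completer wraps the low-level one and adds parsing.

```python
from dataclasses import dataclass

@dataclass
class TokensWithLogprobs:
    tokens: list[int]
    logprobs: list[float]

@dataclass
class Message:
    role: str
    content: str

class FakeTokenCompleter:
    """Low level: prompt tokens in, sampled tokens + logprobs out."""
    def complete(self, prompt_tokens):
        # A real completer would call the sampling backend here.
        return TokensWithLogprobs(tokens=[11, 22], logprobs=[-0.1, -0.2])

class FakeMessageCompleter:
    """High level: wraps a token completer, returns parsed Messages."""
    def __init__(self, token_completer, detokenize):
        self.token_completer = token_completer
        self.detokenize = detokenize

    def complete(self, messages):
        out = self.token_completer.complete(prompt_tokens=[1, 2, 3])
        return Message(role="assistant", content=self.detokenize(out.tokens))

mc = FakeMessageCompleter(FakeTokenCompleter(), detokenize=lambda toks: "hi")
reply = mc.complete([Message("user", "hello")])
assert reply.role == "assistant" and reply.content == "hi"
```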
Guide for developing and contributing to tinker-cookbook.
The cookbook uses the builder pattern for datasets: a `*DatasetBuilder` (config) builds a `*Dataset` (runtime).
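A minimal sketch of that config-vs-runtime split, with hypothetical class names (the real builders load and tokenize actual data):

```python
from dataclasses import dataclass

class ChatDataset:  # runtime object: holds loaded examples
    def __init__(self, examples, max_length):
        self.examples = examples
        self.max_length = max_length

    def __len__(self):
        return len(self.examples)

@dataclass
class ChatDatasetBuilder:  # config object: cheap to create, easy to log
    path: str
    max_length: int = 2048

    def build(self) -> ChatDataset:
        # A real builder would load and tokenize data here.
        examples = [{"text": "hello", "source": self.path}]
        return ChatDataset(examples, self.max_length)

ds = ChatDatasetBuilder(path="data.jsonl").build()
assert len(ds) == 1 and ds.max_length == 2048
```

The payoff of the pattern: the builder is a small serializable config you can pass around and record, while the expensive loading happens once in `build()`.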
Set up and run knowledge distillation (on-policy, off-policy, or multi-teacher) from a teacher model to a student model using the Tinker API. Use when the user wants to distill knowledge, compress models, or train a student from a teacher.
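The core of distillation is a per-token loss that pulls the student's distribution toward the teacher's. Here is a self-contained sketch of one common variant (cross-entropy against a temperature-softened teacher); the function names are hypothetical and this is not the Tinker API.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Per-token distillation loss: cross-entropy of the student
    against the temperature-softened teacher distribution."""
    t = [x / temperature for x in teacher_logits]
    s = [x / temperature for x in student_logits]
    p_teacher = softmax(t)
    # Stable log-softmax for the student via logsumexp
    m = max(s)
    log_z = math.log(sum(math.exp(x - m) for x in s)) + m
    log_p_student = [x - log_z for x in s]
    return -sum(p * lp for p, lp in zip(p_teacher, log_p_student))

# The loss is minimized when the student matches the teacher exactly.
matched = distill_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
shifted = distill_loss([0.0, 2.0, -1.0], [2.0, 0.0, -1.0])
assert matched < shifted
```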
Set up and run reinforcement learning with verifiable rewards (RLVR/GRPO) for math, code, or custom environments using the Tinker API. Use when the user wants to do RL training, GRPO, reward-based optimization, or train with verifiable rewards.
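The GRPO part of this can be summarized in a few lines: sample several rollouts per prompt, grade each with the verifiable reward, and normalize rewards within the group to get advantages. A minimal sketch (hypothetical helper, not the Tinker API):

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each rollout's reward
    by the mean and std of its own prompt group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one math problem, graded 1.0 if the answer verified.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(sum(advs)) < 1e-9      # advantages are centered within the group
assert advs[0] > 0 > advs[1]      # correct rollouts get positive advantage
```

Because the baseline comes from the group itself, no learned value function is needed; that is the main appeal of GRPO for verifiable-reward tasks.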
Guide for training outputs, metrics logging, logtree reports, tracing/profiling, and debugging training runs. Use when the user asks about training logs, metrics, debugging, tracing, profiling, timing, Gantt charts, or understanding training output files.
Create, update, or organize Claude Code skills in this repo. Use when adding a new skill, reviewing existing skills for consistency, or maintaining the skill taxonomy.
Help the user choose the right model for their task.
Set up and run multi-turn RL training for interactive environments (terminal tasks, tool use, search/RAG, games) using the Tinker API. Use when the user wants multi-turn RL, agentic training, tool-use RL, or interactive environment training.
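The shape of multi-turn RL is an alternating policy/environment loop that accumulates per-turn rewards for the update. A toy sketch with a made-up environment (all names hypothetical, not the Tinker API):

```python
class CountdownEnv:
    """Toy environment: the agent must say 'done' within three turns."""
    def __init__(self):
        self.turns_left = 3

    def step(self, action):
        self.turns_left -= 1
        if action == "done":
            return "solved", 1.0, True        # observation, reward, done
        if self.turns_left == 0:
            return "out of turns", 0.0, True
        return f"{self.turns_left} turns left", 0.0, False

def rollout(env, policy):
    transcript, total_reward, obs, done = [], 0.0, "start", False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        transcript.append((action, reward))
        total_reward += reward
    return transcript, total_reward

# A scripted policy that stalls once, then finishes.
actions = iter(["think", "done"])
transcript, total = rollout(CountdownEnv(), lambda obs: next(actions))
assert total == 1.0 and len(transcript) == 2
```

In real training the transcript (tokens, per-turn rewards) is what gets fed back into the RL loss; tool use and search/RAG fit the same loop with richer observations and actions.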
Renderers convert chat-style messages into token sequences for training and generation.
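A sketch of what a renderer produces: one token sequence plus a weight mask that marks which tokens are trained on (assistant turns) versus context only. The turn format and function name here are hypothetical, not the cookbook's actual templates.

```python
def render_for_training(messages, tokenize):
    tokens, weights = [], []
    for msg in messages:
        turn = f"<|{msg['role']}|>{msg['content']}<|end|>"
        turn_tokens = tokenize(turn)
        tokens.extend(turn_tokens)
        # Only assistant tokens contribute to the training loss.
        is_assistant = msg["role"] == "assistant"
        weights.extend([1.0 if is_assistant else 0.0] * len(turn_tokens))
    return tokens, weights

# A byte-level "tokenizer" keeps the example self-contained.
toks, w = render_for_training(
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "yo"}],
    tokenize=lambda s: list(s.encode()),
)
assert len(toks) == len(w)
assert set(w) == {0.0, 1.0}   # user tokens masked, assistant tokens trained
```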
Set up and run the full RLHF pipeline (SFT, reward model training, RL from reward model) using the Tinker API. Use when the user wants to do RLHF, train a reward model, or run the full preference-based RL pipeline.
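The reward-model stage of this pipeline typically trains on the pairwise Bradley-Terry loss. A minimal sketch (illustrative math only, hypothetical function name):

```python
import math

def reward_model_loss(chosen_score, rejected_score):
    """Bradley-Terry pairwise loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)), written stably."""
    return math.log1p(math.exp(-(chosen_score - rejected_score)))

# The loss drops as the model scores the chosen response higher.
assert reward_model_loss(2.0, 0.0) < reward_model_loss(0.0, 0.0)
```

The trained scalar reward then becomes the objective for the RL stage.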
Set up and run supervised fine-tuning (SFT) on instruction or chat datasets using the Tinker API. Use when the user wants to do instruction tuning, chat fine-tuning, or supervised learning.
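At its core, SFT on chat data is a weighted negative log-likelihood where prompt tokens are masked out and only completion tokens are trained. A minimal sketch (hypothetical helper, not the Tinker API):

```python
def sft_loss(token_logprobs, weights):
    """Weighted NLL for one example: prompt tokens get weight 0,
    completion tokens weight 1, so only the response is trained."""
    assert len(token_logprobs) == len(weights)
    total_weight = sum(weights) or 1.0
    return -sum(lp * w for lp, w in zip(token_logprobs, weights)) / total_weight

# Two prompt tokens (masked) and two completion tokens.
loss = sft_loss([-0.5, -0.7, -0.1, -0.3], [0.0, 0.0, 1.0, 1.0])
assert abs(loss - 0.2) < 1e-9
```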
The repo has two layers of testing and two CI workflows.
The `tinker` CLI is installed with the Tinker Python SDK. It provides commands for managing training runs and checkpoints from the terminal.
Quick reference for the core types used throughout the Tinker SDK and cookbook.
The `tinker_cookbook.weights` subpackage provides a standard pipeline for trained weight management: **download → build → publish**.
Auto-indexed from thinking-machines-lab/tinker-cookbook
Related Skills
graceful-error-recovery
Use this skill when a tool call, command, or API request fails. Diagnose the root cause systematically before retrying or changing approach. Do not retry the same failing call without first understanding why it failed.
audience-aware-communication
Use this skill when writing any explanation, documentation, or response that will be read by someone else. Match vocabulary, depth, and format to the audience's expertise level before writing.
Refactoring Expert
Expert in systematic code refactoring, code smell detection, and structural optimization. Use PROACTIVELY when encountering duplicated code, long methods, complex conditionals, or any code quality issues. Detects code smells and applies proven refactoring techniques without changing external behavior.
Research Expert
Specialized research expert for parallel information gathering. Use for focused research tasks with clear objectives and structured output requirements.
clarify-ambiguous-requests
Use this skill when the user's request is ambiguous, under-specified, or could be interpreted in multiple ways. If proceeding with a wrong assumption would waste significant work, always ask exactly one focused clarifying question before doing anything.
structured-step-by-step-reasoning
Use this skill for any problem that involves multiple steps, tradeoffs, or non-trivial logic. Think out loud before answering to improve accuracy and transparency. Apply whenever the answer is not immediately obvious.