Most LinkedIn outreach lives in the dark. People rewrite messages on vibes and never measure whether the new one beat the old one. A 30-day A/B framework changes that. The math is simple; the discipline is the hard part.
The four rules
- One variable at a time. If you change the hook and the CTA, you cannot tell which one moved the number.
- Minimum 100 messages per variant. Below that, the noise wins.
- Two-week cycles. Long enough to hit volume, short enough to compound.
- Kill criteria up front. Decide before the test what reply-rate gap kills the loser. Otherwise hope wins.
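A minimal sketch of how the four rules could be written down before a test starts, so the kill decision is mechanical rather than hopeful. The names `SplitPlan` and `decide` and the 10-point threshold in the example are illustrative assumptions, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPlan:
    """Hypothetical record of a test plan, fixed before sending starts."""
    variable: str              # the single variable under test
    kill_gap_points: float     # reply-rate gap (percentage points) that kills the loser
    min_per_variant: int = 100 # minimum messages per variant
    cycle_days: int = 14       # two-week cycle


def decide(plan, sent_a, replies_a, sent_b, replies_b):
    """Return 'keep_running', 'kill_a', 'kill_b', or 'no_winner'."""
    # Rule 2: do not judge anything below the minimum volume.
    if min(sent_a, sent_b) < plan.min_per_variant:
        return "keep_running"
    rate_a = 100 * replies_a / sent_a
    rate_b = 100 * replies_b / sent_b
    gap = rate_a - rate_b
    # Rule 4: apply the threshold that was chosen up front.
    if gap >= plan.kill_gap_points:
        return "kill_b"
    if -gap >= plan.kill_gap_points:
        return "kill_a"
    return "no_winner"


# Example with made-up counts: the threshold is an assumption you choose yourself.
plan = SplitPlan(variable="first sentence", kill_gap_points=10.0)
verdict = decide(plan, sent_a=100, replies_a=18, sent_b=100, replies_b=6)
```

Freezing the plan object is a small design choice: once the test is running, nobody can quietly move the threshold.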
What is worth testing
| Variable | Typical lift | Test priority |
|---|---|---|
| First sentence (preview line) | +30 to +50% | #1 |
| CTA wording (question vs link) | +15 to +25% | #2 |
| Connection-request note vs no note | +10 to +20% | #3 |
| Personalization depth | +10 to +15% | #4 |
| Day-of-week sending | +5 to +10% | #5 |
| Length (under vs over 600 chars) | +5 to +10% | #5 |
Start at the top of the table. The first sentence is the only thing shown in the LinkedIn preview, so it determines open rate. Open rate is the bottleneck. Everything else compounds on top of that.
Three test cards to run this month
Test 1 (Week 1-2): First sentence - Specific vs Generic
Variant A: "Saw your post on the cost of switching CRMs."
Variant B: "Hope you are doing well."
Sample: 100 per variant. Metric: reply rate within 7 days. Kill criteria: none needed, because variant B is expected to lose every time. The point is to measure the size of the gap so you stop writing "Hope you are doing well."
Test 2 (Week 3-4): CTA - Demo vs Question
Variant A: "Would a 15-min walkthrough help?"
Variant B: "How are you handling [specific pain] right now?"
Sample: 100 per variant. Metric: reply rate. The question CTA usually wins because answering it costs the prospect almost nothing. The demo CTA wins when the relationship is already warm.
Test 3 (Month 2 onward): Personalization depth
Variant A: First line references their most recent LinkedIn post.
Variant B: First line references their company's most recent funding round or product launch.
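Sample: 100 per variant. Metric: reply rate, plus research minutes per message. A personal-post reference often beats company news, but it costs more research time per message. The test tells you whether the extra replies justify that time at your volume.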
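The cost-benefit question can be reduced to research minutes spent per reply earned. The sketch below assumes you track research time yourself; every number in the example is made up for illustration, and `cost_per_reply` is a hypothetical helper.

```python
def cost_per_reply(messages, reply_rate, research_minutes_per_message):
    """Research minutes spent for each reply earned."""
    replies = messages * reply_rate
    if replies == 0:
        return float("inf")  # no replies: the cost per reply is unbounded
    return messages * research_minutes_per_message / replies


# Hypothetical inputs: 100 messages per variant.
personal_post = cost_per_reply(100, reply_rate=0.18, research_minutes_per_message=6)
company_news = cost_per_reply(100, reply_rate=0.14, research_minutes_per_message=2)

# The variant with the higher reply rate is not automatically the cheaper one.
cheaper = "personal post" if personal_post < company_news else "company news"
```

With these invented inputs the personal-post variant wins on reply rate but loses on minutes per reply, which is exactly the trade-off the test is meant to expose.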
How to measure cleanly
Tag each variant in your campaign tool. If you use Leadsforlinked Outreach Diamond, each campaign step has a split-test toggle and the analytics surface reply rate per variant. If you use a separate tool, export both legs to CSV and compute reply rate as replies / messages_sent. Keep it simple.
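For the CSV route, a minimal sketch using only the standard library. It assumes one combined export with a `variant` column and a `replied` column; both column names are assumptions, so rename them to match whatever your tool actually exports.

```python
import csv
import io
from collections import Counter


def reply_rates(csv_file):
    """Compute replies / messages_sent for each variant in an export."""
    sent = Counter()
    replies = Counter()
    for row in csv.DictReader(csv_file):
        variant = row["variant"].strip()
        sent[variant] += 1  # every row is one message sent
        if row["replied"].strip().lower() in {"1", "true", "yes"}:
            replies[variant] += 1
    return {variant: replies[variant] / sent[variant] for variant in sent}


# Tiny in-memory stand-in for a real export file.
sample = io.StringIO("variant,replied\nA,1\nA,0\nB,0\nB,0\nA,1\nB,1\n")
rates = reply_rates(sample)
```

For a real file, pass `open("export.csv", newline="")` instead of the in-memory sample; the file name here is a placeholder.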
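Sample size matters more than elaborate significance testing. At 100 per variant and reply rates in the 10 to 20% range, a gap of about 10 percentage points is the smallest you can trust. A 5-point gap is a lean, not a verdict: extend or rerun the test before acting on it. A 1-point gap is noise.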
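If you want a quick check on whether a gap at this sample size is more than noise, a two-proportion z-test is one common option. The sketch below is a standard pooled z-test written with the Python standard library; the function name and the reply counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(replies_a, sent_a, replies_b, sent_b):
    """Two-sided p-value for the difference between two reply rates."""
    rate_a = replies_a / sent_a
    rate_b = replies_b / sent_b
    # Pooled reply rate under the assumption that both variants perform the same.
    pooled = (replies_a + replies_b) / (sent_a + sent_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        return 1.0  # no replies at all (or all replied): nothing to distinguish
    z = (rate_a - rate_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts at 100 messages per variant.
p_five_point_gap = two_proportion_p_value(15, 100, 10, 100)  # 15% vs 10%
p_ten_point_gap = two_proportion_p_value(20, 100, 10, 100)   # 20% vs 10%
```

With these invented counts, the 5-point gap gives a p-value of roughly 0.29 and the 10-point gap roughly 0.05, so at 100 per variant only the larger gap clears the usual 0.05 bar.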
What not to test
Tiny copy edits ("Hi" vs "Hey"). Color of the connection-request button. Adding emojis. These have small effects and cost cycles. Spend test budget where the lift is double-digit.
Sources & further reading
- HBR: The Surprising Power of Online Experiments. The canonical case for A/B discipline.
- LinkedIn: The buying experience report. Context on what buyers actually respond to.
- Connection request copy that converts. Internal companion piece on message-level craft.
Frequently asked questions
How many messages do I need per variant?
Minimum 100. For high-volume teams (1,000+ messages per month), aim for 250.
What is the single highest-leverage variable?
The first sentence. It is the only thing shown in the LinkedIn preview. Opens move 30-50% on a strong first line.
How long should each test run?
Two weeks. Long enough to hit sample size, short enough to compound.