Most Shopify agency selection processes are driven by vibes and a sales call. Here is the framework we would use as buyers, what to actually ask, what the answers should sound like, and the specific red flags that predict a project going sideways.

Buying agency services is harder than buying most products. The deliverable is partly intangible. The quality is hard to assess until the work is in production. The decision often has to be made before the project's hardest problems are even visible. And the cost of getting it wrong is not just the money; it is the months spent on a project that has to be redone, the rankings lost during a botched migration, the integration that almost works but cannot be debugged without the original developer.
Most agency selection processes are driven by vibes and a sales call. The agency is responsive on email, the salesperson is articulate, the case studies look polished, the proposal arrives on time. None of that predicts whether the work will be good. The agencies that produce excellent work and the agencies that produce expensive disappointments are both capable of running a clean sales process.
This post is the framework we would use if we were on the buyer side. It covers what to actually ask, what good and bad answers sound like, the specific red flags that predict a difficult project, and the structural questions about the agency that matter more than any one case study. It is written as honestly as we can write it, which means it includes the things we know clients should ask us, not just what we want them to ask.
It is not a "pick Sentinu" pitch. It is the framework that, applied properly, should help you pick the right agency for your project, which is the most useful thing we can publish on this topic.
The single biggest source of agency-fit mistakes is mismatching project type to agency type. There are three rough categories, referred to below as categories 1, 2, and 3.
Template-driven brand build. A new brand or a small store, standard Shopify, paid theme with customization, off-the-shelf apps, fast turnaround, predictable scope. Many capable agencies do this well. The skill is in design taste, execution speed, and getting the basics right.
Mid-market engineering work. Established brand, custom theme or significant theme customization, integrations to ERP/PIM/CRM, a serious migration, custom development on top of standard features, B2B portal work. The skill is engineering judgment, integration depth, and the ability to scope honestly. Far fewer agencies do this well.
Enterprise transformation. Multi-storefront architecture, headless front-ends, complex multi-region operations, deep custom systems, multi-quarter projects. The skill is project management at scale, architectural rigor, and the ability to operate as a long-term technical partner rather than a project vendor.
These are different jobs. An agency that excels at category 1 can deliver a category 2 project, but the project will have to go through their senior engineer rather than their typical workflow, and the math may not work. An agency that excels at category 3 will be expensive and over-engineered for a category 1 project. The first thing to assess, before evaluating any agency, is which category your project actually fits.
The honest read on Sentinu, since we are writing this: we sit in category 2, with some category 3 projects when the technical depth matches. We are not the right answer for category 1 standard brand builds, which are well-served by agencies built for that workflow.
Before any portfolio review or project conversation, four structural questions tell you most of what you need to know.
The first question, and the single most predictive one, is who will actually do the work. The wrong answer: "Our senior team scopes the project, then it goes to our delivery team." The right answer names specific senior engineers and explains how they are involved throughout the project, not just at kickoff.
The pattern that produces consistently good work is small senior teams staying on the project through delivery. The pattern that produces inconsistent work is sales-and-strategy seniors who pass the project to a junior delivery team after the contract is signed. Both patterns exist at every agency size; the size does not predict it.
The follow-up question: "Can I meet the people who would actually do my work, before I sign?" An agency that says yes and produces them quickly is signaling a healthy structure. An agency that gets evasive is signaling something worth following up on.
The second question is how the agency prices work and manages scope. The wrong answer: "We do fixed-bid, the scope is locked, change orders are quoted separately." That model is widespread and produces bad outcomes: the agency is incentivized to push back on every reasonable in-flight change, the buyer is incentivized to hide problems until they explode, and the work that does ship is what the agency thought you needed at the start, not what you actually needed at the end.
The pattern that works: time-and-materials with a clear scope-of-work, an honest weekly budget burn-down, and a relationship where both sides can raise scope changes without it being a confrontation. This requires more trust on the buyer side and more discipline on the agency side. The agencies that operate this way well are the ones to look for.
A reasonable middle ground exists: a fixed scope for the discovery and audit phase (because that should be predictable), then time-and-materials for the build, with milestones and budget gates that keep both sides accountable. If an agency cannot describe its commercial model clearly in a sentence, that is its own signal.
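To make the weekly burn-down and budget-gate idea concrete, here is a minimal Python sketch of the arithmetic a weekly report might carry. It is an illustration under stated assumptions, not any agency's actual tooling: the `WeekReport` structure, the single flat hourly rate, and the 80% gate threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeekReport:
    week: int
    hours: float  # hours billed this week (hypothetical field)
    rate: float   # hourly rate; assumes a single flat rate for simplicity


def burn_down(budget: float, reports: list[WeekReport], gate: float = 0.8):
    """Return one (week, remaining_budget, gate_reached) row per weekly report.

    gate_reached becomes True once cumulative spend is at or above `gate`
    (a fraction of the total budget), the point where a budget gate would
    trigger a scope conversation between agency and buyer.
    """
    remaining = budget
    rows = []
    for report in reports:
        remaining -= report.hours * report.rate
        spent_fraction = (budget - remaining) / budget
        rows.append((report.week, remaining, spent_fraction >= gate))
    return rows


if __name__ == "__main__":
    weeks = [WeekReport(week=w, hours=60, rate=150) for w in range(1, 5)]
    for week, remaining, gate_reached in burn_down(40_000, weeks):
        print(f"week {week}: {remaining:,.0f} remaining, gate reached: {gate_reached}")
```

With a hypothetical budget of 40,000 and 60 hours a week at a rate of 150, the gate trips in week four, which is where both sides would decide whether to extend the budget, cut scope, or stop.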
The third question is who owns the integration work. For category 2 and 3 projects, this is decisive. The wrong answer: "We hand off integration work to our partners" without specifics. That usually means a project where the most complex piece, the part where things go wrong, is owned by a vendor who is not in the room when decisions are made.
The right answer: explicit ownership of integration work, named engineers who do it, and a track record of integrations that include the systems your stack actually uses. If your project involves NetSuite, ask about NetSuite. If it involves a French invoice generator, ask about French invoice generators. If they have not done it, ask what their plan is for the parts they have not done.
The fourth question is what happens after launch. The wrong answer: "We hand off to your team and you can engage us hourly for issues." That is technically a maintenance model, but it does not produce maintained software; it produces an asset that decays.
The right pattern: a defined post-launch period (typically 30 to 90 days) where the agency is responsible for issues at no charge, followed by a maintenance retainer or a clearly scoped support arrangement. The agencies that operate this way are signaling that they take ongoing quality seriously. The agencies that do not are signaling that, from their perspective, the project ends at launch.
For a category 2 project, the post-launch period matters a lot, because the issues that surface in production tend to surface in the first 60 days, and an agency that disappears at handover is leaving you exposed exactly when you most need them.
Case studies are partly marketing material and partly informational. Read them with that in mind.
The shape of a useful case study. A specific client (named, ideally, with permission), a specific problem (not "they wanted growth" but "their checkout was failing on Black Friday because their app stack was overwhelmed"), specific decisions (what the agency chose to do and why), specific outcomes (numbers if they can share them, qualitative shifts if not), and an honest section on what was hard or what they would do differently. The last part is the rarest and the most telling.
The shape of a less-useful case study. Vague problem, vague solution, polished imagery, a quote from the client that could be from any client, no specifics about the work. This is not necessarily a bad agency; it might just be a bad case study. But it does tell you the agency has not invested in helping prospects evaluate them on the work, which is its own data point.
The reference call that matters. If the project is large, ask for a reference call with a past client whose project is similar to yours. The conversations that matter are not "were they good." They are "what was hard, what surprised you, what would you ask them differently if you were starting again." A reference willing to talk candidly about the friction points is the strongest signal you can get. A reference call that consists of glowing generalities is filtered, polite, and not very useful.
The conversation with the agency is itself a signal. A few things to watch for.
Do they listen, or do they pitch? The first call should be them learning about your project, not them explaining their methodology. Agencies that arrive with a slide deck about their capabilities before they understand your context are signaling that the methodology comes first. The opposite agency, the one that asks questions for thirty minutes before offering a single answer, is signaling that the project comes first.
Do they push back on anything? A senior agency disagrees with the buyer occasionally, gently and with reasoning. They tell you that the timeline is unrealistic, or that the approach you are considering will create problems, or that the budget will not produce what you want. The agency that agrees with everything you propose, including the things you are unsure about, is selling, not advising. You want the latter on a hard project.
Can they say what they will not do? "We do not do that kind of work" is a sign of an agency that knows its limits. "We can do anything you need" is a sign of an agency that will take the project and figure out the parts it has not done before, hopefully on your dime. There is a place for both, but you should know which you are buying.
Do they ask about your team's capability? Good agencies care about who they are handing off to, because the post-launch quality of the work depends on it. An agency that does not ask about your in-house team's capability is not thinking about the full project lifecycle; it is thinking about the contract.
The single best meta-signal: the agency tells you something during the sales process that is mildly inconvenient for them but useful for you. "Your timeline is tighter than we would recommend." "Your existing system has technical debt we will have to address." "The approach you are considering is workable but here is a better one." Agencies that volunteer that kind of input are signaling a relationship orientation, not a transactional one.
Some signals are not subtle. If you see these, take them seriously regardless of how the rest of the conversation goes.
They guarantee SEO results from a migration. Nobody can guarantee migration SEO outcomes. The best agencies can guarantee process discipline (full redirect mapping, metadata preservation, validation pass) and they will tell you what the realistic risk shape is. An agency that promises no traffic loss is either dishonest or inexperienced.
They cannot tell you what they would not build. Agencies that say yes to every requirement, including the ones that conflict with each other, are not exercising judgment. The good agencies tell you when something is a bad idea, even when telling you costs them the deal.
The proposal is much cheaper than competitors and the scope is the same. This almost always means the cheaper agency has under-scoped the work, either because they will junior-staff it, or because they are planning to make up the gap in change orders, or because they have not understood the scope. Cheap agency quotes on hard projects are predictive of expensive overruns.
They will not name the engineers who would work on your project. This is a hard signal. Agencies that have nothing to hide name names. Agencies that have a staffing-model problem (heavy senior sales, light senior delivery) do not.
They use AI assistance heavily and will not be specific about how. AI-assisted coding is now standard practice at most agencies that ship work; that is not a red flag. The red flag is when an agency cannot tell you which parts of the work are AI-generated, how they review it, and what their position is on quality control. The honest answer is somewhere in the middle: yes, we use AI for some parts of the work, here is what and here is how we review it. The dishonest answer is either denying it entirely or being unable to articulate the workflow.
The portfolio is uniformly polished but vague about the work. Beautiful screenshots without specifics about what the agency built, as opposed to what came from the theme or the client's design team, are a tell. Ask, specifically, what they built. The answer should be a short, clear sentence.
They cannot describe a project that went wrong. Every agency has had projects go sideways. The agencies you want to work with can describe one, name the lesson they took from it, and explain how they changed their process. The agencies you want to avoid say "we have not had that happen" or pivot to a project that went well.
They subcontract significantly and do not disclose it. Subcontracting is fine; opacity about it is not. Ask directly. The honest answer might be "we have a designer on contract, our developers are in-house"; the answer that should worry you is evasion.
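The first red flag above turns on process discipline: full redirect mapping and a validation pass. As a rough illustration of what a validation pass checks, here is a minimal Python sketch that flags old URLs with no redirect, redirects that point at pages missing from the new store, and redirects that chain into other redirects. The function name and the example paths are hypothetical; in a real migration the inputs would come from a crawl of the old site and an export of the new store, and a full pass would also check metadata preservation.

```python
def validate_redirects(old_paths, redirects, new_paths):
    """Flag common redirect-map problems before a migration goes live.

    old_paths: paths that existed (and ideally had traffic) on the old site.
    redirects: dict mapping an old path to its new destination path.
    new_paths: set of paths that actually exist on the new store.
    """
    problems = {"unmapped": [], "broken_target": [], "chained": []}
    for path in old_paths:
        target = redirects.get(path)
        if target is None:
            # No redirect is fine only if the same path still exists.
            if path not in new_paths:
                problems["unmapped"].append(path)
        elif target in redirects:
            # The destination redirects again: a chain, or possibly a loop.
            problems["chained"].append(path)
        elif target not in new_paths:
            # The redirect points at a page the new store does not have.
            problems["broken_target"].append(path)
    return problems


if __name__ == "__main__":
    old = ["/products/a", "/products/b", "/pages/about", "/blog/x", "/collections/all"]
    redirect_map = {
        "/products/a": "/products/a-new",
        "/products/b": "/products/b-old",
        "/products/b-old": "/products/b-new",
        "/blog/x": "/blogs/news/x",
    }
    new = {"/products/a-new", "/products/b-new", "/pages/about"}
    print(validate_redirects(old, redirect_map, new))
```

An empty result does not mean the migration will hold its rankings, which is exactly why no one can guarantee that; it only means the mapping is internally consistent.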
A few things matter and are hard to assess from the outside. Some informal proxies that work.
Engineering judgment. Read their blog. The agencies with real engineering depth write about specific technical problems, with specific solutions, in ways that suggest they have actually done the work. The agencies without that depth write listicles and superlative-heavy summaries of platform features. Reading 20 minutes of their content gives you a real signal.
The team's stability. Agencies with high turnover hand work between engineers, lose institutional knowledge, and produce inconsistent results. Hard to assess directly, but checking LinkedIn for senior team tenure gives you a proxy. Agencies where the senior team has been together for several years are different from agencies where it has been six months.
Whether they own their own infrastructure. For agencies in our category 2 and 3 space, this matters. An agency that runs its own internal tools, manages its own development environments deliberately, and has thought about its own engineering workflow is more likely to bring rigor to your project. An agency whose internal stack is whatever Slack and Google Docs and Trello happen to be doing today is not necessarily bad, but it is a less reliable predictor of engineering discipline.
What should a project like this cost? It depends on the scope, but the realistic range for the projects we typically run (mid-market replatforming, B2B portal builds, integration projects, complex custom development) is five figures, most often in the low-to-mid part of that range. Projects quoted significantly below that range for the same scope are usually under-scoped. Projects quoted significantly above it are either appropriately scoped for a more complex job than you realize, or out of band for what your project needs.
Should you hire a local or a remote agency? Both work for category 2 and 3 projects. Local has the benefit of in-person rapport for the early discovery work and matters more if your business has unusual on-the-ground complexity (physical retail integration, in-person training). Remote has the benefit of a larger talent pool and is fine for most ecommerce engineering work. The deciding factor is the team, not the geography.
How long should the sales process take? A serious project should not be sold in a single call. Two or three conversations is normal, with the agency learning your context, you learning their structure, and both sides assessing fit. A sales process that compresses to a single call and a same-week proposal signals either an under-resourced engagement on the agency side or pressure tactics. A sales process that drags on for two months is a different signal worth interrogating.
What about a freelancer instead of an agency? For category 1 work, capable freelancers are often the better choice; cost is lower and quality can be excellent. For category 2 and 3 work, the question is whether the freelancer has the breadth (engineering, design, project management, integrations, ongoing support) or whether they will outsource the parts they cannot do, in which case you are effectively hiring an agency by another name. There are excellent specialist freelancers; the math is project-specific.
Should you accept a fixed-bid quote? For discovery and audit phases, yes; those should be predictable. For build work on a complex project, fixed-bid usually produces worse outcomes than time-and-materials, for the reasons covered under the commercial-model question above. A reasonable agency offers fixed-bid where it makes sense and time-and-materials where it does not, and explains the reasoning.
What if you are not technical enough to judge an agency's engineering depth? There are two options. Bring in a technical advisor (a CTO friend, a fractional engineering lead, a consultant) for a single review of the proposal and the agency's technical answers. Or pick an agency on the non-technical signals (process, transparency, references, structure) and trust the depth, because the non-technical signals correlate with technical depth more than buyers realize.
How much weight should third-party reviews carry? They are inputs to your decision, not the decision itself. Read them for patterns rather than scores. If three reviews describe the same friction point, that pattern is real even if the scores are high. The most useful signals are in the longer-form qualitative reviews, not the star ratings.
If you are evaluating Shopify agencies right now, this framework should help. Use the structural questions, watch for the red flags, and trust the meta-signal that the agency tells you something inconvenient during the sales process. The agencies worth working with on a serious project act like long-term partners during the sales process; the others act like they want to close.
If you would like to talk to us, get in touch. We are happy to be evaluated against this framework, and we would rather you pick us because we are the right fit than because we ran a polished sales process. You can read more about how we work on the Shopify development side, the migration side, and the custom software side, or browse our other writing to see how we think about specific technical decisions.
For related reading: WooCommerce to Shopify migration, Building a B2B wholesale portal on Shopify Plus, and Custom CRM vs off-the-shelf for the kinds of projects this framework is most useful for.