Dictum Team

How to choose an AI medical scribe: a buyer's evaluation guide

buying-guide, evaluation, medical-scribe, clinical-workflow

Choosing an AI medical scribe comes down to five things: does it produce usable notes for your specialty, is it HIPAA-compliant, does it fit your workflow, can you afford it, and does the vendor handle your data responsibly? Everything else is secondary.

This guide walks through each evaluation criterion with specific questions to ask, red flags to watch for, and a 15-item checklist you can use during vendor demos and trial periods.

Define your workflow first

Before evaluating any product, clarify how you document today and where the friction lives:

  • When do you chart? During the encounter, immediately after, or at the end of the day?
  • What output do you need? SOAP notes, HPI-only, full encounter notes, after-visit summaries, referral letters?
  • Where do you work? Single office, multiple locations, telemedicine, hospital rounding, or a mix?
  • What's your encounter volume? 10 patients/day or 35? High-volume practices benefit more from automation.
  • What devices do you use? Phone, tablet, desktop, or all three?

An AI scribe that requires a desktop app won't help if you chart from your phone between patients. One that only produces SOAP notes won't help if you primarily need procedure notes or after-visit summaries.

The best fit is the tool that matches your existing workflow with minimal behavior change.

Check documentation outputs

Not all AI scribes produce the same types of documentation. Evaluate:

Note types supported:

  • SOAP notes
  • Free-form encounter notes
  • After-visit summaries (patient-facing)
  • Referral letters
  • Procedure notes
  • Intake summaries

Template flexibility: Can you customize note templates for your specialty and personal style? A dermatologist needs different output fields than a psychiatrist. Platforms like Dictum offer custom clinical templates that adapt the output structure to your specific needs.

Capture modes:

  • Ambient capture (records the natural conversation)
  • Post-visit dictation (you summarize after the patient leaves)
  • Both

Output quality signals: During your trial, assess these on a 1–5 scale for 10+ encounters:

  • Completeness: all discussed items appear in the note
  • Accuracy: no fabricated findings or incorrect attributions
  • Structure: information in the correct note sections
  • Readability: natural clinical language, not robotic transcription
  • Edit time: minutes needed to finalize each note

If you're editing more than 20% of the content on a routine encounter after your first week, the tool may not be well-suited for your use case.
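The trial scoring above can be tracked with a small script. This is a minimal sketch, not a prescribed method: the `EncounterScore` fields, the `summarize_trial` function, and the idea of recording an `edited_fraction` per note (your own estimate of how much of the note you changed) are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EncounterScore:
    """One trial encounter, scored by the clinician (hypothetical structure)."""
    completeness: int       # 1-5
    accuracy: int           # 1-5
    structure: int          # 1-5
    readability: int        # 1-5
    edit_minutes: float     # minutes spent finalizing the note
    edited_fraction: float  # 0.0-1.0, estimated share of content changed


def summarize_trial(scores, min_encounters=10, max_edited_fraction=0.20):
    """Average each quality signal and report how often edits exceeded the threshold."""
    if len(scores) < min_encounters:
        raise ValueError(f"score at least {min_encounters} encounters before judging")
    heavy_edits = sum(1 for s in scores if s.edited_fraction > max_edited_fraction)
    return {
        "completeness": mean(s.completeness for s in scores),
        "accuracy": mean(s.accuracy for s in scores),
        "structure": mean(s.structure for s in scores),
        "readability": mean(s.readability for s in scores),
        "edit_minutes": mean(s.edit_minutes for s in scores),
        # Share of encounters where more than 20% of the note was edited
        "heavy_edit_share": heavy_edits / len(scores),
    }
```

A high `heavy_edit_share` on routine encounters after the first week is the signal to look at a different tool.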

Evaluate security and HIPAA posture

This is non-negotiable for any tool processing protected health information (PHI). Minimum requirements:

Must-have:

  • Signed Business Associate Agreement (BAA)
  • Encryption in transit (TLS 1.2+) and at rest (AES-256)
  • Clear data retention policy with defined deletion timelines
  • No use of patient data for model training without explicit consent
  • Access controls and audit logging

Strong signals:

  • SOC 2 Type II attestation report
  • HITRUST CSF certification
  • Regular third-party penetration testing
  • On-device processing options for sensitive environments
  • Automatic audio deletion after note generation

Red flags:

  • Won't sign a BAA
  • Vague answers about where audio is processed
  • "We use your data to improve our models" without opt-out
  • No documented security practices on their website

Review the vendor's security and compliance documentation before signing up. If they don't have a public security page, that's a concern.

Review pricing

AI scribe pricing models vary. Common structures:

| Model | Typical range | Watch out for |
|-------|---------------|---------------|
| Monthly per-clinician | $99–$399/month | Feature tiers that lock important capabilities behind expensive plans |
| Annual per-clinician | $79–$299/month (billed annually) | Cancellation terms if the product doesn't work out |
| Per-encounter | $2–$8 per note | Unpredictable costs at high volumes |
| Enterprise | Custom pricing | Long contracts, minimum seat counts |
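To see why per-encounter pricing gets unpredictable at volume, it helps to work out a break-even point against a flat monthly plan. The sketch below uses prices from the ranges in the table; the function names and the practice profile (20 encounters a day, 20 clinic days a month) are assumptions for illustration only.

```python
def per_encounter_monthly_cost(price_per_note: float,
                               encounters_per_day: int,
                               clinic_days_per_month: int) -> float:
    """Monthly bill under per-encounter pricing."""
    return price_per_note * encounters_per_day * clinic_days_per_month


def break_even_notes(flat_monthly_fee: float, price_per_note: float) -> float:
    """Notes per month at which per-encounter pricing costs as much as the flat fee."""
    if price_per_note <= 0:
        raise ValueError("price_per_note must be positive")
    return flat_monthly_fee / price_per_note


if __name__ == "__main__":
    # Assumed practice profile: 20 encounters/day, 20 clinic days/month (hypothetical).
    low = per_encounter_monthly_cost(2.0, 20, 20)   # $2 per note
    high = per_encounter_monthly_cost(8.0, 20, 20)  # $8 per note
    print(f"Per-encounter plan: ${low:.0f} to ${high:.0f} per month")
    print(f"Break-even vs a $399 flat plan at $2/note: "
          f"{break_even_notes(399, 2.0):.1f} notes/month")
```

Under these assumptions the per-encounter plan runs $800 to $3,200 a month, above the top of the flat monthly range, and a $399 flat plan breaks even at about 200 notes a month even at the lowest per-note price.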

Questions to ask about pricing:

  • Are all note types included, or do some cost extra?
  • Is there a limit on encounters per month?
  • Do custom templates or specialty features cost more?
  • What happens if I need to cancel mid-contract?
  • Are there discounts for group practices or multiple providers?

Compare the annual cost against the value of time saved. If you save 10 minutes per encounter across 20 patients/day, that's 200 minutes, or roughly 3 hours 20 minutes, every day. What's that time worth to you or your practice?
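A rough way to put a number on that comparison is sketched below. The function names are hypothetical, and the hourly value, clinic days per year, and realistic hours saved are placeholders you would replace with your own figures; only the 10 minutes and 20 patients come from the example above.

```python
def daily_hours_saved(minutes_saved_per_encounter: float,
                      encounters_per_day: int) -> float:
    """Hours of documentation time saved per clinic day."""
    return minutes_saved_per_encounter * encounters_per_day / 60


def annual_net_value(hours_saved_per_day: float,
                     clinic_days_per_year: int,
                     hourly_value: float,
                     annual_tool_cost: float) -> float:
    """Value of the time saved over a year, minus what the tool costs."""
    return hours_saved_per_day * clinic_days_per_year * hourly_value - annual_tool_cost


if __name__ == "__main__":
    hours = daily_hours_saved(10, 20)  # 200 minutes, about 3.33 hours
    # Placeholder inputs: 200 clinic days/year, $100/hour, $399/month plan.
    net = annual_net_value(hours, 200, 100.0, 399 * 12)
    print(f"{hours:.2f} hours saved per day, net annual value ${net:,.0f}")
```

Running the same calculation with a conservative estimate (say, half the time savings a vendor claims) shows whether the purchase still makes sense if the tool underdelivers.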

See Dictum's pricing for a transparent breakdown with no hidden fees.

Test specialty fit

Generic AI scribes struggle with specialty-specific documentation. A platform trained primarily on primary care conversations may produce poor output for:

  • Psychiatric mental status exams
  • Orthopedic range-of-motion assessments
  • Dermatology lesion descriptions
  • Cardiology murmur characterizations
  • OB/GYN prenatal visit documentation
  • Pediatric developmental assessments

How to evaluate specialty fit:

  1. Ask the vendor which specialties they explicitly support
  2. Request sample outputs for your specialty (or use your trial for this)
  3. Test with your most common encounter type AND your most complex one
  4. Check whether templates exist for your specialty's documentation patterns
  5. Ask if the model has been trained on specialty-specific terminology

A platform that works well for family medicine may not handle the structured nuance of a psychiatric evaluation. Test before you buy.

Ask about data retention and model training

These questions often get overlooked, but they matter:

Data retention:

  • How long is encounter audio stored?
  • How long are generated notes stored on the platform?
  • Can I export all my data if I leave?
  • Is data automatically deleted after a defined period?
  • Can I manually delete specific encounters?

Model training:

  • Is my patient data used to train or fine-tune AI models?
  • If yes, is it anonymized? How?
  • Can I opt out of data being used for training?
  • Are models trained on general datasets or on my practice's specific data?
  • Where does model inference happen—cloud or on-device?

The responsible answer from a vendor: audio is deleted shortly after note generation, patient data is never used for model training without explicit opt-in, and you can export or delete your data at any time.

Buyer evaluation checklist

Use this checklist during demos and trial periods. Score each item as Pass / Partial / Fail:

| # | Evaluation criterion | Questions to ask |
|---|---------------------|------------------|
| 1 | Produces usable notes for my specialty | Is my primary encounter type well-supported? Do I need to rewrite large portions? |
| 2 | Supports my preferred capture mode | Does it offer ambient capture, dictation, or both? |
| 3 | Custom templates available | Can I modify note structure, section headings, and required fields? |
| 4 | BAA signed and HIPAA controls documented | Will they provide a signed BAA before I start a trial? |
| 5 | Clear data retention and deletion policy | When is audio deleted? Can I trigger manual deletion? |
| 6 | No patient data used for model training | Is this the default, or do I need to opt out? |
| 7 | Works on my devices and in my environment | Phone, tablet, desktop? Online and offline? |
| 8 | Pricing is predictable and transparent | No hidden fees, encounter caps, or feature lockouts? |
| 9 | EHR export works for my system | Copy-paste, direct integration, or FHIR export? |
| 10 | Offline capability if needed | Can I record and generate notes without internet? |
| 11 | Multi-note-type support | SOAP, AVS, referral letters, procedure notes? |
| 12 | Reasonable trial period | At least 7 days or 10+ encounters to evaluate properly? |
| 13 | Responsive support for clinical questions | Can I reach a human when the tool doesn't behave as expected? |
| 14 | Audit trail and note versioning | Can I see what the AI generated vs what I edited? |
| 15 | Vendor stability and track record | How long have they been operating? Are they funded? Do they have clinical advisors? |

Print this out or keep it open during your evaluation. A platform that passes all 15 is rare—prioritize items 1–6 as non-negotiable, and treat 7–15 as differentiators.
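If you are comparing several vendors, the checklist can be tallied in a few lines. This is one possible sketch: the `Score` point values (2/1/0), the `evaluate_vendor` name, and the rule that a non-negotiable item must be a full Pass are assumptions layered on top of the checklist, not part of it.

```python
from enum import Enum


class Score(Enum):
    PASS = 2
    PARTIAL = 1
    FAIL = 0


NON_NEGOTIABLE = range(1, 7)    # checklist items 1-6
DIFFERENTIATORS = range(7, 16)  # checklist items 7-15


def evaluate_vendor(scores: dict[int, Score]) -> dict:
    """Flag any non-negotiable item that is not a Pass and total the differentiators."""
    missing = [i for i in range(1, 16) if i not in scores]
    if missing:
        raise ValueError(f"unscored checklist items: {missing}")
    blockers = [i for i in NON_NEGOTIABLE if scores[i] is not Score.PASS]
    return {
        "qualifies": not blockers,
        "blockers": blockers,
        "differentiator_points": sum(scores[i].value for i in DIFFERENTIATORS),
        "differentiator_max": 2 * len(DIFFERENTIATORS),
    }
```

Vendors with any blockers drop out first; the differentiator total then ranks whoever is left.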

Comparing your options

If you're evaluating multiple platforms, our comparison of the best AI medical scribes covers the major players side by side across features, pricing, security, and specialty support.

What Dictum offers

Dictum is built for the evaluation criteria above:

  • Ambient capture and post-visit dictation in one app
  • Structured SOAP notes, after-visit summaries, and referral letters
  • Custom clinical templates by specialty
  • HIPAA-compliant with automatic audio deletion
  • Offline-capable—record without internet
  • Transparent monthly pricing, no per-encounter fees
  • Works on iOS with multi-device support

Clinicians should review AI-generated documentation before adding it to the medical record and should use Dictum in accordance with their organization's policies and applicable laws.

Start your evaluation

The best way to evaluate any AI scribe is to use it with real encounters. Abstract feature lists don't tell you whether the output matches your clinical thinking.

Start a free Dictum trial and run it through 10–15 encounters before making your decision.