// File: src/content/docs/remove-duplicates-forms/getting-started/first-scan.md // Location: src/content/docs/remove-duplicates-forms/getting-started/ // Purpose: Detailed guide for running the first duplicate scan


title: "Your First Duplicate Scan"
description: "Learn how to run your first duplicate scan and understand the results in Remove Duplicates for Google Forms"
order: 3
lastUpdated: "2025-01-20"
readingTime: 7
keywords: ["first scan", "duplicate detection", "google forms", "scan results", "data cleaning"]

Your First Duplicate Scan

This guide walks you through running your first duplicate scan, reading the results, and cleaning your form data.

Understanding What Happens During a Scan

The Scanning Process

When you initiate a scan, Remove Duplicates:

  1. Reads your form structure - Analyzes all form fields and their types
  2. Loads response data - Retrieves all form responses from the linked spreadsheet
  3. Applies detection logic - Compares responses based on selected fields
  4. Groups duplicates - Organizes duplicate responses by matching values
  5. Displays results - Shows actual duplicate values with full context
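Steps 3 and 4 above (comparing on selected fields, then grouping) can be sketched in a few lines. This is a minimal illustration, not the add-on's actual code; the function name `find_duplicate_groups` and the sample responses are hypothetical.

```python
from collections import defaultdict

def find_duplicate_groups(responses, fields):
    """Group response indexes whose values match on every selected field."""
    groups = defaultdict(list)
    for index, response in enumerate(responses):
        # Build the comparison key from the selected detection fields only.
        key = tuple(response.get(field, "") for field in fields)
        groups[key].append(index)
    # Only keys shared by two or more responses form a duplicate group.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

# Hypothetical sample data standing in for rows of the linked spreadsheet.
responses = [
    {"Email": "ann@example.com", "Name": "Ann Lee"},
    {"Email": "bob@example.com", "Name": "Bob Ray"},
    {"Email": "ann@example.com", "Name": "Ann Lee"},
]
groups = find_duplicate_groups(responses, ["Email"])
print(groups)  # {('ann@example.com',): [0, 2]}
```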

What Makes a Response "Duplicate"?

Two or more responses are flagged as duplicates when they have any of the following:

  • Identical values in the selected detection fields
  • Similar values when fuzzy matching is enabled (85% similarity threshold)
  • Same normalized data (e.g., "[email protected]" = "[email protected]")
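The three rules above can be illustrated with a small sketch. It rests on assumptions: normalization is taken to mean trimming whitespace and ignoring letter case, and similarity is measured with Python's `difflib.SequenceMatcher`, which may differ from the measure the add-on uses. Only the 85% threshold comes from this guide.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # the 85% similarity figure quoted above

def normalize(value):
    # Assumed normalization: trim surrounding whitespace and ignore case.
    return value.strip().lower()

def is_duplicate(a, b, fuzzy=False):
    """Return True when two field values count as the same value."""
    a_norm, b_norm = normalize(a), normalize(b)
    if a_norm == b_norm:
        return True
    if fuzzy:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        return SequenceMatcher(None, a_norm, b_norm).ratio() >= FUZZY_THRESHOLD
    return False

print(is_duplicate(" Ann@Example.com", "ann@example.com"))   # True
print(is_duplicate("Jon Smith", "John Smith"))               # False
print(is_duplicate("Jon Smith", "John Smith", fuzzy=True))   # True
```

Under this measure, two different addresses on the same domain can still score fairly high, which is one reason to review fuzzy matches before removing anything.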

Preparing for Your First Scan

1. Check Your Form Status

Before scanning, verify:

✅ Form has responses: At least 2 responses needed
✅ Spreadsheet is linked: Required for cleaning operations
✅ Add-on is open: Extensions → Remove Duplicates in Forms → Find & Delete duplicates

2. Understand Your Data

Consider these questions:

  • What type of duplicates am I expecting? (repeat submissions, test entries, etc.)
  • Which fields uniquely identify a respondent? (email, ID number, name)
  • Do I want to keep the first or the latest submission in each set of duplicates?

3. Response Volume Guidelines

| Responses | Scan Time | Performance |
|-----------|-----------|-------------|
| 1-100 | Instant | Immediate results |
| 100-1,000 | 2-3 seconds | Fast processing |
| 1,000-5,000 | 5-10 seconds | Normal processing |
| 5,000-10,000 | 15-20 seconds | Batch processing |
| 10,000+ | 20-45 seconds | Optimized batching |

Method 1: Smart Auto-Detection (Recommended)

When to Use Smart Detection

Smart detection is ideal for:

  • ✅ Standard contact forms
  • ✅ Event registrations
  • ✅ Newsletter signups
  • ✅ Survey responses
  • ✅ First-time users

Step-by-Step Smart Scan

  1. Open the add-on

    Extensions → 🧹 Remove Duplicates in Forms → 👉 Find & Delete duplicates
    
  2. Stay on Smart Detect tab (default)

    • You'll see a purple icon and description
    • No configuration needed
  3. Click "✨ Auto-Detect Duplicates"

    • Button changes to "⏳ Detecting..."
    • Progress shown in loading state
  4. Wait for analysis (typically 2-5 seconds; large forms can take longer)

    • Add-on analyzes your form structure
    • Identifies duplicate-prone fields
    • Runs the detection algorithm
  5. Review detected fields

    • After scan, you'll see which fields were checked
    • Common auto-detected fields:
      • Email addresses
      • Phone numbers
      • Names (first, last, full)
      • ID numbers
      • Student/Employee IDs

Understanding Smart Detection Logic

The add-on prioritizes fields in this order:

  1. Priority 1: Email fields - Most reliable unique identifier
  2. Priority 2: ID fields - Student ID, Employee ID, etc.
  3. Priority 3: Name fields - Full name, First + Last name
  4. Priority 4: Phone fields - Mobile, telephone numbers
  5. Priority 5: Required fields - Any required text field

Maximum fields selected: 3 (to balance accuracy and performance)
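The priority order and the three-field cap can be sketched as a ranking function. This is an illustration under assumptions: the keyword sets, the `type` and `required` keys, and the function names are hypothetical, and the add-on may recognize fields differently.

```python
import re

MAX_FIELDS = 3  # the cap stated above

# Assumed keywords per priority level, in the order listed above.
PRIORITY_KEYWORDS = [
    {"email"},                          # Priority 1
    {"id"},                             # Priority 2
    {"name"},                           # Priority 3
    {"phone", "mobile", "telephone"},   # Priority 4
]

def rank(field):
    """Return the priority (1 is highest) or None if the field is skipped."""
    words = set(re.findall(r"[a-z]+", field["title"].lower()))
    for priority, keywords in enumerate(PRIORITY_KEYWORDS, start=1):
        if words & keywords:
            return priority
    if field.get("required") and field.get("type") == "text":
        return 5  # Priority 5: any required text field
    return None

def auto_select(fields):
    """Pick up to MAX_FIELDS titles, best priority first, form order as tiebreak."""
    ranked = [(rank(f), i, f["title"]) for i, f in enumerate(fields)]
    ranked = sorted(r for r in ranked if r[0] is not None)
    return [title for _, _, title in ranked[:MAX_FIELDS]]

fields = [
    {"title": "Full Name", "type": "text", "required": True},
    {"title": "Comments", "type": "paragraph", "required": False},
    {"title": "Phone Number", "type": "text", "required": False},
    {"title": "Email Address", "type": "text", "required": True},
    {"title": "Student ID", "type": "text", "required": True},
]
print(auto_select(fields))  # ['Email Address', 'Student ID', 'Full Name']
```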

Method 2: Manual Field Selection

When to Use Manual Selection

Manual selection is better for:

  • ✅ Custom business logic
  • ✅ Specific field combinations
  • ✅ Non-standard forms
  • ✅ Partial duplicate checking
  • ✅ Advanced users

Step-by-Step Manual Scan

  1. Switch to Manual Select tab

    Click "🎯 Manual Select" tab
    
  2. Open configuration drawer

    Click "🎯 Select Fields & Options"
    
  3. Choose your fields (in the 420px drawer)

    Field Selection Interface:

    📍 SELECT FIELDS TO CHECK
    ☐ Email Address        [REC]
    ☐ Full Name           [REC]
    ☐ Phone Number
    ☐ Student ID
    ☐ Custom Field 1
    
    • Check boxes next to fields to scan
    • [REC] badge = Recommended field
    • Selected fields get blue background
  4. Configure detection options

    ⚙️ DETECTION OPTIONS:

    • ☐ Case sensitive matching

    • ☐ Fuzzy matching

      • When checked: "Jon Smith" ≈ "John Smith" (similar)
      • When unchecked: Only exact matches count
  5. Choose keep strategy

    📍 KEEP STRATEGY:

    • ◉ Keep first response (original)

      • Preserves the earliest submission
      • Best for: Preventing repeat submissions
    • ○ Keep latest response (most recent)

      • Preserves the newest submission
      • Best for: Getting updated information
  6. Click "🔍 Find Duplicates"

    • Drawer closes automatically
    • Scan begins with your configuration
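The keep strategy in step 5 decides which response in each duplicate group starts out checked. A minimal sketch, assuming each response carries a submission timestamp; the `id` and `submitted` keys and the function name are hypothetical.

```python
from datetime import datetime

def apply_keep_strategy(group, strategy="first"):
    """Map each response id to True (keep) or False (remove)."""
    ordered = sorted(group, key=lambda r: r["submitted"])
    # "first" keeps the earliest submission, anything else keeps the latest.
    keeper = ordered[0] if strategy == "first" else ordered[-1]
    return {r["id"]: r["id"] == keeper["id"] for r in group}

group = [
    {"id": "r1", "submitted": datetime(2024, 1, 15, 10, 30)},
    {"id": "r2", "submitted": datetime(2024, 1, 16, 14, 15)},
    {"id": "r3", "submitted": datetime(2024, 1, 17, 9, 45)},
]
print(apply_keep_strategy(group, "first"))   # {'r1': True, 'r2': False, 'r3': False}
print(apply_keep_strategy(group, "latest"))  # {'r1': False, 'r2': False, 'r3': True}
```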

Understanding Scan Results

Results Overview

After scanning completes, you'll see:

Results Summary Bar:
[12 to remove] [88 to keep] [✅ Ready]

This means:

  • 12 duplicate responses will be removed
  • 88 unique responses will be kept
  • Ready for you to take action

Duplicate Groups Structure

Each duplicate group shows:

┌─────────────────────────────────────────┐
│ Email: [email protected]  [3 duplicates] │ ← Click to expand
└─────────────────────────────────────────┘

When expanded:

☑ [email protected]
  [ORIGINAL] 2024-01-15 10:30 AM
  Name: John Smith
  Phone: 555-0123
  Message: Original submission

☐ [email protected]
  2024-01-16 2:15 PM
  Name: John Smith
  Phone: 555-0123
  Message: Duplicate submission

☐ [email protected]
  2024-01-17 9:45 AM
  Name: John Smith
  Phone: 555-0124
  Message: Another duplicate

Reading Response Details

Each response card shows:

  1. Checkbox state

    • ☑ Checked = Will be KEPT
    • ☐ Unchecked = Will be REMOVED
  2. Primary identifier

    • The actual value that matched (e.g., email address)
  3. Timestamp

    • Exact submission date and time
  4. [ORIGINAL] badge

    • Green background
    • Indicates first occurrence
    • Usually checked by default with "Keep First" strategy
  5. Full response data

    • All form fields and their answers
    • Helps you verify it's truly a duplicate

Customizing Selections

You can manually adjust which responses to keep:

  1. Click any checkbox to toggle keep/remove
  2. Check multiple responses to keep several versions
  3. Uncheck all to remove entire group
  4. Mixed selection for custom logic

Real-time updates: The summary bar updates as you change selections
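The relationship between checkboxes and the summary bar can be modeled as a mapping from response to keep flag. This is a sketch with hypothetical names, using the 12-to-remove, 88-to-keep example from the results overview.

```python
def summary(selection):
    """Count kept and removed responses from a {response_id: keep?} mapping."""
    keep = sum(selection.values())  # True counts as 1
    return {"to_remove": len(selection) - keep, "to_keep": keep}

def toggle(selection, response_id):
    """Flip one checkbox and return the refreshed summary."""
    selection[response_id] = not selection[response_id]
    return summary(selection)

# 100 hypothetical responses: r1..r12 unchecked, r13..r100 checked.
selection = {f"r{i}": i > 12 for i in range(1, 101)}
print(summary(selection))       # {'to_remove': 12, 'to_keep': 88}
print(toggle(selection, "r1"))  # {'to_remove': 11, 'to_keep': 89}
```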

Interpreting Different Scenarios

Scenario 1: Perfect Duplicates

Email: [email protected] [2 duplicates]
├─ ☑ [ORIGINAL] All fields identical
└─ ☐ Exact copy of original

Action: Keep original, remove duplicate

Scenario 2: Updated Information

Email: [email protected] [2 duplicates]
├─ ☐ [ORIGINAL] Old phone: 555-0001
└─ ☑ Latest: New phone: 555-0002

Action: Keep latest with updated info

Scenario 3: Test Submissions

Email: [email protected] [5 duplicates]
├─ ☐ Test submission 1
├─ ☐ Test submission 2
├─ ☐ Test submission 3
├─ ☐ Test submission 4
└─ ☐ Test submission 5

Action: Remove all test entries

Scenario 4: Partial Duplicates

Name: John Smith [3 responses]
├─ ☑ [email protected]
├─ ☑ [email protected] (different person)
└─ ☐ [email protected] (actual duplicate)

Action: Keep different people, remove true duplicate

Taking Action on Results

Before You Act

Critical Safety Check:

  1. ✅ Review the selections carefully
  2. ✅ Verify backup will be created
  3. ✅ Understand this modifies the spreadsheet
  4. ✅ Know that the responses stored in the form itself are preserved

Option 1: Update Linked Spreadsheet

What happens when you click "📊 Update Linked Sheet":

  1. Confirmation dialog appears:

    This will permanently remove 12 rows from your linked spreadsheet.
    A backup will be created before deletion.
    Continue?
    
  2. Backup creation:

    • Creates sheet named "Backup_2025-01-20_1737389456"
    • Preserves all current data
    • Adds metadata about operation
  3. Row deletion:

    • Removes unchecked responses from spreadsheet
    • Leaves all other rows untouched
    • Remaining rows shift up, so row numbers update automatically
  4. Completion notification:

    ✅ Successfully removed 12 duplicates!
    Backup: Backup_2025-01-20_1737389456
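The backup-then-delete sequence can be sketched on in-memory data. Several things here are assumptions: the workbook is modeled as a plain dictionary of sheet name to rows, the backup name pattern (`Backup_<UTC date>_<epoch seconds>`) is inferred from the example name, and the function names are hypothetical. Rows are deleted from the bottom up so that earlier deletions do not shift the indexes of rows still waiting to be removed.

```python
import time

def backup_name(epoch_seconds):
    # Assumed pattern, inferred from the example: Backup_<UTC date>_<epoch seconds>.
    date = time.strftime("%Y-%m-%d", time.gmtime(epoch_seconds))
    return f"Backup_{date}_{epoch_seconds}"

def update_linked_sheet(workbook, sheet, rows_to_remove, epoch_seconds):
    """Copy the sheet to a backup, then delete the given row indexes in place."""
    name = backup_name(epoch_seconds)
    workbook[name] = [list(row) for row in workbook[sheet]]  # backup first
    for index in sorted(rows_to_remove, reverse=True):       # bottom-up deletion
        del workbook[sheet][index]
    return name

workbook = {"Form Responses 1": [
    ["Timestamp", "Email"],                    # row 0 is the header
    ["2024-01-15 10:30", "ann@example.com"],
    ["2024-01-16 14:15", "ann@example.com"],
    ["2024-01-17 09:45", "ann@example.com"],
    ["2024-01-18 08:00", "bob@example.com"],
]}
name = update_linked_sheet(workbook, "Form Responses 1", {2, 3}, 1700000000)
print(name, len(workbook["Form Responses 1"]), len(workbook[name]))
```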
    

Option 2: Create Clean Sheet

What happens when you click "✨ Create Clean Sheet":

  1. New sheet creation:

    • Sheet named "Clean_2025-01-20_1737389456"
    • Contains only checked responses
    • Original sheet unchanged
  2. Data structure:

    • Same columns as original
    • Headers preserved
    • Formatting maintained
    • Only clean data included
  3. Result:

    ✅ Created clean sheet "Clean_2025-01-20_1737389456"
    with 88 responses
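By contrast with updating the linked sheet, creating a clean sheet is a copy operation: the header and the checked rows go into a new sheet, and the original is never modified. A minimal sketch on in-memory rows, with a hypothetical function name:

```python
def create_clean_sheet(rows, keep_flags):
    """Return a new sheet with the header plus only the rows flagged to keep."""
    header, data = rows[0], rows[1:]
    if len(keep_flags) != len(data):
        raise ValueError("one keep flag is needed per response row")
    # Copy rows so later edits to the clean sheet cannot touch the original.
    return [list(header)] + [list(row) for row, keep in zip(data, keep_flags) if keep]

original = [
    ["Timestamp", "Email"],
    ["2024-01-15 10:30", "ann@example.com"],
    ["2024-01-16 14:15", "ann@example.com"],
    ["2024-01-18 08:00", "bob@example.com"],
]
clean = create_clean_sheet(original, [True, False, True])
print(len(original), len(clean))  # 4 3
```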
    

Troubleshooting Common Issues

No Duplicates Found

Possible causes:

  1. Your data is already clean (good!)
  2. Wrong fields selected
  3. Detection too strict

Solutions:

  • Try Smart Detect instead
  • Select different fields
  • Enable fuzzy matching
  • Check case sensitivity setting

Too Many False Positives

Cause: Detection too broad

Solutions:

  • Use more specific fields (email vs name)
  • Disable fuzzy matching
  • Enable case sensitivity
  • Select additional fields for combination matching
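Selecting additional fields narrows the match because responses must agree on every selected field. The sketch below uses hypothetical data and a hypothetical helper, `count_flagged`, to show how a name-only scan flags three responses while a name-plus-email scan flags only the two that are true duplicates.

```python
from collections import Counter

def count_flagged(responses, fields):
    """Count responses that share all selected-field values with another response."""
    keys = [tuple(r[f].strip().lower() for f in fields) for r in responses]
    seen = Counter(keys)
    return sum(1 for key in keys if seen[key] > 1)

responses = [
    {"Name": "John Smith", "Email": "john.smith@example.com"},
    {"Name": "John Smith", "Email": "j.smith@example.org"},   # a different person
    {"Name": "John Smith", "Email": "john.smith@example.com"},
]
print(count_flagged(responses, ["Name"]))           # 3
print(count_flagged(responses, ["Name", "Email"]))  # 2
```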

Scan Takes Too Long

For forms with 10,000+ responses:

  1. First scan: May take 30-45 seconds
  2. Be patient: Don't close the dialog
  3. One-time process: Results are cached
  4. Future scans: Much faster

Results Don't Match Expectations

Debugging steps:

  1. Check field selection:

    • Are you scanning the right fields?
    • Try different field combinations
  2. Review detection options:

    • Is case sensitivity appropriate?
    • Should fuzzy matching be on/off?
  3. Examine sample duplicates:

    • Expand a group
    • Verify they're true duplicates
    • Check the matching values

Best Practices for First Scan

Do's

  • ✅ Start with Smart Detect - Let the add-on choose the fields for you
  • ✅ Review samples carefully - Verify before removing
  • ✅ Keep the original - Unless you need updates
  • ✅ Check backup created - Verify it exists
  • ✅ Start small - Test with one form first

Don'ts

  • ❌ Don't rush - Take time to review
  • ❌ Don't skip verification - Check the results
  • ❌ Don't ignore patterns - Learn from duplicates
  • ❌ Don't delete backups - Keep for safety
  • ❌ Don't scan during submissions - May miss new data

After Your First Scan

Immediate Next Steps

  1. Verify the cleanup:

    • Open your spreadsheet
    • Check row count decreased
    • Confirm correct data remains
  2. Review the backup:

    • Find "Backup_[timestamp]" sheet
    • Verify it has original data
    • Keep for reference
  3. Document your settings:

    • Note which fields work best
    • Save your detection preferences
    • Plan regular scan schedule
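The verification steps above amount to a few checks that could also be run on exported copies of the sheets. A minimal sketch, assuming each sheet is available as a list of rows; the function name and sample data are hypothetical.

```python
def verify_cleanup(before, after, backup, removed_count):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(before) - len(after) != removed_count:
        problems.append("row count did not drop by the number of removed responses")
    if backup != before:
        problems.append("backup does not match the original data")
    if any(row not in before for row in after):
        problems.append("cleaned sheet contains rows that were not in the original")
    return problems

before = [
    ["Timestamp", "Email"],
    ["2024-01-15 10:30", "ann@example.com"],
    ["2024-01-16 14:15", "ann@example.com"],
    ["2024-01-18 08:00", "bob@example.com"],
]
after = [before[0], before[1], before[3]]
backup = [list(row) for row in before]
print(verify_cleanup(before, after, backup, removed_count=1))  # []
```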

Building a Routine

Recommended scan frequency:

| Form Type | Scan Frequency | Best Time |
|-----------|---------------|-----------|
| Event registration | After event closes | End of registration |
| Ongoing survey | Weekly | Monday morning |
| Contact form | Bi-weekly | Every other Friday |
| Newsletter signup | Monthly | First of month |
| Order forms | Daily | End of business day |

Next Steps

Now that you've completed your first scan: