---
title: "Your First Duplicate Scan"
description: "Learn how to run your first duplicate scan and understand the results in Remove Duplicates for Google Forms"
order: 3
lastUpdated: "2025-01-20"
readingTime: 7
keywords: ["first scan", "duplicate detection", "google forms", "scan results", "data cleaning"]
---
# Your First Duplicate Scan
This comprehensive guide walks you through running your first duplicate scan, understanding the results, and taking action to clean your form data.
## Understanding What Happens During a Scan

### The Scanning Process

When you start a scan, Remove Duplicates:

1. **Reads your form structure:** analyzes all form fields and their types.
2. **Loads response data:** retrieves all form responses from the linked spreadsheet.
3. **Applies detection logic:** compares responses on the selected fields.
4. **Groups duplicates:** organizes duplicate responses by their matching values.
5. **Displays results:** shows the actual duplicate values with full context.
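To make the grouping step concrete, here is a minimal Python sketch of exact-match detection. It assumes each response is a dictionary keyed by field title (one per spreadsheet row, in submission order); `find_duplicate_groups` is a hypothetical helper that illustrates the idea, not the add-on's actual implementation.

```python
from collections import defaultdict


def find_duplicate_groups(responses, fields):
    """Group responses whose values in `fields` are identical.

    `responses` is a list of dicts in submission order (hypothetical shape).
    Returns {composite_key: [row indices]} for keys shared by 2+ responses.
    """
    groups = defaultdict(list)
    for index, response in enumerate(responses):
        # The composite key is the tuple of values in the selected fields.
        key = tuple(response.get(field, "") for field in fields)
        groups[key].append(index)
    # Keep only keys that more than one response shares.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


responses = [
    {"Email": "john@example.com", "Name": "John Smith"},
    {"Email": "jane@example.com", "Name": "Jane Doe"},
    {"Email": "john@example.com", "Name": "John Smith"},
]
print(find_duplicate_groups(responses, ["Email"]))
# {('john@example.com',): [0, 2]}
```

Using a tuple of several fields as the key means responses must agree on every selected field to be grouped together.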
### What Makes a Response "Duplicate"?

A duplicate is detected when two or more responses have:

- Identical values in the selected detection fields
- Similar values (at least 85% similar) when fuzzy matching is enabled
- The same value after normalization (e.g., "John@Example.com" and "john@example.com")
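The three matching rules above can be sketched in Python as follows. This is an illustration under stated assumptions: "normalized" is taken to mean trimmed, whitespace-collapsed and lowercased, and similarity is measured with the standard library's `difflib.SequenceMatcher.ratio()`. The add-on's real similarity metric is not documented on this page, so only the 0.85 threshold comes from the text.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # the 85% similarity figure quoted on this page


def normalize(value, case_sensitive=False):
    """Assumed normalization: trim, collapse inner whitespace, lowercase."""
    cleaned = " ".join(str(value).split())
    return cleaned if case_sensitive else cleaned.lower()


def is_match(a, b, fuzzy=False, case_sensitive=False):
    """Return True if two field values should count as duplicates."""
    a, b = normalize(a, case_sensitive), normalize(b, case_sensitive)
    if a == b:
        return True
    if not fuzzy:
        return False
    # ratio() is a stand-in similarity score in [0, 1]; the add-on's metric may differ.
    return SequenceMatcher(None, a, b).ratio() >= FUZZY_THRESHOLD


print(is_match("John@Example.com", "john@example.com"))  # True
print(is_match("Jon Smith", "John Smith"))               # False: exact matching only
print(is_match("Jon Smith", "John Smith", fuzzy=True))   # True: ratio is about 0.95
```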
## Preparing for Your First Scan

### 1. Check Your Form Status

Before scanning, verify:

- ✅ **Form has responses:** at least 2 responses are needed
- ✅ **Spreadsheet is linked:** required for cleaning operations
- ✅ **Add-on is open:** Extensions → Remove Duplicates in Forms → Find & Delete duplicates
### 2. Understand Your Data

Consider these questions:

- What type of duplicates am I expecting? (repeat submissions, test entries, etc.)
- Which fields uniquely identify a respondent? (email, ID number, name)
- Do I want to keep the first or the latest submission in each duplicate group?
### 3. Response Volume Guidelines

| Responses | Scan Time | Performance |
|-----------|-----------|-------------|
| 1-100 | Instant | Immediate results |
| 100-1,000 | 2-3 seconds | Fast processing |
| 1,000-5,000 | 5-10 seconds | Normal processing |
| 5,000-10,000 | 15-20 seconds | Batch processing |
| 10,000+ | 20-45 seconds | Optimized batching |
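The table mentions batch processing for larger forms. One reasonable way to batch a duplicate scan, sketched here in Python, is to process rows in fixed-size chunks while keeping a single map of keys already seen, so a duplicate is still caught when its original fell in an earlier batch. The `scan_in_batches` name and the batch size of 500 are assumptions for illustration; the add-on's real batch sizes are not documented on this page.

```python
def scan_in_batches(responses, fields, batch_size=500):
    """Yield (duplicate_index, original_index) pairs, one batch at a time.

    batch_size=500 is a hypothetical value chosen for illustration.
    """
    seen = {}  # composite key -> index of its first occurrence
    for start in range(0, len(responses), batch_size):
        batch = responses[start:start + batch_size]
        for offset, response in enumerate(batch):
            index = start + offset
            key = tuple(response.get(field, "") for field in fields)
            if key in seen:
                yield index, seen[key]  # matches a row from this or an earlier batch
            else:
                seen[key] = index


rows = [{"Email": f"user{i % 3}@example.com"} for i in range(7)]
print(list(scan_in_batches(rows, ["Email"], batch_size=2)))
# [(3, 0), (4, 1), (5, 2), (6, 0)]
```

Because `seen` lives outside the batch loop, the result is the same whatever batch size is used; batching only bounds how much work is done per step.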
## Method 1: Smart Auto-Detection (Recommended)

### When to Use Smart Detection
Smart detection is ideal for:
- ✅ Standard contact forms
- ✅ Event registrations
- ✅ Newsletter signups
- ✅ Survey responses
- ✅ First-time users
### Step-by-Step Smart Scan

1. **Open the add-on:** Extensions → 🧹 Remove Duplicates in Forms → 👉 Find & Delete duplicates
2. **Stay on the Smart Detect tab (default)**
   - You'll see a purple icon and a short description
   - No configuration is needed
3. **Click "✨ Auto-Detect Duplicates"**
   - The button changes to "⏳ Detecting..."
   - Progress is shown while the button is in its loading state
4. **Wait for the analysis (2-5 seconds)**
   - The add-on analyzes your form structure
   - It identifies duplicate-prone fields
   - It runs the detection algorithm
5. **Review the detected fields**
   - After the scan, you'll see which fields were checked
   - Commonly auto-detected fields:
     - Email addresses
     - Phone numbers
     - Names (first, last, full)
     - ID numbers
     - Student/Employee IDs
### Understanding Smart Detection Logic

The add-on prioritizes fields in this order:

1. **Email fields:** the most reliable unique identifier
2. **ID fields:** Student ID, Employee ID, etc.
3. **Name fields:** full name, or first + last name
4. **Phone fields:** mobile and telephone numbers
5. **Required fields:** any required text field

Smart Detect selects at most 3 fields, to balance accuracy and performance.
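The priority order above can be sketched as a ranking over field titles. The keyword patterns below are hypothetical stand-ins (the add-on's real matching rules are not published on this page), and field types are not modeled: a field that matches no pattern is ranked as priority 5 only if it is required.

```python
import re

# Hypothetical keyword patterns for priorities 1-4 described above.
PRIORITY_PATTERNS = [
    (1, re.compile(r"e-?mail", re.IGNORECASE)),
    (2, re.compile(r"\bid\b", re.IGNORECASE)),
    (3, re.compile(r"name", re.IGNORECASE)),
    (4, re.compile(r"phone|mobile|telephone", re.IGNORECASE)),
]
MAX_FIELDS = 3  # the documented maximum


def pick_detection_fields(fields):
    """Pick up to MAX_FIELDS titles from a list of (title, is_required) pairs."""
    ranked = []
    for position, (title, required) in enumerate(fields):
        priority = next(
            (level for level, pattern in PRIORITY_PATTERNS if pattern.search(title)),
            None,
        )
        if priority is None and required:
            priority = 5  # any other required field
        if priority is not None:
            ranked.append((priority, position, title))
    ranked.sort()  # lowest priority number first; ties keep form order
    return [title for _, _, title in ranked[:MAX_FIELDS]]


form_fields = [
    ("Full Name", True),
    ("Comments", False),
    ("Email Address", True),
    ("Phone Number", False),
    ("Student ID", False),
]
print(pick_detection_fields(form_fields))
# ['Email Address', 'Student ID', 'Full Name']
```

Sorting on `(priority, position)` keeps the choice deterministic: when two fields share a priority, the one that appears earlier in the form wins.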
## Method 2: Manual Field Selection

### When to Use Manual Selection
Manual selection is better for:
- ✅ Custom business logic
- ✅ Specific field combinations
- ✅ Non-standard forms
- ✅ Partial duplicate checking
- ✅ Advanced users
### Step-by-Step Manual Scan

1. **Switch to the Manual Select tab:** click the "🎯 Manual Select" tab.
2. **Open the configuration drawer:** click "🎯 Select Fields & Options".
3. **Choose your fields** in the 420px drawer. The field selection interface looks like this:

   📍 **SELECT FIELDS TO CHECK**

   - ☐ Email Address [REC]
   - ☐ Full Name [REC]
   - ☐ Phone Number
   - ☐ Student ID
   - ☐ Custom Field 1

   Check the boxes next to the fields you want to scan. A [REC] badge marks a recommended field, and selected fields get a blue background.
4. **Configure detection options**

   ⚙️ **DETECTION OPTIONS**

   - ☐ **Case sensitive matching**
     - When checked: "John@Example.com" ≠ "john@example.com"
     - When unchecked: both are treated as duplicates
   - ☐ **Fuzzy matching**
     - When checked: "Jon Smith" ≈ "John Smith" (treated as similar)
     - When unchecked: only exact matches count
5. **Choose a keep strategy**

   📍 **KEEP STRATEGY**

   - ◉ **Keep first response (original):** preserves the earliest submission. Best for preventing repeat submissions.
   - ○ **Keep latest response (most recent):** preserves the newest submission. Best for getting updated information.
6. **Click "🔍 Find Duplicates":** the drawer closes automatically and the scan begins with your configuration.
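The keep strategy decides which member of each duplicate group stays checked. A minimal Python sketch, assuming each group is a list of `(row_index, submitted_at)` pairs; `choose_kept` is a hypothetical helper, not part of the add-on.

```python
from datetime import datetime


def choose_kept(group, strategy="first"):
    """Return the row index to keep in one duplicate group.

    group: list of (row_index, submitted_at) pairs.
    strategy: "first" keeps the earliest submission, "latest" the newest.
    """
    if strategy not in ("first", "latest"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    pick = min if strategy == "first" else max
    row_index, _ = pick(group, key=lambda item: item[1])
    return row_index


group = [
    (2, datetime(2024, 1, 15, 10, 30)),
    (7, datetime(2024, 1, 16, 14, 15)),
    (9, datetime(2024, 1, 17, 9, 45)),
]
print(choose_kept(group, "first"))   # 2
print(choose_kept(group, "latest"))  # 9
```

Comparing on the submission timestamp rather than the row position keeps the result correct even if the sheet has been re-sorted.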
## Understanding Scan Results

### Results Overview

When the scan completes, you'll see the results summary bar:

`[12 to remove] [88 to keep] [✅ Ready]`

This means:

- 12 duplicate responses will be removed
- 88 unique responses will be kept
- The results are ready for you to act on
### Duplicate Groups Structure

Each duplicate group appears as a collapsed header showing the matched value and the number of responses in the group. Click the header to expand it:

**Email: john@example.com** [3 duplicates]

When expanded:

- ☑ john@example.com
  - [ORIGINAL] 2024-01-15 10:30 AM
  - Name: John Smith
  - Phone: 555-0123
  - Message: Original submission
- ☐ john@example.com
  - 2024-01-16 2:15 PM
  - Name: John Smith
  - Phone: 555-0123
  - Message: Duplicate submission
- ☐ john@example.com
  - 2024-01-17 9:45 AM
  - Name: John Smith
  - Phone: 555-0124
  - Message: Another duplicate
### Reading Response Details

Each response card shows:

1. **Checkbox state**
   - ☑ Checked: the response will be **kept**
   - ☐ Unchecked: the response will be **removed**
2. **Primary identifier:** the actual value that matched (e.g., the email address).
3. **Timestamp:** the exact submission date and time.
4. **[ORIGINAL] badge:** shown on a green background to mark the first occurrence. With the "Keep First" strategy, this response is usually checked by default.
5. **Full response data:** all form fields and their answers, so you can verify the response is truly a duplicate.
### Customizing Selections

You can manually adjust which responses to keep:

- Click any checkbox to toggle between keep and remove
- Check multiple responses to keep several versions
- Uncheck every response in a group to remove the entire group
- Mix checked and unchecked responses to apply your own logic

**Real-time updates:** the summary bar updates as you change selections.
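The summary bar counts can be reproduced from the checkbox states. A minimal Python sketch, assuming the selection is stored as a hypothetical mapping from row index to `True` (checked, keep) or `False` (unchecked, remove), and that every response outside a duplicate group is kept:

```python
def summarize(total_responses, selection):
    """Return the counts shown in the summary bar.

    selection maps row index -> True (checked, keep) or False (unchecked, remove)
    for responses that belong to duplicate groups. Responses not in any group
    are always kept.
    """
    to_remove = sum(1 for keep in selection.values() if not keep)
    return {"to_remove": to_remove, "to_keep": total_responses - to_remove}


# 24 grouped responses, half unchecked, in a form with 100 responses.
selection = {row: row % 2 == 0 for row in range(24)}
print(summarize(100, selection))  # {'to_remove': 12, 'to_keep': 88}

selection[1] = True               # keep one more version
print(summarize(100, selection))  # {'to_remove': 11, 'to_keep': 89}
```

Recomputing from the full selection after every toggle, rather than adjusting counters incrementally, keeps the totals consistent no matter how many checkboxes change.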
## Interpreting Different Scenarios

### Scenario 1: Perfect Duplicates

**Email: user1@example.com** [2 duplicates]

- ☑ [ORIGINAL] All fields identical
- ☐ Exact copy of the original

**Action:** Keep the original, remove the duplicate.

### Scenario 2: Updated Information

**Email: user2@example.com** [2 duplicates]

- ☐ [ORIGINAL] Old phone: 555-0001
- ☑ Latest: New phone: 555-0002

**Action:** Keep the latest response, which has the updated information.

### Scenario 3: Test Submissions

**Email: test@example.com** [5 duplicates]

- ☐ Test submission 1
- ☐ Test submission 2
- ☐ Test submission 3
- ☐ Test submission 4
- ☐ Test submission 5

**Action:** Remove all test entries.

### Scenario 4: Partial Duplicates

**Name: John Smith** [3 responses]

- ☑ john.smith@example.com
- ☑ jsmith@example.org (different person)
- ☐ john.smith@example.com (actual duplicate)

**Action:** Keep the responses from different people, remove the true duplicate.
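Scenario 4 shows why the choice of fields matters. Grouping on a single field such as name can lump different people together, while grouping on a combination (here, name plus email) separates them. A minimal Python sketch using `collections.Counter`; the rows and the `group_sizes` helper are illustrative:

```python
from collections import Counter

rows = [
    {"Name": "John Smith", "Email": "john.smith@example.com"},
    {"Name": "John Smith", "Email": "jsmith@example.org"},      # different person
    {"Name": "John Smith", "Email": "john.smith@example.com"},  # actual duplicate
]


def group_sizes(rows, fields):
    """Count how many rows share each composite key built from `fields`."""
    return Counter(tuple(row[field] for field in fields) for row in rows)


by_name = group_sizes(rows, ["Name"])
by_name_and_email = group_sizes(rows, ["Name", "Email"])
print(by_name[("John Smith",)])  # 3: one group that mixes two people
print(by_name_and_email[("John Smith", "john.smith@example.com")])  # 2: the true duplicate pair
print(by_name_and_email[("John Smith", "jsmith@example.org")])      # 1: not a duplicate
```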
## Taking Action on Results

### Before You Act

**Critical safety check:**

- ✅ Review the selections carefully
- ✅ Confirm that a backup will be created
- ✅ Understand that this modifies the spreadsheet
- ✅ Know that the responses stored in the form itself are preserved
### Option 1: Update Linked Spreadsheet

When you click "📊 Update Linked Sheet":

1. **A confirmation dialog appears:**

   > This will permanently remove 12 rows from your linked spreadsheet. A backup will be created before deletion. Continue?

2. **A backup is created:**
   - A sheet named "Backup_2024-01-20_1737389456" is created
   - It preserves all current data
   - It records metadata about the operation
3. **Rows are deleted:**
   - Unchecked responses are removed from the spreadsheet
   - All other data is left intact
   - Row numbers update automatically
4. **A completion notification appears:**

   > ✅ Successfully removed 12 duplicates! Backup: Backup_2024-01-20_1737389456
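In a spreadsheet, deleting a row shifts every row below it up by one, which is why a safe removal takes the backup first and then deletes from the bottom of the sheet upward. A minimal Python sketch of that ordering, modeling the sheet as a list of rows with the header at position 0; `remove_rows` is a hypothetical helper and does not call the Google Sheets API:

```python
import copy


def remove_rows(sheet_rows, positions_to_remove):
    """Return (cleaned_rows, backup_rows) without mutating `sheet_rows`.

    positions_to_remove are indices into sheet_rows; position 0 is the
    header and may not be removed.
    """
    positions = set(positions_to_remove)
    if 0 in positions:
        raise ValueError("the header row cannot be removed")
    backup = copy.deepcopy(sheet_rows)  # snapshot taken before any deletion
    cleaned = list(sheet_rows)
    # Delete bottom-up: removing a row further down never shifts the rows above it.
    for position in sorted(positions, reverse=True):
        del cleaned[position]
    return cleaned, backup


sheet = [
    ["Timestamp", "Email"],
    ["2024-01-15 10:30", "john@example.com"],
    ["2024-01-16 14:15", "john@example.com"],
    ["2024-01-16 15:00", "jane@example.com"],
    ["2024-01-17 09:45", "john@example.com"],
]
cleaned, backup = remove_rows(sheet, [2, 4])
print(len(cleaned), len(backup))  # 3 5
```

Deleting top-down instead would require adjusting every remaining index after each deletion; going bottom-up avoids that bookkeeping entirely.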
### Option 2: Create Clean Sheet

When you click "✨ Create Clean Sheet":

1. **A new sheet is created:**
   - It is named "Clean_2024-01-20_1737389456"
   - It contains only the checked responses
   - The original sheet is unchanged
2. **Data structure:**
   - Same columns as the original
   - Headers preserved
   - Formatting maintained
   - Only clean data included
3. **Result:**

   > ✅ Created clean sheet "Clean_2024-01-20_1737389456" with 88 responses
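Creating a clean sheet is the non-destructive counterpart: copy the header, then copy only the checked rows, leaving the source untouched. A minimal Python sketch with a list-of-rows model of the sheet (formatting is not modeled); `build_clean_sheet` is a hypothetical helper:

```python
def build_clean_sheet(sheet_rows, keep_positions):
    """Return a new list: the header row plus the data rows at keep_positions.

    Positions index into sheet_rows (the header is position 0). Row order is
    preserved and sheet_rows itself is not modified.
    """
    keep = set(keep_positions)
    header, *data = sheet_rows
    clean = [list(header)]
    for position, row in enumerate(data, start=1):
        if position in keep:
            clean.append(list(row))
    return clean


sheet = [
    ["Timestamp", "Email"],
    ["2024-01-15 10:30", "john@example.com"],
    ["2024-01-16 14:15", "john@example.com"],
    ["2024-01-16 15:00", "jane@example.com"],
]
clean = build_clean_sheet(sheet, [1, 3])
print(len(clean), len(sheet))  # 3 4
```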
## Troubleshooting Common Issues

### No Duplicates Found

**Possible causes:**

- Your data is already clean (good!)
- The wrong fields are selected
- Detection settings are too strict

**Solutions:**

- Try Smart Detect instead
- Select different fields
- Enable fuzzy matching
- Check the case sensitivity setting
### Too Many False Positives

**Cause:** detection settings are too broad.

**Solutions:**

- Use more specific fields (for example, email instead of name)
- Disable fuzzy matching
- Enable case sensitivity
- Select additional fields so that responses must match on a combination of fields
### Scan Takes Too Long

For forms with 10,000+ responses:

- **First scan:** may take 30-45 seconds
- **Be patient:** don't close the dialog
- **One-time process:** results are cached
- **Future scans:** much faster
### Results Don't Match Expectations

**Debugging steps:**

1. **Check the field selection:**
   - Are you scanning the right fields?
   - Try different field combinations
2. **Review the detection options:**
   - Is case sensitivity appropriate?
   - Should fuzzy matching be on or off?
3. **Examine sample duplicates:**
   - Expand a group
   - Verify that the responses are true duplicates
   - Check the matching values
## Best Practices for Your First Scan

### Do's

- ✅ **Start with Smart Detect:** let the add-on pick the fields for you
- ✅ **Review samples carefully:** verify before removing
- ✅ **Keep the original:** unless you need the updated information
- ✅ **Check that the backup was created:** verify it exists
- ✅ **Start small:** test with one form first

### Don'ts

- ❌ **Don't rush:** take time to review
- ❌ **Don't skip verification:** check the results
- ❌ **Don't ignore patterns:** learn from the duplicates you find
- ❌ **Don't delete backups:** keep them for safety
- ❌ **Don't scan while responses are coming in:** you may miss new data
## After Your First Scan

### Immediate Next Steps

1. **Verify the cleanup:**
   - Open your spreadsheet
   - Check that the row count decreased
   - Confirm the correct data remains
2. **Review the backup:**
   - Find the `Backup_[timestamp]` sheet
   - Verify it contains the original data
   - Keep it for reference
3. **Document your settings:**
   - Note which fields work best
   - Save your detection preferences
   - Plan a regular scan schedule
### Building a Routine

Recommended scan frequency:

| Form Type | Scan Frequency | Best Time |
|-----------|----------------|-----------|
| Event registration | After the event closes | End of registration |
| Ongoing survey | Weekly | Monday morning |
| Contact form | Every two weeks | Every other Friday |
| Newsletter signup | Monthly | First of the month |
| Order forms | Daily | End of business day |
## Next Steps

Now that you've completed your first scan, explore:

- **Understanding Duplicates:** learn why duplicates occur
- **Smart Detection:** a deep dive into auto-detection
- **Duplicate Strategies:** master Keep First vs. Keep Latest
- **Backup System:** understand your safety net