What Is a Regular Expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. You use regex to find, extract, or replace text that matches a specific structure โ without knowing the exact text in advance.
For example, the pattern \d{4}-\d{2}-\d{2} matches any string that looks like a date in YYYY-MM-DD format. It does not care whether the date is 2024-01-15 or 1999-12-31 โ both match because both fit the structure.
Regex is built into nearly every programming language: JavaScript, Python, Java, Go, Ruby, PHP. It powers search in VS Code, grep in the terminal, form validation on websites, and log parsing in production systems.
Your First Regex: Literal Characters
The simplest regex is just a literal string. The pattern cat matches any text containing the letters c, a, t in sequence:
Text: "The cat sat on the mat."
Pattern: cat
Match: "cat" (position 4)
Case sensitivity matters. cat does not match Cat unless you enable the case-insensitive flag (/cat/i in JavaScript).
Special Characters (Metacharacters)
Certain characters have special meaning in regex. They are called metacharacters:
| Character | Meaning |
|---|---|
. |
Any single character except newline |
^ |
Start of string (or line in multiline mode) |
$ |
End of string (or line) |
* |
Zero or more of the preceding element |
+ |
One or more of the preceding element |
? |
Zero or one of the preceding element |
\ |
Escape the next character |
[] |
Character class โ match any one character inside |
() |
Capturing group |
| ` | ` |
{} |
Quantifier with exact count |
To match a literal ., escape it: \.
Character Classes
A character class [...] matches any single character from the set:
[aeiou] โ any vowel
[a-z] โ any lowercase letter
[A-Z] โ any uppercase letter
[0-9] โ any digit (same as \d)
[a-zA-Z0-9] โ any letter or digit
[^aeiou] โ any character that is NOT a vowel (^ negates inside [])
Shorthand Classes
These are so common they have their own shortcuts:
| Shorthand | Equivalent | Matches |
|---|---|---|
\d |
[0-9] |
Any digit |
\D |
[^0-9] |
Any non-digit |
\w |
[a-zA-Z0-9_] |
Word character |
\W |
[^a-zA-Z0-9_] |
Non-word character |
\s |
[ \t\r\n\f] |
Whitespace |
\S |
[^ \t\r\n\f] |
Non-whitespace |
Quantifiers
Quantifiers control how many times the preceding element must appear:
\d+ โ one or more digits โ matches "42", "0", "1000"
\d* โ zero or more digits โ matches "", "5", "999"
\d? โ zero or one digit โ matches "" or "7"
\d{3} โ exactly 3 digits โ matches "123" but not "12"
\d{2,4} โ between 2 and 4 digits โ matches "12", "123", "1234"
\d{3,} โ 3 or more digits
Greedy vs. Lazy
By default quantifiers are greedy โ they match as much as possible:
Pattern: <.+>
Text: <b>bold</b>
Match: <b>bold</b> (the whole thing)
Adding ? makes it lazy โ match as little as possible:
Pattern: <.+?>
Text: <b>bold</b>
Matches: <b> and </b> (separately)
Anchors
Anchors match a position rather than a character:
^Hello โ string must START with "Hello"
world$ โ string must END with "world"
^\d{5}$ โ string must be exactly 5 digits (US zip code)
\bcat\b โ "cat" as a whole word (\b is a word boundary)
Word boundaries are particularly useful. \bcat\b matches "cat" in "the cat sat" but not in "concatenate".
Capturing Groups
Parentheses () create a capturing group, which lets you extract parts of a match:
(\d{4})-(\d{2})-(\d{2})
Applied to "2024-06-15":
- Group 0 (full match):
2024-06-15 - Group 1:
2024 - Group 2:
06 - Group 3:
15
In JavaScript you access groups via the exec() result or named groups:
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = pattern.exec("2024-06-15");
console.log(match.groups.year); // "2024"
console.log(match.groups.month); // "06"
Non-capturing groups (?:...) group without capturing โ useful when you need grouping for quantifiers but do not need the captured value:
(?:https?://)?example\.com
Alternation
The pipe | means OR:
cat|dog โ matches "cat" or "dog"
(jpg|jpeg|png) โ matches any image extension
Real-World Patterns
Email Address (simplified)
^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$
This matches most valid emails. A truly RFC-compliant email regex is famously complex โ for production use, validate with a library.
US Phone Number
^\+?1?\s?[\(\-]?\d{3}[\)\-\s]?\d{3}[\-\s]?\d{4}$
Matches: (555) 123-4567, 555-123-4567, +1 555 123 4567
URL
https?://[^\s/$.?#].[^\s]*
IPv4 Address
\b(?:\d{1,3}\.){3}\d{1,3}\b
Hex Color Code
#[0-9a-fA-F]{6}|#[0-9a-fA-F]{3}
Matches #FF5733 and #F53.
Strong Password (at least 8 chars, upper, lower, digit, special)
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This uses lookaheads (?=...) โ zero-width assertions that check a condition without consuming characters.
Flags / Modifiers
Most regex engines support flags that change matching behavior:
| Flag | Meaning |
|---|---|
i |
Case insensitive |
g |
Global โ find all matches, not just the first |
m |
Multiline โ ^ and $ match each line |
s |
Dotall โ . matches newlines too |
x |
Extended โ ignore whitespace and allow comments |
In JavaScript: /pattern/flags โ e.g. /\d+/gi
In Python: re.compile(r'\d+', re.IGNORECASE | re.MULTILINE)
Testing and Debugging Regex
Regex is notoriously hard to read back after writing it. Always test with real data including edge cases. Use the Regex Tester to paste your pattern and test strings side by side โ it highlights matches in real time and shows capturing groups.
Common debugging steps:
- Start with a simple literal match to confirm the tool works.
- Add one metacharacter at a time.
- Test both matching and non-matching cases.
- Check edge cases: empty strings, very long inputs, special characters.
Frequently Asked Questions
Why does my regex match too much?
Quantifiers are greedy by default. Add ? after * or + to make them lazy, or use more specific character classes instead of ..
What is the difference between ^ inside and outside []?
Outside [], ^ anchors to the start of a string. Inside [^abc], it negates the class โ matching any character that is not a, b, or c.
Is regex the same in every language? No. Most support the same core syntax but differ in advanced features: named groups, lookaheads, Unicode properties, and flags vary. JavaScript, Python, and PCRE (PHP, Perl) are similar. POSIX regex (used in older Unix tools) lacks some features.
When should I not use regex? For structured formats like JSON, XML, or HTML, use a proper parser instead. Regex on HTML is a notorious anti-pattern that breaks on nested tags and edge cases.
Try your patterns live in the Regex Tester โ paste a pattern, type test strings, and see matches highlighted instantly.