A regular expression is a pattern used to search and replace in text
Syntax
Via object: const regexp = new RegExp("pattern", "flags")
Via slashes: const regexp = /pattern/gmi
Slashes do not allow expressions to be inserted into the pattern, it is fully static
Slashes are used when we know the regular expression at code-writing time
new RegExp is more often used when we need to create a regexp “on the fly” from a string
In both cases regexp becomes an instance of the built-in RegExp class
let tag = prompt("What tag do you want to find?", "h2");
let regexp = new RegExp(`<${tag}>`);
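When the pattern is built from user input, characters such as . or ( in that input would be treated as regexp syntax. A minimal sketch of one way to guard against that, assuming a hypothetical helper named escapeRegExp (not built into the language) that backslash-escapes special characters before they are interpolated:

```javascript
// Hypothetical helper (not built in): escape characters that are special in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a regexp for an opening tag whose name is only known at run time.
function buildTagRegexp(tagName) {
  return new RegExp(`<${escapeRegExp(tagName)}>`, "g");
}

const h2Matches = "<h1>Title</h1><h2>Sub</h2>".match(buildTagRegexp("h2"));
console.log(h2Matches); // ['<h2>']

// Without escaping, the dot in "a.b" would match any character.
console.log(buildTagRegexp("a.b").test("<axb>")); // false
```

The helper names are illustrative only; the point is that the string passed to new RegExp is treated as pattern source, not as literal text.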
Flags
/.../i case-insensitive
/.../g all matches
/.../m multiline mode
/.../s “dotall” mode, the dot . also matches a newline character \n
/.../u full Unicode support (correct processing of surrogate pairs, enables \p{…})
/.../y sticky mode, searching at the exact position in the text
"We will, we will rock you".match(/we/gi) // ["We", "we"]
"A\nB".match(/A.B/) // null
"A\nB".match(/A.B/s) // ['A\nB', index: 0, input: 'A\nB', groups: undefined]
'😄 sfds'.match(/\p{Emoji}/gu) // ["😄"]
Character classes
/./ any character, except a newline
/\d/ digit
/\s/ space, including tabs \t, newlines \n, and \v, \f, \r
/\w/ wordly character: a letter of the Latin alphabet, a digit or an underscore
/\D/ non-digit, any character except \d, e.g. a letter
/\S/ non-space, any character except \s, e.g. a letter
/\W/ non-wordly character, anything but \w, e.g. a non-Latin letter or a space
/[\s\S]/ anything, a space character OR not a space character
/[\d\D]/ anything
/[^]/ anything
"Z".match(/./) // Z
"Is there CSS4".match(/CSS\d/g) // ['CSS4'] // matches a string 'CSS' with a digit after it
"+7(903)-123-45-67".match(/\d/) // ['7', index: 1, input: '+7(903)-123-45-67', groups: undefined] // w/o flag g, only the first match
"+7(903)-123-45-67".match(/\d/g) // ['7', '9', '0', '3', '1', '2', '3', '4', '5', '6', '7']
"+7(903)-123-45-67".match(/\D/g) // ['+', '(', ')', '-', '-', '-']
"+7(903)-123-45-67".replace(/\D/g, "") // 79031234567
"CSS4".match(/CS.4/) // 'CSS4'
"hi 123 свет".match(/[^]/g) // ['h', 'i', ' ', '1', '2', '3', ' ', 'с', 'в', 'е', 'т']
Unicode properties
With flag u, \p{…} matches a character by its Unicode property
There are 3 kinds of properties
General category
Letter (L): Uppercase_Letter (Lu), Lowercase_Letter (Ll), Titlecase_Letter (Lt), Modifier_Letter (Lm), Other_Letter (Lo)
Mark (M): Nonspacing_Mark (Mn), Spacing_Mark (Mc), Enclosing_Mark (Me)
Number (N): Decimal_Number (Nd), Letter_Number (Nl), Other_Number (No)
Symbol (S): Math_Symbol (Sm), Currency_Symbol (Sc), Modifier_Symbol (Sk), Other_Symbol (So)
Punctuation (P): Connector_Punctuation (Pc), Dash_Punctuation (Pd), Open_Punctuation (Ps), Close_Punctuation (Pe), Initial_Punctuation (Pi), Final_Punctuation (Pf), Other_Punctuation (Po)
Separator (Z): Space_Separator (Zs), Line_Separator (Zl), Paragraph_Separator (Zp)
Other (C): Control (Cc), Format (Cf), Surrogate (Cs), Private_Use (Co), Unassigned (Cn)
"10 >= 5".match(/\p{General_Category=Math_Symbol}/gu) // ['>', '=']
"10 >= 5".match(/\p{Math_Symbol}/gu) // ['>', '=']
"10 >= 5".match(/\p{Sm}/gu) // ['>', '=']
Script
"Привет man".match(/\p{Script=Cyrillic}/gu) // ['П', 'р', 'и', 'в', 'е', 'т']
Binary Unicode property
ASCII, ASCII_Hex_Digit, Alphabetic, Any, Dash, Emoji, Hex_Digit, Lowercase, Math, Noncharacter_Code_Point, Pattern_Syntax, Pattern_White_Space, Quotation_Mark, Radical, Regional_Indicator, Sentence_Terminal, Soft_Dotted, Terminal_Punctuation, Unified_Ideograph, Uppercase, White_Space
"1 plus 1 is 2".match(/\p{Alphabetic}/gu) // ['p', 'l', 'u', 's', 'i', 's']
Anchors + word boundary
/^/ matches the beginning of the text
/$/ matches the end of the text
/\b/ matches a word boundary
/^Mary/.test("Mary had a little lamb") // true
/lamb$/.test("Mary had a little lamb") // true
"Hello, Java!".match(/\bJava\b/g) // ['Java']
"Hello, JavaScript!".match(/\bJava\b/g) // null
"1 23 456 78".match(/\b\d\d\b/g) // ["23", "78"]
"12,34,56".match(/\b\d\d\b/g) // ["12", "34", "56"]
// or
"Mary had a little lamb".startsWith("Mary") // true
"Mary had a little lamb".endsWith("lamb") // true
Test whether a string is in the time format hh:mm
let regexp = /^\d\d:\d\d$/
regexp.test("12:34") // true
regexp.test("12:345") // false
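The pattern \d\d:\d\d only checks the shape, so it also accepts values such as 99:99. A stricter sketch, assuming a 24-hour hh:mm format and a hypothetical function name isValidTime, combines the anchors with a set and alternation (both described further below):

```javascript
// Assumption: 24-hour clock, always two digits for hours and minutes.
const timePattern = /^([01]\d|2[0-3]):[0-5]\d$/;

// Hypothetical helper name, used only for this illustration.
function isValidTime(text) {
  return timePattern.test(text);
}

console.log(isValidTime("12:34"));  // true
console.log(isValidTime("23:59"));  // true
console.log(isValidTime("25:99"));  // false
console.log(isValidTime("12:345")); // false
```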
With flag 'm', ^ and $ match at the start/end of each line, not only of the whole string
let str = `1st place: Winnie
2nd place: Piglet
3rd place: Eeyore
`
str.match(/^\d/gm) // ["1", "2", "3"]
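The same applies to $. A small sketch comparing a search for the last word of each line with and without flag 'm' (the variable names are only illustrative):

```javascript
const places = "1st place: Winnie\n2nd place: Piglet\n3rd place: Eeyore";

// With flag 'm', $ matches at the end of every line.
const lastWords = places.match(/\w+$/gm);
console.log(lastWords); // ['Winnie', 'Piglet', 'Eeyore']

// Without it, $ matches only at the very end of the string.
const lastWordOnly = places.match(/\w+$/g);
console.log(lastWordOnly); // ['Eeyore']
```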
Escaping special characters
The special characters [ ] { } ( ) \ ^ $ . | ? * + must be escaped with a backslash \ to be matched literally
"Chapter 5.1".match(/\d\.\d/g) // ['5.1']
"function g()".match(/g\(\)/g) // ['g()']
// look for backslash
"1\\2".match(/\\/) // ['\\', index: 1, input: '1\\2', groups: undefined]
[ab] OR (Sets)
Search for any character among the given ones
Only characters or character classes are allowed inside
[tm] "t" or "m"
[\w-] a wordly character or a hyphen
[\s\d] a space or a digit
"Mop top".match(/[tm]op/gi) // ["Mop", "top"]
"Voila".match(/V[oi]la/) // null
| OR (Alternation)
Alternation allows any expressions
A regexp A|B|C means one of the expressions A, B or C
I love HTML|CSS matches "I love HTML" or "CSS"
I love (HTML|CSS) matches "I love HTML" or "I love CSS"
"First HTML appeared, then CSS, then JavaScript".match(/html|php|css|java(script)?/gi) // ["HTML", "CSS", "JavaScript"]
"Java, JavaScript, PHP, C, C++".match(/Java(Script)?|C(\+\+)?|PHP/g) // ["Java", "JavaScript", "PHP", "C", "C++"]
"00:00 10:10 23:59 25:99 1:2".match(/([01]\d|2[0-3]):[0-5]\d/g) // ["00:00", "10:10", "23:59"]
Ranges
[a-z] range from a to z
[0-5] digit from 0 to 5
[0-9A-F] a digit or a letter from A to F
// searching for "x" followed by two digits or letters from A to F
"Exception 0xAF".match(/x[0-9A-F][0-9A-F]/g) // ["xAF"]
Exclude [^…]
[^aeyo] any character except 'a', 'e', 'y' or 'o'
[^0-9] any character except a digit, the same as \D
[^\s] any non-space character, same as \S
In square brackets we can use the vast majority of special characters w/o escaping, unless they mean something for brackets
Quantifiers + , * , ? , {n}
{3} exactly 3 times
{3,5} from 3 to 5 times
{1,} 1 or more times
? optional, same as {0,1}
* zero or more, same as {0,}
+ one or more, same as {1,}
\d+ looks for numbers
"I'm 12345 years old".match(/\d{5}/) // "12345" // same as \d\d\d\d\d
"I'm not 12, but 1234 years old".match(/\d{3,5}/) // "1234"
"I'm not 12, but 345678 years old".match(/\d{3,}/) // "345678"
"+7(903)-123-45-67".match(/\d+/g) // ["7", "903", "123", "45", "67"]
"color or colour?".match(/colou?r/g) // ["color", "colour"]
"100 10 1".match(/\d0*/g) // ["100", "10", "1"] // looks for a digit followed by any number of zeroes (may be many or none)
"100 10 1".match(/\d0+/g) // ["100", "10"] // 1 not matched, as 0+ requires at least one zero
"0 1 12.345 7890".match(/\d+\.\d+/g) // ["12.345"] // Regexp for decimal fractions
"<body> ... </body>".match(/<[a-z]+>/gi) // ["<body>"] // Regexp for an “opening HTML-tag w/o attributes”, such as <span> or <p>
"Hello!... How goes?.....".match(/\.{3,}/g) // ["...", "....."] // find an ellipsis "..."
Greedy and lazy quantifiers
Greedy mode
// let's find ["witch", "broom"]
'a "witch" and her "broom" is one'.match(/".+"/g) // ['"witch" and her "broom"']
// not what we want
. means any character, + means one or more times
After the first quote, .+ takes as many characters as it can: it stops only at a newline or at the end of the string
Then the engine looks for the closing ", but there is nothing left, because .+ already reached the end
The regular expression engine understands that it took too many characters and starts to backtrack
It shortens the match for the quantifier by one character from the end on every attempt, until the rest of the pattern (the closing ") matches
We get "witch" and her "broom"
Due to flag 'g' the search continues from the end of the previous match, but there are no more quotes in the rest of the string
In the default 'greedy' mode a quantified character is repeated as many times as possible
The regexp adds to the match as many characters as it can for .+ and then shortens that one by one if the rest of the pattern doesn’t match
A greedy quantifier may lead to catastrophic backtracking and make a regexp take very long to execute
Lazy mode
'a "witch" and her "broom" is one'.match(/".+?"/g) // ['"witch"', '"broom"']
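A side-by-side sketch of the two modes on the same string. The third variant, a negated set [^"] in place of the dot, is a common alternative that cannot run past the closing quote; it is added here for comparison and is an illustration rather than part of the notes:

```javascript
const sentence = 'a "witch" and her "broom" is one';

// Greedy: .+ runs to the end of the line, then backtracks to the last quote.
const greedy = sentence.match(/".+"/g);
console.log(greedy); // ['"witch" and her "broom"']

// Lazy: .+? stops at the first closing quote it can reach.
const lazy = sentence.match(/".+?"/g);
console.log(lazy); // ['"witch"', '"broom"']

// Negated set: "anything but a quote" cannot run past the closing quote.
const negated = sentence.match(/"[^"]+"/g);
console.log(negated); // ['"witch"', '"broom"']
```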
Enable lazy mode by putting a question mark ? after the quantifier
The quantifier then repeats a minimal number of times
Usually ? is a quantifier by itself (zero or one)
But if added after another quantifier it gets another meaning: it switches the matching mode from greedy to lazy
Laziness is only enabled for the quantifier with ?, other quantifiers remain greedy
Capturing groups (...)
"Gogogo now!".match(/(go)+/ig) // ["Gogogo"]
(go)+ means 'go', 'gogo', 'gogogo' and so on
The regexp engine memorizes the content matched by parentheses
Parentheses are numbered from left to right, by their opening parenthesis
W/o flag g, the groups can be retrieved from the result array
The zero index of the result always holds the full match
'<h1>Hello, world!</h1>'.match(/<.*?>/)
// ['<h1>', index: 0, input: '<h1>Hello, world!</h1>', groups: undefined]
// with ()
'<h1>Hello, world!</h1>'.match(/<(.*?)>/)
// ['<h1>', 'h1', index: 0, input: '<h1>Hello, world!</h1>', groups: undefined]
To include the contents of a lookahead (?=...) in the result, wrap them into additional parentheses
"1 turkey costs 30€".match(/\d+(?=(€|kr))/) // 30, €
Capturing groups & str.replace()
"John Bull".replace(/(\w+) (\w+)/, '$2, $1') // Bull, John
Named groups
Remembering groups by their numbers is hard, so we can give names to parentheses
That’s done by putting ?<name> immediately after the opening parenthesis
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/
let str = "2019-04-30"
let groups = str.match(dateRegexp).groups
groups.year // 2019
groups.month // 04
groups.day // 30
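Since groups is an object, it can be destructured. A minimal sketch, assuming a hypothetical parseDate helper that returns numbers, or null when the string is not in this format:

```javascript
const isoDatePattern = /^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$/;

// Hypothetical helper: only the shape is checked, not calendar validity.
function parseDate(text) {
  const match = text.match(isoDatePattern);
  if (!match) return null; // match() returns null when nothing is found
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDate("2019-04-30")); // { year: 2019, month: 4, day: 30 }
console.log(parseDate("30.04.2019")); // null
```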
Exclude a group from capturing with ?:
let str = "Gogogo John!"
let regexp = /(?:go)+ (\w+)/i // ?: excludes 'go' from capturing
let result = str.match(regexp)
result[0] // Gogogo John (full match)
result[1] // John
result.length // 2 (no more items in the array)
Named groups & str.replace()
let regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g
let str = "2019-10-30, 2020-01-01"
str.replace(regexp, '$<day>.$<month>.$<year>') // 30.10.2019, 01.01.2020
Backreference \1
We can refer to a capturing group inside the pattern itself
The engine finds the first quote (['"]) and memorizes its content
Further in the pattern, \1 means “find the same as in the first group”
\2 would mean the contents of the second group
If we use ?: in the group, then we can’t reference it
`He said: "She's the one!".`.match(/(['"])(.*?)\1/g) // ['"She\'s the one!"']
Named groups can be referenced with \k<name>
`He said: "She's the one!".`.match(/(?<quote>['"])(.*?)\k<quote>/g) // ['"She\'s the one!"']
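Backreferences are not limited to quotes. As a sketch, the pattern below (an illustration, not taken from the notes) finds a word that is immediately repeated, and the replacement keeps a single copy:

```javascript
const draft = "This is is a test test.";
const doubledWord = /\b(\w+)\s+\1\b/gi;

// \1 must repeat exactly what the first group matched.
console.log(draft.match(doubledWord)); // ['is is', 'test test']

// In the replacement string, $1 refers to the same first group.
const cleaned = draft.replace(doubledWord, "$1");
console.log(cleaned); // 'This is a test.'
```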
Lookahead (?=smth)
X(?=Y) means look for "X", but match only if "Y" is after it
The contents of the parentheses (?=...) are not included in the result
"1 turkey costs 30€".match(/\d+(?=€)/) // "30"
// looks for a number that is followed by a space, and only if there’s 30 somewhere after it
"1 turkey costs 30€".match(/\d+(?=\s)(?=.*30)/) // '1'
Negative lookahead (?!smth)
X(?!Y) means "search X, but only if not followed by Y"
"2 turkeys cost 60€".match(/\d+\b(?!€)/g) // ["2"]
Lookbehind (?<=Y)X
(?<=Y)X matches X only if there’s Y before it
"1 turkey costs $30".match(/(?<=\$)\d+/) // '30'
Negative lookbehind (?<!Y)X
(?<!Y)X matches X if there’s no Y before it
"2 turkeys cost $60".match(/(?<!\$)\b\d+/g) // ["2"]
// Find non-negative integers
"0 12 -5 123 -18".match(/(?<![-\d])\d+/g) // ["0", "12", "123"] // \d in the lookbehind keeps "8" from being cut out of -18
Search at position with 'y' flag /.../y
Flag 'y' allows to perform the search at the given position in the source string
W/o flag g, regexp.exec(str) works like str.match(regexp)
With flag g it performs the search in str, starting from the position stored in the regexp.lastIndex property
If there is a match, it sets regexp.lastIndex to the index immediately after the match
Successive calls to regexp.exec(str) return matches one after another
let str = 'let varName' // Let's find all words in this string
let regexp = /\w+/g
regexp.lastIndex // 0 (initially lastIndex=0)
let word1 = regexp.exec(str)
word1[0] // let (1st word)
regexp.lastIndex // 3 (position after the match)
let word2 = regexp.exec(str)
word2[0] // varName (2nd word)
regexp.lastIndex // 11 (position after the match)
let word3 = regexp.exec(str)
word3 // null (no more matches)
regexp.lastIndex // 0 (resets at search end)
// we can get all matches in the loop:
str = 'let varName'
regexp = /\w+/g
let result
while (result = regexp.exec(str)) {
console.log( `Found ${result[0]} at position ${result.index}` )
// Found let at position 0
// Found varName at position 4
}
'y' flag makes regexp.exec() search exactly at position lastIndex, not further
let str = 'let varName = "value"'
let regexp = /\w+/y
regexp.lastIndex = 3
regexp.exec(str) // null (there's a space at position 3, not a word)
regexp.lastIndex = 4
regexp.exec(str) // varName (word at position 4)
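Reading a string piece by piece is where flag 'y' is most useful. A minimal tokenizer sketch under stated assumptions: the three token kinds and the names tokenPatterns and tokenize are hypothetical and chosen only for this illustration:

```javascript
// Hypothetical token kinds; every pattern is sticky, so it matches only at lastIndex.
const tokenPatterns = [
  ["space", /\s+/y],
  ["word", /\w+/y],
  ["symbol", /[^\w\s]/y],
];

function tokenize(text) {
  const tokens = [];
  let position = 0;
  while (position < text.length) {
    let matched = false;
    for (const [kind, pattern] of tokenPatterns) {
      pattern.lastIndex = position; // try this pattern exactly here
      const result = pattern.exec(text);
      if (result) {
        if (kind !== "space") tokens.push({ kind, value: result[0] });
        position = pattern.lastIndex; // exec moved it past the match
        matched = true;
        break;
      }
    }
    // Defensive: the three patterns together cover every character.
    if (!matched) throw new Error(`No token at position ${position}`);
  }
  return tokens;
}

const tokens = tokenize('let varName = "value"');
console.log(tokens.map((token) => token.value)); // ['let', 'varName', '=', '"', 'value', '"']
```

With flag 'g' in place of 'y', each pattern would skip ahead to the next match, so the position check would be lost.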
RegExp methods
str.match(regexp) - finds matches of 'regexp' in the string 'str', with 'g' flag returns an array of all matches
str.matchAll(regexp) - returns not an array, but an iterable object
str.split(regexp, limit) - splits the string using the regexp (or a substring) as a delimiter
str.search(regexp) - returns the position of the first match or -1 if none found
str.replace(regexp, replacement) - replaces matches found using regexp in string str with replacement
str.replaceAll(regexp, replacement) - same as str.replace with 'g' flag
regexp.exec(str) - w/o 'g' flag works exactly like str.match(regexp)
regexp.test(str) - looks for at least one match, if found, returns true, otherwise false
str.match(regexp)
Finds matches in a string
If the regexp doesn’t have flag g, returns the first match as an array with capturing groups and the properties index, input, groups
With flag g, returns an array of all matches as strings, w/o capturing groups
If there are no matches, null is returned (not an empty array)
To ensure the result is an array, write let result = str.match(regexp) || []
// without flag g
let result = "I love JavaScript".match(/Java(Script)/)
// (2) ['JavaScript', 'Script', index: 7, input: 'I love JavaScript', groups: undefined]
result[0] // JavaScript (full match)
result[1] // Script (first capturing group)
result.length // 2
result.index // 7 (match position)
result.input // I love JavaScript (source string)
// with flag g
result = "I love JavaScript".match(/Java(Script)/g)
result[0] // JavaScript
result.length // 1
// no matches
result = "I love JavaScript".match(/HTML/)
result // null
// more examples
let str = "We will, we will rock you"
str.match(/we/gi) // ["We", "we"]
str.match(/we/i) // ["We", index: 0, input: "We will, we will rock you", groups: undefined]
str.match(/hello/i) // null
let matches = "JavaScript".match(/HTML/) || []
if (!matches.length) alert("No matches")
str.matchAll(regexp)
Searches for all matches with all groups, the regexp must have flag g
Returns an iterable object with matches
If there are no results, it returns an empty iterable object
Every match is returned as an array with capturing groups (the same format as str.match w/o flag g)
We can use for..of to loop over matchAll matches
let matchAll = '<h1>Hello, world!</h1>'.matchAll(/<(.*?)>/g)
matchAll // [object RegExp String Iterator], not array, but an iterable
matchAll = Array.from(matchAll)
let firstMatch = matchAll[0]
firstMatch[0] // <h1>
firstMatch[1] // h1
firstMatch.index // 0
firstMatch.input // <h1>Hello, world!</h1>
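The for..of loop can also be applied directly, without Array.from. A short sketch that collects every date together with its position, using named groups (the variable names are illustrative):

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const dateText = "2019-10-30, 2020-01-01";

const foundDates = [];
for (const match of dateText.matchAll(datePattern)) {
  // Each match has the same shape as a str.match result w/o flag g.
  const { year, month, day } = match.groups;
  foundDates.push({ year, month, day, index: match.index });
}

console.log(foundDates);
// [ { year: '2019', month: '10', day: '30', index: 0 },
//   { year: '2020', month: '01', day: '01', index: 12 } ]
```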
str.split(regexp, limit)
Splits the string using the regexp (or a substring) as a delimiter
'12-34-56'.split('-') // ['12', '34', '56']
'12, 34, 56'.split(/,\s*/) // ['12', '34', '56']
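Two details of split that the examples above do not show: the optional limit caps the number of items, and a capturing group in the delimiter regexp keeps the delimiters in the result. A short sketch:

```javascript
// limit: stop after the first two items.
const firstTwo = "12, 34, 56".split(/,\s*/, 2);
console.log(firstTwo); // ['12', '34']

// A capturing group in the delimiter keeps the matched delimiters.
const withDelimiters = "a1b2c".split(/(\d)/);
console.log(withDelimiters); // ['a', '1', 'b', '2', 'c']
```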
str.search(regexp)
Returns the position of the first match
Returns -1 if none are found
Searches only until the first match
If we need positions of all matches, use str.matchAll(regexp)
"A drop of ink may make a million think".search(/ink/i) // 10
str.replace(str | regexp, str | func)
Method for searching and replacing
When the first argument is a string, it replaces the first match only
When it is a regexp, flag g is needed to replace all matches
'12-34-56'.replace("-", ":") // 12:34-56
'12-34-56'.replace( /-/g, ":" ) // 12:34:56
"We will, we will".replace(/we/i, "I") // I will, we will // no flag g
"We will, we will".replace(/we/ig, "I") // I will, I will // with flag g
The replacement string (2nd argument) accepts special character sequences
$& inserts the whole match
$` inserts the part of the string before the match
$' inserts the part of the string after the match
$n if n is a 1-2 digit number, inserts the contents of the n-th capturing group
$<name> inserts the contents of the parentheses with the given name
$$ inserts the character $
// swap first and last name
"John Smith".replace(/(john) (smith)/i, '$2, $1') // Smith, John
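The second argument can be a function for smart replacement
It will be called for each match
The returned value will be inserted as a replacement
The function is called as replacementFunc(match, p1, p2, ..., pn, offset, input, groups), w/o capturing groups that is replacementFunc(match, offset, input)
Read more about the function at the original source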
// let’s uppercase all matches
let str = "html and css";
let result = str.replace(/html|css/gi, str => str.toUpperCase())
result // HTML and CSS
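When the regexp has capturing groups, their contents are passed to the function right after the full match. A minimal sketch (the function and variable names are illustrative) that converts a hyphenated name to camelCase and reorders a name using named groups:

```javascript
// The second parameter receives the first capturing group.
function toCamelCase(name) {
  return name.replace(/-(\w)/g, (match, letter) => letter.toUpperCase());
}
console.log(toCamelCase("background-color")); // 'backgroundColor'

// With named groups, the last argument is the groups object.
const swapped = "John Smith".replace(
  /(?<first>\w+) (?<last>\w+)/,
  (...args) => {
    const groups = args[args.length - 1];
    return `${groups.last}, ${groups.first}`;
  }
);
console.log(swapped); // 'Smith, John'
```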
// Replace each match by its position in the string
"Ho-Ho-ho".replace(/ho/gi, (match, offset) => offset) // 0-3-6
str.replaceAll(str | regexp, str | func)
Same as str.replace(), with two major differences
If the first argument is a string, it replaces all occurrences
If the first argument is a regular expression w/o the 'g' flag, there’ll be an error
With 'g' flag, it works the same as str.replace()
The main use case for replaceAll is replacing all occurrences of a string
'12-34-56'.replaceAll("-", ":") // 12:34:56
regexp.exec(str)
Returns a match for regexp in the string
It’s called on a regexp, not on a string
Behaves differently depending on whether the regexp has flag 'g'
If there is no 'g', it returns the first match, same as str.match(regexp)
If there’s flag g, it returns the first match and saves the position immediately after it in the regexp.lastIndex property
The next call starts the search from position regexp.lastIndex, returns the next match and saves the position after it in regexp.lastIndex
And so on...
If there are no matches, regexp.exec() returns null and resets regexp.lastIndex to 0
We can use regexp.exec to search from a given position by manually setting lastIndex
If the regexp has flag 'y', the search will be performed exactly at the position regexp.lastIndex
That’s convenient when we need to “read” from the string by a regexp at the exact position
let str = 'More about JavaScript at https://javascript.info'
let regexp = /javascript/ig
let result
while (result = regexp.exec(str)) {
alert( `Found ${result[0]} at position ${result.index}` )
// Found JavaScript at position 11, then
// Found javascript at position 33
}
Search from a given position
let str = 'Hello, world!'
let regexp = /\w+/g // w/o flag "g", lastIndex property is ignored
regexp.lastIndex = 5 // search from 5th position (from the comma)
regexp.exec(str) // world
// or
let str = 'Hello, world!'
let regexp = /\w+/y
regexp.lastIndex = 5 // search exactly at position 5
regexp.exec(str) // null
regexp.test(str)
Looks for a match and returns true/false depending on whether it exists
If the regexp has flag 'g', it searches from the regexp.lastIndex property and updates this property, just like regexp.exec()
If we apply the same global regexp to different inputs, it may lead to a wrong result, so it is recommended to set regexp.lastIndex = 0 before each search
let str = "I love JavaScript"
/love/i.test(str) // true
// same as
str.search(/love/i) != -1 // true
Search from a given position
let regexp = /love/gi
let str = "I love JavaScript"
// start the search from position 10
regexp.lastIndex = 10
regexp.test(str) // false (no match)
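The wrong result from reusing a global regexp can be reproduced in a few lines. A sketch of the pitfall and of the reset described above; mentionsJavaScript is a hypothetical helper name:

```javascript
const globalPattern = /javascript/gi;

// The first call matches and leaves lastIndex at the end of the match.
const firstCall = globalPattern.test("JavaScript");
const indexAfterFirst = globalPattern.lastIndex;
console.log(firstCall, indexAfterFirst); // true 10

// The second call starts at position 10, so it fails on an equal string.
const secondCall = globalPattern.test("JavaScript");
console.log(secondCall); // false

// Hypothetical helper: reset lastIndex so every check starts from 0.
function mentionsJavaScript(text) {
  globalPattern.lastIndex = 0;
  return globalPattern.test(text);
}
console.log(mentionsJavaScript("JavaScript"), mentionsJavaScript("JavaScript")); // true true
```

Dropping flag 'g' from a regexp that is only used with test() avoids the issue altogether.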
Useful
HTML and text between tags
const bodyPattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im
const textContentPattern = /<[^>]*(>|$)|&nbsp;|&quot;|&#39;|&zwnj;|&raquo;|&laquo;|&gt;/g
const html = '<html><head><title>NewTab</title></head><body><span>Hi</span><span>Bye</span></body></html>'
const body = html.match(bodyPattern)[1] // <span>Hi</span><span>Bye</span> (group 1, index 0 would include the <body> tags)
const text = body.replace(textContentPattern, '') // HiBye
Text between 2 strings
(?<=beginsWith)(.*)(?=endsWith)
levis can be selected from Television by (?<=Te)(.*)(?=ion)
Text between 2 strings including them (greedy mode)
beginsWith(.*)endsWith
Television can be selected from Television by Te(.*)on
Text between 2 strings including them (lazy mode)
beginsWith(.*?)endsWith
Includes a substring and excludes another substring
^(?=.*(includeSubString1|includeSubString2))(?!.*excludeSubString1)(?!.*excludeSubString2).*
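A concrete sketch of the last pattern, with made-up substrings (error, warn, debug, test) standing in for the placeholders:

```javascript
// Must contain "error" or "warn", and must not contain "debug" or "test".
const lineFilter = /^(?=.*(error|warn))(?!.*debug)(?!.*test).*/;

// Sample lines invented for this illustration.
const logLines = [
  "error: disk full",
  "warn: debug build detected",
  "info: started",
  "warn: low memory",
];

const kept = logLines.filter((line) => lineFilter.test(line));
console.log(kept); // ['error: disk full', 'warn: low memory']
```

Each lookahead starts from the beginning of the string, so the order in which the substrings appear does not matter.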