Using if and else in regular expression

Ask Time：2013-05-20T14:28:00 Author：jsendo

I'm having difficulty understanding this particular regular expression (it is currently used to check user input for phone numbers):

^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$

I read that "?()" is used for an if condition in regular expressions, but the logic behind this regular expression is still not really clear to me, nor is what kind of input it accepts and rejects.

Thanks

Author: jsendo. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/16643766/using-if-and-else-in-regular-expression

Jimbo :

Firstly, in this regexp, ?() is not a conditional. ? matches the character (or group) to the left of it 0 or 1 times, and ( ) delimits a capture group. So \(? is just an optional literal ( and (-| )? is an optional hyphen or space. Some regex flavours do have a real conditional, written (?(condition)yes|no), but nothing like that appears here :) The closest thing in this expression is alternation: (a|b) matches either a or b.

The regexp is a little difficult to read, so here it is again:

^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$

Try regexper.com: type in the regexp and it will draw you a state diagram.

Using some indentation to break up the expression:

^(
 (\+\d{1,3}(-| )?
 \(?\d\)?(-| )?
 \d{1,3})
 |(
 \(?\d{2,3}\)?
 )
)

(-| )?(\d{1,4})
(-| )?(\d{6})
(
 ( x| ext)\d{1,5}
){0,1}$

(Note: this layout makes some spaces hard to see, but we'll go through it by referencing the original.)

^ matches the start of a line.

The next group is ((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))

This has two parts: (X|Y), where X=(\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}) and Y=(\(?\d{2,3}\)?). This will match either X or Y.

Breaking down X=(\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}):

The outer () are a capture, so strip these.
\+ matches a literal plus sign. Note that it has to be escaped with the \ because + is a meta character meaning "match one or more of the previous".
\d{1,3} matches any decimal digit either 1, 2 or 3 times, but no more or less.
(-| )? matches either - or (space) zero or one times. The ? is what specifies zero or one times.
\(?\d\)? matches a literal ( zero or one times (notice the escape), then exactly one decimal digit, then a literal ) zero or one times.
(-| )? we've seen before (an optional - or space).
\d{1,3} we've also seen before (1 to 3 decimal digits).

So we can say that X matches (and captures, that's what the outer () is doing) any string that starts with a plus, has 1 to 3 digits, then possibly a space or a hyphen, a single digit optionally wrapped in brackets, possibly another space or hyphen and then another 1 to 3 digits. Phew!

Breaking down Y=(\(?\d{2,3}\)?):

The outer () are a capture, so strip these.
\(? matches a literal ( zero or one times.
\d{2,3} matches any digit two or three times.
\)? matches a literal ) zero or one times.

So we can say that Y matches a two or three digit number, possibly surrounded by brackets. Note that the two brackets are optional independently of each other, so an opening bracket without a closing one is also allowed. Jeez!

Now we have X and Y we can see what the first chunk of the regexp matches (brain melting!).

The first chunk, call it chunk1, matches either

any string that starts with a plus, has 1 to 3 digits, then possibly a space or a hyphen, a single digit optionally in brackets, possibly another space or hyphen and then another 1 to 3 digits, OR
any two or three digit number, possibly surrounded by brackets.

Whichever of the two matches is captured by the enclosing (X|Y) group, which is capture group 1. Groups are numbered by the position of their opening bracket, so X itself is group 2 and Y is group 5.

Continuing...

(-| )? we've seen before (an optional - or space).

(\d{1,4}) matches a string of digit characters that is 1, 2, 3 or 4 digits in length. This is capture group 7, because every ( opens a numbered group, including the (-| ) separators.

(-| )? we've seen before (an optional - or space).

(\d{6}) matches a string of exactly 6 digits. This is capture group 9.

So here you are matching a string with a possible space or hyphen, 1 to 4 digits, another possible space or hyphen and then 6 digits. Call this chunk2.

So far we have matched any string consisting of chunk1 followed immediately by chunk2.

This concludes the main bit of the phone number; the rest appears to handle extensions.

The next bit is (( x| ext)\d{1,5}){0,1}. Let's break this down a little.

The surrounding brackets are the capture group.
( x| ext) matches either of the two literal strings ' x' or ' ext'. Note the leading space.
\d{1,5} matches any digit 1, 2, 3, 4 or 5 times.
{0,1} matches the capture group zero or one times (the same as ?), i.e. the phone number does not need to have an extension.

Finally $ matches the end of the line.

Hopefully this has broken down the string well enough for you to work through :)
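If you want to check the breakdown against real input, a quick script helps. The following is only a minimal sketch in Python, assuming the `re` module behaves like the flavour your application uses for these constructs. The sample phone numbers, the `is_phone` helper and the `BALANCED` pattern are made up for illustration and are not part of the original code. It runs the question's pattern unchanged against a few strings, prints some capture groups, and then shows what a genuine conditional looks like in a flavour that supports one.

```python
import re

# The pattern from the question, split over two string literals for width only.
PHONE = re.compile(
    r"^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))"
    r"(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$"
)


def is_phone(text):
    """Return True if the whole string matches the question's pattern."""
    return PHONE.match(text) is not None


# Hypothetical sample inputs, chosen to exercise each part of the pattern.
samples = [
    "+44 (0)20 1234 567890",  # X branch: plus, country code, bracketed digit
    "(020) 1234 567890",      # Y branch with brackets
    "020-1-234567",           # hyphen separators, one-digit middle part
    "123456789",              # no separators at all: 12 + 3 + 456789
    "(020 1234 567890",       # unbalanced bracket: each bracket is optional
    "020 1234 567890 x123",   # extension introduced by " x"
    "020 1234 56789",         # only five trailing digits
    "020 1234 567890x123",    # extension without the leading space
    "020 1234 567890 x",      # extension marker with no digits
]

for s in samples:
    print(f"{s!r:26} {'accepted' if is_phone(s) else 'rejected'}")

# Groups are numbered by the position of their opening parenthesis.
m = PHONE.match("(020) 1234 567890 x123")
print("number of groups:", PHONE.groups)
print("group 1:", m.group(1))
print("group 7:", m.group(7))
print("group 9:", m.group(9))
print("group 10:", repr(m.group(10)))

# For contrast, a real conditional as supported by Python's re module:
# (?(1)\)) means "if group 1 matched, require a closing bracket here".
# This illustrative pattern only covers the bracketed area-code part.
BALANCED = re.compile(r"^(\()?\d{2,3}(?(1)\))$")
for s in ["(020)", "020", "(020", "020)"]:
    print(f"{s!r:8} {'balanced' if BALANCED.match(s) else 'not balanced'}")
```

The first five samples and the one with " x123" are accepted and the last three are rejected, which matches the walk-through: the six trailing digits are mandatory, and an extension needs both the leading space and at least one digit. The conditional at the end is not something the original expression does. It is there only to show the (?(group)yes|no) syntax that the question was probably thinking of.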

2013-05-20T07:00:33
