Posted: 11/2/2001 12:34:34 PM EDT
|
I've been writing code for quite a while but never got around to learning regular expressions until today. I had to write some code to validate that a domain name is properly formatted, and figured I'd learn how to do it with a regexp. Here's my pattern: [b][a-zA-Z0-9]\.com|net|org$[/b] Which I understand as "match until you don't find a letter or number, then you must find a dot/period, then com, net or org". The problem is that I can pass it stuff like a&a.com and it matches, and I can't figure out why. Any help would be, well, helpful. [:)] |
|
Hi DVDTracker, I assume you are talking Perl regexps? The syntax differs slightly between languages and tools (various UNIX shells, xemacs, C libraries, etc). The problem is that a bracket expression matches only one character by default. So your expression matches a&a.com because it finds a match in the underlined portion: a&[u]a.com[/u] If you are trying to match any number of alphanumeric characters followed by the .com, .net, or .org domains, your expression should look like this: [b]^[a-zA-Z0-9]+\.(com|net|org)$[/b] There are three changes. The first is the ^ anchor, which ties your regexp to the beginning of the string. That was why your previous regexp was not excluding the "a&" portion in your example: you didn't tell it that nothing could come in front of the part you were matching. The second is the "+" quantifier. As I said earlier, a bracket expression denotes just one character, so the + means you are now matching one or more characters that meet the criterion of [a-zA-Z0-9]. If it were okay to have zero or more characters you could use * instead of +, but I don't think that is what you want in your example. The third is the parentheses around com|net|org. The | operator has the lowest precedence, so without the grouping the pattern reads as three separate alternatives ("^[a-zA-Z0-9]+\.com", or "net", or "org$") rather than one prefix followed by one of three endings. Hope that helps and wasn't too terribly confusing. Let me know if you need any follow-up clarification. Dilbert [Edited to fix UBBcode] __ If it ain't broke, fix it till it is! |
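To make the effect of the anchor and the quantifier concrete, here is a minimal sketch. It uses Python's re module purely as an assumption for illustration (the thread never confirms which language is in use), and the variable names are made up for the example. It runs the pattern from the question against a&a.com, then an anchored prefix with + against the same string.

```python
import re

# The pattern from the original question: no ^ anchor, no + quantifier.
unanchored = re.compile(r"[a-zA-Z0-9]\.com|net|org$")

# The same prefix with the ^ anchor and the + quantifier added.
anchored = re.compile(r"^[a-zA-Z0-9]+\.com")

hit = unanchored.search("a&a.com")
print(hit.group(0))                             # a.com
print(anchored.search("a&a.com"))               # None
print(anchored.search("example.com").group(0))  # example.com
```

The first search succeeds because the engine is free to start matching at the second "a"; the anchored version has to start at the first character and gives up at the "&".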
|
What your regular expression says is to find, anywhere within the string, one letter/number followed by ".com" (the "net" and "org$" parts are separate alternatives, because | applies to everything on either side of it). Your string "a&a.com" meets that criterion. It has an "a", which is a valid letter/number, followed by a ".com". If you want your string to have only letters/numbers preceding the .com, etc., from the very beginning of the string, you need to write it thusly: ^[a-zA-Z0-9]+\.(com|net|org)$ Note the parentheses around the alternatives; square brackets there would match just one character from the set. I think that's right, although I didn't actually check it out myself. YMMV. |
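Square brackets and parentheses are easy to mix up when listing alternatives, so here is a small sketch of the difference. Python's re module is assumed only for illustration, and the names char_class and group are invented for the example. Inside brackets, com|net|org is just a set of single characters (c, o, m, |, n, e, t, r, g); inside parentheses it is a choice between three whole words.

```python
import re

# Brackets: exactly ONE character from the set c, o, m, |, n, e, t, r, g.
char_class = re.compile(r"^[a-zA-Z0-9]+\.[com|net|org]$")

# Parentheses: one of the three whole alternatives com, net, org.
group = re.compile(r"^[a-zA-Z0-9]+\.(com|net|org)$")

print(bool(char_class.search("example.com")))  # False
print(bool(char_class.search("example.c")))    # True
print(bool(group.search("example.com")))       # True
print(bool(group.search("example.c")))         # False
```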
|
Yes, thanks guys! I can see now where the missing "^" at the beginning was causing problems, as well as the "+". FYI, here's the working expression. I added a hyphen since that is valid in a domain name, and wrapped the alternatives in parentheses so that the ^ and $ apply to all three endings. [b]^[a-zA-Z0-9\-]+\.(com|net|org)$[/b] Looks like the backslash in front of the hyphen isn't necessary when the hyphen is the last character in the brackets, but it's not hurting anything since it just forces the hyphen to be interpreted literally. |
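One thing worth checking in a pattern like this is how far the | reaches. The sketch below compares an ungrouped and a grouped version of the hyphen-friendly pattern on a few made-up inputs. Python's re module is an assumption for illustration, and the test strings are hypothetical. Without parentheses the pattern is really three alternatives, so the ^ only guards the first and the $ only guards the last.

```python
import re

# Ungrouped: "^[a-zA-Z0-9-]+\.com"  OR  "net"  OR  "org$".
ungrouped = re.compile(r"^[a-zA-Z0-9-]+\.com|net|org$")

# Grouped: prefix, a dot, then exactly one of com/net/org, then end of string.
grouped = re.compile(r"^[a-zA-Z0-9-]+\.(com|net|org)$")

# Hypothetical inputs: one valid name and two strings that should be rejected.
for candidate in ["my-site.org", "my.company", "!!net!!"]:
    print(candidate,
          bool(ungrouped.search(candidate)),
          bool(grouped.search(candidate)))
```

Both versions accept my-site.org, but only the ungrouped one also accepts my.company and !!net!!. Even the grouped pattern is a simplified check (single label, three TLDs), not a full domain-name validator.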