Programmer Gurus, need Regular Expression help > General Discussion

Posted: 11/2/2001 12:34:34 PM EDT

I've been writing code for quite a while but never got around to learning regular expressions, until today. Had to write some code to validate a domain name as being properly formatted, and figured I'd learn how to do it with reg exp.

Here's my pattern: [b][a-zA-Z0-9]\.com|net|org$[/b]

Which I understand as "match until you don't find a letter or number, then you must find a dot/period then com, net or org". Problem is that I can pass it stuff like a&a.com and it matches, and I can't figure it out.

Any help would be, well, helpfull. [:)]

Posted: 11/2/2001 1:09:49 PM EDT

[#1]

Hi DVDTracker,

I assume you are talking PERL regexps? There is slightly different syntax for different languages (various UNIX shells, xemacs, C language, etc).

The problem is anything in brackets means you are only matching one letter by default.

So your expression matches:
a&a.com
because it is finding a match with the underlined portion:
a&[u]a.com[/u]

If you are trying to match any alphanumeric for any number of characters, followed by the .net, .com, or .org domains, your expression should look like this:
[b]^[a-zA-Z0-9]+\.com|net|org$[/b]

I'm assuming the rest of your syntax is right. Offhand I forget how the logical or operator works (the |).

The changes are the ^ anchor. This anchors your regexp to the beginning of the word. That was why your previous regexp was not excluding the "a&" portion in your example -- you didn't tell it that it couldn't have anything in front of the string you were matching.

The other change is the "+" modifier. As I said earlier, anything in brackets denotes just one character. So the + modifier means that you are now matching for one or more characters that meet the criterion of [a-zA-Z0-9]. If it was okay to have zero or more characters, you could use the * modifier instead of the +, but I don't think that is what you want in your example.

Hope that helps and it wasn't too terribly confusing. Let me know if you need any follow up clarification.

Dilbert

[Edited to fix UBBcode]
__
If it ain't broke, fix it till it is!

Posted: 11/2/2001 1:11:08 PM EDT

[#2]

What your regular expression says is to find within the string a pattern that has one letter/number followed by a .com, .net, or .org at the end of the string. Your string "a&a.com" matches that criteria. It has an "a", which is a valid letter/number, followed by a ".com". If you want your string to have only letters/numbers preceding the .com, etc., from the very beginning of the string, you need to write it thusly:

^[a-zA-Z0-9]+\.[com|net|org]$

I think that's right, although I didn't actually check it out myself. YMMV.

Posted: 11/2/2001 1:16:32 PM EDT

[#3]

Yes, thanks guys! I can see now where the missing "^" at the beginning was causing problems, as well as the "+". FYI, here's the working expresion. I added a hyphen since that is valid in a domain name.

[b]^[a-zA-Z0-9\-]+\.com|net|org$[/b]

Looks like the slash in front of the hyphen isn't necessary, but it's not hurting anything since that just forces it to be interpreted literally.

Posted: 11/2/2001 1:28:33 PM EDT

[#4]

Actually, since we need to support all the top-level domains:

[b]^[a-zA-Z0-9\-]+\.[a-zA-Z]\{2,4}$[/b]

Posted: 11/2/2001 4:38:48 PM EDT

[#5]

DVDTracker, your last rule allows for invalid domain names that start with a - to still pass. Below is the rule I developed years ago to look for valid e-mail addresses:

^[0-9a-z]([-_.]?[0-9a-z]\.?)*@[0-9a-z]([-.]?[0-9a-z])*\\.[a-z]+$

Posted: 11/2/2001 4:46:01 PM EDT

[#6]

Boy, good luck with that working 100%. Now all you have to do is include all the national domains like ".co.uk" for commercial UK sites, etc, ad nauseum. You might try Network Solutions for a list of all TLDs.

Posted: 11/2/2001 4:49:47 PM EDT

[#7]

Luckily were not handling domains like that. This is for domains for auto repair/parts companies in the US. I'd bet 99% of them will be .com

Warning

Confirm Action

[ARCHIVED THREAD] - Programmer Gurus, need Regular Expression help

[ARCHIVED THREAD] - Programmer Gurus, need Regular Expression help