AR15.COM
11/2/2001 12:34:34 PM EDT
I've been writing code for quite a while but never got around to learning regular expressions, until today.  Had to write some code to validate a domain name as being properly formatted, and figured I'd learn how to do it with reg exp.

Here's my pattern:   [b][a-zA-Z0-9]\.com|net|org$[/b]

Which I understand as "match until you don't find a letter or number, then you must find a dot/period then com, net or org".  Problem is that I can pass it stuff like a&a.com and it matches, and I can't figure it out.  

Any help would be, well, helpful. [:)]

11/2/2001 1:09:49 PM EDT
[#1]
Hi DVDTracker,

I assume you are talking about Perl regexps?  The syntax differs slightly between languages and tools (various UNIX shells, XEmacs, C libraries, etc).

The problem is that anything in brackets matches only one character by default.

So your expression matches:
a&a.com
because it is finding a match with the underlined portion:
a&[u]a.com[/u]
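
To see that partial match directly, here is a small sketch. It uses Python's re module instead of Perl (an assumption made for illustration; the syntax in this pattern is the same in both). re.search looks for the pattern anywhere in the string, as an unanchored match does.

```python
import re

# Original, unanchored pattern from the first post.
pattern = re.compile(r"[a-zA-Z0-9]\.com|net|org$")

# search() scans the whole string for the first place the pattern fits.
m = pattern.search("a&a.com")
matched = m.group(0) if m else None

print(matched)  # a.com  (the "a&" in front is simply skipped)
```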

If you are trying to match any alphanumeric for any number of characters, followed by the .net, .com, or .org domains, your expression should look like this:
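[b]^[a-zA-Z0-9]+\.(com|net|org)$[/b]

I'm assuming the rest of your syntax is right.  One caveat on the logical or operator (the |): it has the lowest precedence of anything in the pattern, so the alternatives need parentheses around them.  Without the parentheses the pattern splits into three separate alternatives, and only the first one is tied to the ^ anchor.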

The first change is the ^ anchor.  This anchors your regexp to the beginning of the string.  That was why your previous regexp was not excluding the "a&" portion in your example -- you didn't tell it that it couldn't have anything in front of the string you were matching.

The other change is the "+" modifier.  As I said earlier, anything in brackets denotes just one character.  So the + modifier means that you are now matching for one or more characters that meet the criterion of [a-zA-Z0-9].  If it was okay to have zero or more characters, you could use the * modifier instead of the +, but I don't think that is what you want in your example.
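
A minimal sketch of the + versus * difference, again in Python's re module (an assumption; the thread is about Perl, but these operators behave the same way). The patterns are simplified to a bare \.com$ so that only the anchor and the modifier are being tested.

```python
import re

# Same character class, anchored at the start, with + and with *.
one_or_more = re.compile(r"^[a-zA-Z0-9]+\.com$")
zero_or_more = re.compile(r"^[a-zA-Z0-9]*\.com$")

def matches(pattern, text):
    """Return True if the pattern is found in text."""
    return pattern.search(text) is not None

results = {
    "plus_example": matches(one_or_more, "example.com"),  # True
    "plus_amp": matches(one_or_more, "a&a.com"),          # False: ^ forbids skipping "a&"
    "plus_bare": matches(one_or_more, ".com"),            # False: + needs at least one character
    "star_bare": matches(zero_or_more, ".com"),           # True: * accepts zero characters
}
print(results)
```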

Hope that helps and it wasn't too terribly confusing.  Let me know if you need any follow up clarification.

Dilbert

[Edited to fix UBBcode]
__
If it ain't broke, fix it till it is!
11/2/2001 1:11:08 PM EDT
[#2]
What your regular expression says is to find within the string a pattern that has one letter/number followed by a .com, .net, or .org at the end of the string.  Your string "a&a.com" matches that criterion.  It has an "a", which is a valid letter/number, followed by a ".com".  If you want your string to have only letters/numbers preceding the .com, etc., from the very beginning of the string, you need to write it thusly:

^[a-zA-Z0-9]+\.(com|net|org)$

I think that's right, although I didn't actually check it out myself.  YMMV.
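
One thing worth checking by hand is how the com|net|org part is delimited. Here is a rough sketch in Python's re module (an assumption; Perl treats these three spellings the same way) comparing parentheses, no delimiter at all, and square brackets. The test strings are made up for illustration.

```python
import re

# Three spellings of the TLD part; only the first groups the alternatives.
grouped = re.compile(r"^[a-zA-Z0-9]+\.(com|net|org)$")
ungrouped = re.compile(r"^[a-zA-Z0-9]+\.com|net|org$")
char_class = re.compile(r"^[a-zA-Z0-9]+\.[com|net|org]$")

def ok(pattern, text):
    """Return True if the pattern is found in text."""
    return pattern.search(text) is not None

checks = {
    "grouped accepts example.net": ok(grouped, "example.net"),
    "grouped rejects a&a.org": not ok(grouped, "a&a.org"),
    # Ungrouped, "org$" is a whole alternative by itself, so anything ending in org passes.
    "ungrouped accepts a&a.org": ok(ungrouped, "a&a.org"),
    # Square brackets match exactly one character out of c, o, m, |, n, e, t, r, g.
    "char class rejects example.com": not ok(char_class, "example.com"),
    "char class accepts example.c": ok(char_class, "example.c"),
}
print(all(checks.values()))  # True
```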
11/2/2001 1:16:32 PM EDT
[#3]
Yes, thanks guys!  I can see now where the missing "^" at the beginning was causing problems, as well as the "+".  FYI, here's the working expression.  I added a hyphen since that is valid in a domain name.

[b]^[a-zA-Z0-9\-]+\.(com|net|org)$[/b]

Looks like the backslash in front of the hyphen isn't necessary when the hyphen is the last character inside the brackets, but it's not hurting anything since it just forces the hyphen to be interpreted literally.
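
A quick sketch of that hyphen point, using Python's re module (an assumption; Perl handles a trailing hyphen in a character class the same way). The sample labels are invented for the test.

```python
import re

# Hyphen escaped vs. hyphen left bare as the last character in the class.
escaped = re.compile(r"^[a-zA-Z0-9\-]+$")
trailing = re.compile(r"^[a-zA-Z0-9-]+$")

labels = ["auto-parts", "autoparts", "auto_parts", "-parts"]

escaped_results = [bool(escaped.search(s)) for s in labels]
trailing_results = [bool(trailing.search(s)) for s in labels]

print(escaped_results)   # [True, True, False, True]
print(trailing_results)  # [True, True, False, True]
```

Both spellings agree on every label, including the one with a leading hyphen.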

11/2/2001 1:28:33 PM EDT
[#4]
Actually, since we need to support all the top-level domains:

[b]^[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4}$[/b]
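
A sketch of the "2 to 4 letters" count, in Python's re module (an assumption; Perl writes the count the same way, with unescaped braces). A backslash before the opening brace turns it into a literal brace character, which the second pattern shows. The sample names are made up.

```python
import re

# {2,4} as a repeat count vs. \{ as a literal brace.
tld = re.compile(r"^[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4}$")
escaped_brace = re.compile(r"^[a-zA-Z0-9\-]+\.[a-zA-Z]\{2,4}$")

def ok(pattern, text):
    """Return True if the pattern is found in text."""
    return pattern.search(text) is not None

checks = {
    "example.com": ok(tld, "example.com"),        # True: 3 letters
    "example.info": ok(tld, "example.info"),      # True: 4 letters
    "example.c": ok(tld, "example.c"),            # False: only 1 letter
    "example.museum": ok(tld, "example.museum"),  # False: 6 letters
    "-example.us": ok(tld, "-example.us"),        # True: a leading hyphen still passes
    "escaped example.com": ok(escaped_brace, "example.com"),        # False
    "escaped example.c{2,4}": ok(escaped_brace, "example.c{2,4}"),  # True: braces taken literally
}
print(checks)
```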
11/2/2001 4:38:48 PM EDT
[#5]
DVDTracker, your last rule still lets invalid domain names that start with a - pass.  Below is the rule I developed years ago to look for valid e-mail addresses:

^[0-9a-z]([-_.]?[0-9a-z]\.?)*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]+$
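
A rough check of the e-mail rule in Python's re module (an assumption; the thread is about Perl). The pattern is written here with a single backslash before the last dot, which assumes any doubled backslash was string-literal escaping. The addresses are invented, and this only tests the rule as written, which is lowercase-only.

```python
import re

# E-mail rule from the post, with a single backslash before the final dot.
email = re.compile(
    r"^[0-9a-z]([-_.]?[0-9a-z]\.?)*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]+$"
)

def ok(text):
    """Return True if the address fits the rule."""
    return email.search(text) is not None

checks = {
    "joe@example.com": ok("joe@example.com"),                      # True
    "john.smith@auto-parts.net": ok("john.smith@auto-parts.net"),  # True
    "joe@-example.com": ok("joe@-example.com"),                    # False: domain starts with -
    "joe@example": ok("joe@example"),                              # False: no dot and TLD
}
print(checks)
```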
11/2/2001 4:46:01 PM EDT
[#6]
Boy, good luck with that working 100%. Now all you have to do is include all the national domains like ".co.uk" for commercial UK sites, etc, ad nauseam. You might try Network Solutions for a list of all TLDs.
11/2/2001 4:49:47 PM EDT
[#7]
Luckily we're not handling domains like that.  This is for domains for auto repair/parts companies in the US.  I'd bet 99% of them will be .com.