5 Comments | Dec 18, 2008
Flex 3-RegExp: Find Urls In Text And Html
There are a number of situations where you'd want to grab the urls from a block of text. For example you may be loading in some external or dynamic data and want to make the links clickable, or change their colour. Regular expressions are used in a multitude of languages; they define patterns that can be matched against a string, thus certain key characters used in defining a RegExp have to be escaped so they are interpreted as special characters like \d matches any digit. In Actionscript, you can define a RegExp by either wrapping it in double quotes "", or forward slashes//. In each case you would have to escape any characters that match the wrapping in addition to the characters that need to be escaped in the actual pattern. Further more Actionscript requires you to separate out the last part of the regular expression, called flags, and insert it as the second argument when defining a new RegExp object.
Here's how you find a url in text or html:
var str:String = new String('This is a url www.fightskillz.com, and this is another one: <a href="http://chalk-it-out.com">Chalk It Out</a>'); var reg:RegExp = new RegExp("\\b(((https?)://)|(www.))([a-z0-9-_.&=#/]+)", 'i'); var result:Object = reg.exec(str); trace(result[0]);
First off if you're new to Flex/Actionscript you have to copy and paste this into a function and the variables created will only be accessable within that function while it's running as they are created and destroyed as it runs. If you wanted more permanence you'd just define the variables outside the function.
Now Let's break it down. The first \ is used as a character escape for Actionscript. In actionscript when defining a string within double quotes you'd escape a double that's part of the string like this "Look at this double quote \"". \b searches for a word boundary ie: a whitespace, or the beginning or end of a string.The next part ((https?)://)|(www.))defines the first part of a 'word' that passes for a url. It's made up of two substrings, the first looks for http, the question mark deems the preceding character optional, so it'll match to https as well. It then looks to see if the protocol is followed by ://. The |character means OR, so if there is no protocol specified, it checks for (www.). Next we have [a-z0-9-_.&=#/] which is a list of characters a to z, 0 to 9, and various others commonly found in urls. This is followed by a + which instructs the pattern to match the preceding list of characters until it can't anymore. It can't anymore when it reaches whitespace, a single or double quote, brackets, or any other non-url character. Finally the RegExp flag i informs the pattern to be case insensitive.
reg.exec(str); executes the pattern on the specified string and returns the results as an array. Since the example is only designed to match the first url it encounters and then stop, the array will only have one result. The method reg.exec(str) is interchangable withstr.match(reg)