Flex 3-RegExp: Find Urls In Text And Html

5 Comments | Dec 18, 2008

There are a number of situations where you'd want to grab the URLs from a block of text. For example, you may be loading in some external or dynamic data and want to make the links clickable, or change their colour. Regular expressions are used in a multitude of languages; they define patterns that can be matched against a string. Certain characters have a special meaning in a pattern (for example, . matches any character and \d matches any digit), so they have to be escaped with a backslash when you want to match them literally. In ActionScript, you can define a RegExp either as a literal wrapped in forward slashes //, or by passing a string in quotes "" to the RegExp constructor. In each case you have to escape any characters that match the wrapping, in addition to the characters that need to be escaped in the actual pattern; in the string form, every backslash must also be doubled. Furthermore, when you use the constructor, ActionScript requires you to separate out the last part of the regular expression, called flags, and pass it as the second argument; in a literal, the flags follow the closing slash.
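
As a rough illustration of the two ways of defining a pattern, here is a minimal sketch. The variable names and the sample text are made up for this example; it only assumes the standard RegExp class and its test() method.

```actionscript
// Two equivalent ways to build the same pattern: one or more digits, case-insensitive.
var literal:RegExp = /\d+/i;                      // literal form: flags follow the closing slash
var constructed:RegExp = new RegExp("\\d+", "i"); // string form: backslash doubled, flags as 2nd argument

var sample:String = "Order 66"; // hypothetical sample text
trace(literal.test(sample));     // true
trace(constructed.test(sample)); // true

// A forward slash has to be escaped inside a literal, but not inside a string.
var slashLiteral:RegExp = /https?:\/\//i;
var slashString:RegExp = new RegExp("https?://", "i");
```

Which form to use is mostly a matter of taste; the string form is handy when the pattern is assembled at runtime.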

Here's how you find a url in text or html:

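var str:String = 'This is a url www.fightskillz.com, and this is another one: <a href="http://chalk-it-out.com">Chalk It Out</a>';
// Backslashes are doubled because the pattern is written as a string; 'i' makes the match case-insensitive.
var reg:RegExp = new RegExp("\\b(((https?)://)|(www\\.))([a-z0-9-_.&=#/]+)", 'i');
var result:Object = reg.exec(str); // null when nothing matches
if (result) trace(result[0]); // www.fightskillz.com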

First off, if you're new to Flex/ActionScript: you have to copy and paste this into a function. The variables are local to that function, so they are only accessible while it's running; they are created when it starts and destroyed when it finishes. If you want more permanence, define the variables outside the function.

Now let's break it down. The first \ is an escape character for ActionScript, not part of the pattern: inside a quoted string a backslash has to be escaped with another backslash, just as you'd escape a double quote that's part of the string like this: "Look at this double quote \"". So "\\b" reaches the RegExp as \b, which matches a word boundary, i.e. the position between a word character (a letter, digit, or underscore) and a non-word character such as whitespace or punctuation, or the beginning or end of the string.

The next part, (((https?)://)|(www.)), defines the first part of a 'word' that passes for a URL. It's made up of two alternatives. The first looks for http; the question mark makes the preceding character optional, so it'll match https as well. It then checks that the protocol is followed by ://. The | character means OR, so if there is no protocol specified, it checks for www followed by a dot. Note that an unescaped . matches any character, so to match only a literal dot it should be written as \\. in the string.

Next we have [a-z0-9-_.&=#/], which is a set of characters: a to z, 0 to 9, and various others commonly found in URLs. This is followed by a +, which tells the pattern to match one or more characters from the set, as many as it can. It stops when it reaches whitespace, a single or double quote, brackets, or any other character that isn't in the set, which also means characters such as ? and % end the match early. Finally, the RegExp flag i makes the pattern case insensitive.
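
To see where the match starts and stops, here is a small sketch that runs the pattern against a few inputs. The sample strings (including Example.com) are hypothetical, and the dot after www is escaped here on the assumption that a literal dot is what's wanted.

```actionscript
// Same pattern as above, with the dot after www escaped (an assumption about intent).
var reg:RegExp = new RegExp("\\b(((https?)://)|(www\\.))([a-z0-9-_.&=#/]+)", "i");

// Hypothetical sample inputs.
var samples:Array = [
    '<a href="http://chalk-it-out.com">Chalk It Out</a>',
    "see www.fightskillz.com, thanks",
    "HTTPS://Example.com/a?b=1"
];

for each (var s:String in samples) {
    var m:Object = reg.exec(s); // null when there is no match
    trace(m != null ? m[0] : "no match");
}
// Expected trace output:
// http://chalk-it-out.com
// www.fightskillz.com
// HTTPS://Example.com/a
```

The last line shows the limitation of the character set: the match ends at the ?, so query strings are cut off unless ? is added to the set.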

reg.exec(str); executes the pattern on the specified string and returns the result as an array, or null if nothing matches. Element 0 holds the full match, and the following elements hold the text captured by each set of parentheses. Since the pattern has no g flag, it only matches the first URL it encounters and then stops. Without the g flag, str.match(reg) can be used in place of reg.exec(str).
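
If you want every URL rather than just the first, one possible approach is to add the g flag and call exec() repeatedly until it returns null. This is only a sketch: the urls array is a name invented for the example, and the dot after www is escaped here so that it matches a literal dot.

```actionscript
var str:String = 'This is a url www.fightskillz.com, and this is another one: <a href="http://chalk-it-out.com">Chalk It Out</a>';
// "gi": g keeps searching after each match, i ignores case.
var reg:RegExp = new RegExp("\\b(((https?)://)|(www\\.))([a-z0-9-_.&=#/]+)", "gi");

var urls:Array = []; // hypothetical container for the matches
var result:Object = reg.exec(str);
while (result != null) {
    urls.push(result[0]);   // element 0 is the full match
    result = reg.exec(str); // with g set, the search resumes after the previous match
}
trace(urls); // www.fightskillz.com,http://chalk-it-out.com

// With the g flag, str.match(reg) returns all full matches in one call.
trace(str.match(reg));
```

The loop is the better fit when you also need the captured groups or the position of each match; match() is shorter when the matched text is all you're after.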

Category: Uncategorized

Tagged: Flex, html, regexp, urls

Craig_home

Thanks for this. I want to use this code, but I need to run a regex and return results against a whole HTML page loaded with an mx:html. Is this possible?
http://yoavgivati.com Yoav Givati

I haven’t spent a lot of time in Flex lately, but you should be able to use the g RegExp flag to match globally. Then loop through the result object.

It might get a little tricky when it comes to detecting appropriate links. I’m not sure what you’re building (it sounds interesting), but, for example, running it against the source of an HTML page you’d pick up links to namespaces, stylesheets, and image resources as well. If you were just after the links on the page, or just the external links, you’d have to go for a more complex regular expression, or a set of them, to either strip away the head, script, link, image, video, and audio tags, or only look in anchor tags’ href attribute.
Craig_home

Thanks for that. In the HTML page there are multiple product codes, e.g. x=1234, x=4344, which I’ve worked out how to extract with regex, but I can’t work out how to match them in the loaded mx:html. Really struggling on this one, so let me know if you think of anything.
Many thanks
http://yoavgivati.com Yoav Givati

If you’re asking how to access the page’s source – there might be an easier way, but generally you’d access the html.htmlLoader.window.document which exposes the DOM to actionscript.

You can even do stuff like this from actionscript:

html.htmlLoader.window.document.getElementsByTagName("a");

Stick it in a function bound to the Event.LOCATION_CHANGE event and you should be golden.
