File hosters such as Rapidshare have sprung up everywhere over the last few years. Files are commonly split into several archives, and the links to them are sometimes spread over one or more web pages, so gathering them individually can be a pain in the neck.
Therefore I wrote a small parser that retrieves all Rapidshare links from a given URL using regular expressions. It does not only look at links inside an “<A>” tag, but picks up every matching string anywhere on that page.
Of course you can easily replace Rapidshare with a different file hoster.
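// Requires: using System.Collections.Generic; using System.Net; using System.Text.RegularExpressions;
List<string> result = new List<string>();
string html;
using (WebClient webClient = new WebClient())
{
    html = webClient.DownloadString(url); // url: address of the page to scan
}
// The hyphen sits at the end of the character class so it is literal (",-_" would be a range),
// and the dot in the host name is escaped so it only matches a dot.
string pattern = @"http://rapidshare\.com/files/[0-9]+/[A-Za-z0-9.,_=%-]+\.(mp3|zip|mpeg|pdf|rar|avi|wmv)";
MatchCollection matches = Regex.Matches(html, pattern, RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    if (!result.Contains(match.Value)) // skip links that were already collected
        result.Add(match.Value);
}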
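
As for swapping in a different file hoster: one possible way is to build the pattern from a host name instead of hard-coding it. The following is only a sketch under a few assumptions. The class and method names (HosterLinkParser, BuildPattern, ExtractLinks) and the host "example-hoster.com" are made up for illustration, and it assumes the other hoster uses the same "/files/<number>/<name>" URL layout, which may well not be the case. It works on an HTML string that has already been downloaded, so it can be tried without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HosterLinkParser
{
    // Builds the link pattern for a given host. Regex.Escape makes the dots
    // in the host name match literally. The extensions are assumed to be
    // plain alphanumeric strings, so they are joined without escaping.
    public static string BuildPattern(string host, string[] extensions)
    {
        return "http://" + Regex.Escape(host)
             + "/files/[0-9]+/[A-Za-z0-9.,_=%-]+\\.("
             + string.Join("|", extensions) + ")";
    }

    // Returns every distinct matching link in the HTML, in order of first appearance.
    public static List<string> ExtractLinks(string html, string host, string[] extensions)
    {
        List<string> result = new List<string>();
        string pattern = BuildPattern(host, extensions);
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            if (!result.Contains(match.Value))
                result.Add(match.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical host and sample markup, used only for illustration.
        string html = "<a href=\"http://example-hoster.com/files/123/part1.rar\">1</a> "
                    + "plain text http://example-hoster.com/files/124/part2.rar and again "
                    + "http://example-hoster.com/files/123/part1.rar";
        List<string> links = ExtractLinks(html, "example-hoster.com", new[] { "rar", "zip" });
        foreach (string link in links)
            Console.WriteLine(link);
    }
}
```

Because the pattern is matched against the raw page text, the second link in the sample is found even though it is not inside an anchor tag, and the repeated first link is only reported once.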