How to parse Rapidshare URLs with C#

by Matthias Broschk May 03, 2004 02:20

File-hosters such as Rapidshare have been popping up everywhere over the last few years. Files are commonly split into several archives, and the links to them are sometimes spread over one or more web pages, so gathering them individually can be a pain in the neck.

Therefore I wrote a small parser that retrieves all Rapidshare links from a given URL using regular expressions. It is not limited to links inside an "<a>" tag; it picks up every matching URL anywhere in that page's HTML, including links that appear as plain text.

Of course, you can easily swap Rapidshare for a different file-hoster by adjusting the pattern.
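One way to make that swap less error-prone is to build the pattern from the host name instead of editing the regex by hand. The following is a minimal sketch under stated assumptions: BuildPattern is a hypothetical helper (not part of the original code), files.example.com is a made-up host, and the host-specific path is passed in as a regex fragment. The host goes through Regex.Escape so that its dots match only a literal ".".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a link pattern for an arbitrary file-hoster.
// host is escaped so its dots match literally; pathPattern is inserted as-is,
// so it may contain regex syntax such as "[0-9]+".
static string BuildPattern(string host, string pathPattern, params string[] extensions)
{
    string exts = string.Join("|", Array.ConvertAll(extensions, e => Regex.Escape(e)));
    // The hyphen sits last in the character class, so it is a literal "-", not a range.
    return "http://" + Regex.Escape(host) + pathPattern
        + @"[A-Za-z0-9.,_=%-]+\.(" + exts + ")";
}

// Rapidshare's URL layout, as in the snippet in this post...
string rapidshare = BuildPattern("rapidshare.com", "/files/[0-9]+/", "zip", "rar");
// ...and a made-up hoster that uses a different path.
string other = BuildPattern("files.example.com", "/dl/", "pdf");

Console.WriteLine(Regex.IsMatch("http://rapidshare.com/files/123/part1.rar", rapidshare, RegexOptions.IgnoreCase));
Console.WriteLine(Regex.IsMatch("http://files.example.com/dl/report.pdf", other, RegexOptions.IgnoreCase));
```

Compiled as a C# 9+ top-level program, this prints True twice. Keeping the path as a separate argument reflects that every hoster structures its download URLs differently; only the host escaping and the file-name part are shared.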

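using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// url holds the address of the page to scan.
List<string> result = new List<string>();
string html;
using (WebClient webClient = new WebClient())
	html = webClient.DownloadString(url);

// Dots are escaped so they match literally; the hyphen is last in the
// character class so it is a literal "-" rather than the range ",-_".
string pattern = @"http://rapidshare\.com/files/[0-9]+/[A-Za-z0-9.,_=%-]+\.(mp3|zip|mpeg|pdf|rar|avi|wmv)";

MatchCollection matches = Regex.Matches(html, pattern, RegexOptions.IgnoreCase);

// Keep each link only once, in the order it first appears.
foreach (Match match in matches)
	if (!result.Contains(match.Value))
		result.Add(match.Value);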
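To check the parsing step without a network connection, it can be split off from the download. This is a sketch, not the original code: ExtractRapidshareLinks and the sample HTML are hypothetical. It uses the link pattern with the dots escaped and the hyphen moved to the end of the character class (in the form ",-_" the hyphen defines a range that also covers "/", ":", "<", "=" and ">", which can merge adjacent links into one match). It also swaps the List.Contains check for a HashSet, so each duplicate test is a hash lookup instead of a scan of the whole list, while the List still preserves the original order.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the parsing half of the snippet, separated from the
// download so it can run on any HTML string.
static List<string> ExtractRapidshareLinks(string html)
{
    const string pattern =
        @"http://rapidshare\.com/files/[0-9]+/[A-Za-z0-9.,_=%-]+\.(mp3|zip|mpeg|pdf|rar|avi|wmv)";
    var seen = new HashSet<string>();   // fast duplicate checks
    var result = new List<string>();    // preserves first-seen order
    foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
    {
        if (seen.Add(match.Value))      // Add returns false for a duplicate
            result.Add(match.Value);
    }
    return result;
}

// Links inside an <a> tag and in plain text are both found; the repeat is dropped.
string page =
    "<a href=\"http://rapidshare.com/files/111/movie.part1.rar\">Part 1</a>\n" +
    "Mirror: http://rapidshare.com/files/222/movie.part2.rar\n" +
    "<a href=\"http://rapidshare.com/files/111/movie.part1.rar\">Part 1 again</a>";

List<string> links = ExtractRapidshareLinks(page);
foreach (string link in links)
    Console.WriteLine(link);
```

As a C# 9+ top-level program this prints the two distinct links, part 1 first. In the original snippet, the same method could be called with the string returned by DownloadString.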

About the author

Matthias Broschk from Hamburg (Germany) wrote his first goto statements in QuickBasic at the age of fifteen, switched to Visual Studio (i.e. Visual Basic / Visual C++) at version 5.0, and has been a .NET addict since its early days. He has worked with many other languages and frameworks (Java, PHP, ...), but none has made him as enthusiastic as .NET. These days, blogs are where you find more information than anywhere else, so he has decided to share his experiences and start yet another .NET blog.