
| Website Development HTML, traffic, hosting, and more... this is the forum for every webmaster. |
| |||
| Hi All, I have a challenge and need some help/advice. In a nutshell, I want to extract a lot of data from a website (it's in the public domain) and strip out the data I need (it's wrapped in HTML and isn't too difficult to see), eventually exporting this to an Excel spreadsheet in column format. I know there is a lot of software out there that will create macros and a bunch of other stuff to do the job, but I need to know that whatever I use will work AND is something I can run every week with a click of a few buttons. Any help/advice/offers to create it (paid of course) greatly appreciated! Thanks, OllieB |
| ||||
| Hi OllieB, Welcome to CompuForums - great to have you here. I hope you can visit us often in the future and it would be great if you could add an entry to our member map. Firstly, what sort of data are you extracting? There are some free applications that can do this sort of thing, but it depends on what you are grabbing. Secondly, what site is it, and is it a big one? Fetching every page can use a lot of bandwidth, and you could end up using more than the site's monthly bandwidth allowance, which would either take the site offline for the rest of the month or leave the webmaster with a heavy bill.
__________________ Thanks, Ash CF Founder Great Webhosting. Shared starting at $2 per month. VPSes starting at $6 per month. www.Centicero.com Want to get in touch? Send me a PM | Do you want to continue receiving free help? Or do you want this site to close? Become a premium member. |
| |||
| Thanks for the swift reply, Ash. The data is simply a list of names, locations, reference IDs, and dates. There is a drop-down menu that specifies each location, and then a calendar specifying dates. It's this kind of data that I need to automate. With regards to the URL, if I can get a feel that this is possible and can be automated (which I'm sure it can) then I'll send you the link (bear with me on this). As an example, if the source code gave me all the info I needed on the first page, then I think I'd probably be able to strip out the data myself, but unfortunately it's spread over several hundred pages. |
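A drop-down of locations combined with a calendar of dates usually means one page per location/date pair, which is where the several hundred pages come from. A minimal sketch of listing those pages in Python, assuming the site accepts the choices as query-string parameters: the base URL and the `location` and `date` parameter names are hypothetical, and a site that submits the form by POST would need a different approach.

```python
from datetime import date, timedelta
from itertools import product
from urllib.parse import urlencode

# Hypothetical address; the real site was not named in the thread.
BASE_URL = "https://example.com/search"


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_urls(locations, start, end):
    """Return one URL per (location, date) combination."""
    urls = []
    for location, day in product(locations, date_range(start, end)):
        # Parameter names are assumptions; check the site's form fields.
        query = urlencode({"location": location, "date": day.isoformat()})
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

Calling `build_urls(["Leeds", "York"], date(2024, 1, 1), date(2024, 1, 3))` gives six URLs, one for each location on each of the three days. The location names and dates here are made up for illustration.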
| ||||
| It might be simpler to contact the owner of the website and ask if they can provide a database dump (a MySQL dump, if the site runs on MySQL). This is the underlying data, which is formatted into HTML when you access the site. The owner may also be able to export it as a CSV (comma-separated values) file through their control panel, and Excel can open that directly.
__________________ Thanks, Ash |
| |||
| Thanks Ash. I doubt the site administrators would do this for me, and I'd need the CSV file once, maybe twice a month. Do you have the skills to do this kind of thing, including stripping out the relevant data and exporting it to Excel once pulled from the site? I can email you the site and details if you think you might be able to help. Thanks, OllieB |
| ||||
| I am not a programmer myself; however, there are people here who may be able to help - they will reply if they can offer assistance. But you should still check with the site administrator that it's okay to do it - otherwise you could end up making their site go offline.
__________________ Thanks, Ash |
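Alongside asking the administrator, two quick checks can put numbers on the bandwidth worry. The sketch below uses Python's standard `urllib.robotparser` to see whether a site's robots.txt permits fetching a page, and does the rough arithmetic for total transfer. The page count, average page size, user-agent name, and URLs are all made-up placeholders, not figures from the site in question.

```python
from urllib.robotparser import RobotFileParser


def estimate_transfer_mb(page_count, avg_page_kb):
    """Rough total download size in MB for one full run."""
    return page_count * avg_page_kb / 1024


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

As an example, `estimate_transfer_mb(300, 50)` works out to about 14.6 MB per run (300 pages at an assumed 50 KB each). A robots.txt check is only a courtesy signal; it does not replace permission from the site owner.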
| |||
| If you have a Linux/Unix machine, or even a virtual machine, I would write a shell script to wget the web pages and save them to a directory. If you have database access, it could also dump the MySQL data from the server and save it as an SQL file, or if you have FTP access, it could log in and grab the files over FTP. This could be set up in cron to run, say, every Sunday night. Just an idea. But if you're a Win32 user, I guess manually is the way. I wouldn't know too much, I tend to be a Linux/Mac/Unix user.
__________________ -Rob Putt - Blog! CompuForums Secondary Administrator + Download the CompuForums Thread Viewer! + Add yourself to the Member Map! + Be sure that you are up-to-date with the rules. + Still Not A Member? Register Now! + Contact Us + Email Me! - rob at compuforums dot org +ipHideAway - Unfilter Anything Anywhere! Anonamize your surfing today!! |
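The same fetch-and-save idea also works without a Linux box. Here is a minimal sketch in Python (swapped in for the shell script and wget suggested above) that downloads a list of pages into a directory and pauses between requests to go easy on the server. The function names, file naming scheme, and two-second delay are assumptions for illustration.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_page(url):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_pages(urls, out_dir, fetch=fetch_page, delay_seconds=2.0):
    """Save each URL in the list to out_dir as page_0001.html, page_0002.html, ..."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = out_path / f"page_{index:04d}.html"
        target.write_bytes(fetch(url))
        saved.append(target)
        # Pause between requests so the site is not hammered.
        if index < len(urls):
            time.sleep(delay_seconds)
    return saved
```

If this were saved as a script (say `scrape.py`, a hypothetical name), a cron line such as `0 23 * * 0 python3 scrape.py` would run it every Sunday at 23:00. On Windows, Task Scheduler can run the same script on a weekly schedule.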
| |||
| Hi Rob, thanks for the advice. I used to have a Linux box, but no longer. I understand the benefits of using a shell script, using wget in a loop to get the data and output it to a file, and using cron to schedule it. I'm not familiar with SQL file formats or how to use them. I see the whole process in 3 steps: 1. Use wget to fetch the data pages. 2. Use awk (or whatever) to strip out the relevant data. 3. Export this data to Excel in columns. Are you able to do this kind of work yourself? |
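Steps 2 and 3 above can be sketched with Python's standard library in place of awk. This assumes the names, locations, reference IDs, and dates sit in the `<td>` cells of an HTML table, which is a guess since the page source was never posted. The class and function names are illustrative, and the output is a CSV file, which Excel opens as columns.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Step 2: strip the relevant data out of one page of HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows, header):
    """Step 3: return CSV text with a header line, ready to save for Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Running `extract_rows` over each saved page and passing the combined rows to `rows_to_csv` with a header such as `["name", "location", "reference_id", "date"]` gives text that can be written to a `.csv` file. If the real pages use a different layout (lists, `<div>` blocks), only the parser class would need to change.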