Script information

Name: PHP DMOZ parser (dmoz2mysql)
Current version: 3.0 (24 May 2004)
License type: The GPL
Script website: http://amix.dk/codecrib/
Author: Amir Salihefendic (amix@amix.dk)
Copyright: JFL Webcom (http://www.webcom.dk)
SourceForge project page: https://sourceforge.net/projects/dmoz2mysql/

New things/improvements in version 3.0

Well, well! I have worked all night to make this version :-) And I am impressed with the result. Version 3 kicks ass.
The new things

PHP DMOZ parser (RDF->MySQL DB) - Read me first

What can this script do? What is it?
This is a PHP script that is used to parse the DMOZ RDF data dump files. For more information about these files visit http://rdf.dmoz.org/.

Current features of this script include:

The parsing speed depends on your computer. I have a 1 GHz Athlon TB, and it takes about 25 minutes to extract, clean and parse the structure RDF file.

REQUIREMENTS

How to start the script?

Step 1
Open config.php and edit it.
Step 2
Run create_tables.php to create the tables in your database.
Step 3
Run start_script.php from the prompt (i.e. php start_script.php)

Abracadabra, the script handles the dirty work for you :) Just lie back and relax - and smoke a joint (nah bad idea hehe).

Additional
Run drop_tables.php to delete the tables in your database.

Bye Bye

Cheers,
Amir Salihefendic, amix@amix.dk

PS: If you run into trouble or find any bugs, post them on https://sourceforge.net/projects/dmoz2mysql/

THANKS!

Changelog

17. dec. 2003
Added set_time_limit(0); which removes the maximum execution time limit.
2. feb. 2004
Fixed a MAJOR bug (a catid bug that set almost every catid to 0!). A big thanks goes to Murray Woodman and Tony Spencer for reporting this bug. Moved the queries for creating tables out of start_script.php (too many people had problems with them)! To create your tables you need to run create_tables.php.
12. feb. 2004
This is a major update :) It fixes some very, very nasty bugs. It adds some new features - plus some little tweaks here and there!

Fixed some MAJOR bugs (bugs that made catids turn to 0!). I have also updated the code that extracts the DMOZ data dump files, since some users had problems with it. I have tested it on Windows and Mac OS X and it worked fine.

A new feature is that you control the script from config.php.

Well, have fun :) I hope the script works fine! PS: I have replaced all double quotes ("") with single quotes ('') where I could - I really hope it doesn't cause any problems. If you find an error, then email amix@amix.dk.

13. feb. 2004
Again a major update :) I haven't released version 2.0 since it had some problems. It didn't work because it needed LOTS of memory! Now I have fixed that error by making a new class that splits the RDF file into small files (25 MB). This makes it easy to load them into memory - and now the script works WITHOUT any bugs ;-) [I have tested it for hours now ehehe].
18. feb. 2004
Fixed a little bug in class_parse.php where the content file was not split.
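
Example: splitting a big file into small parts

The 13. feb. 2004 entry above mentions splitting the RDF file into small files (25 MB) so they fit into memory. Here is a minimal sketch of that idea. It is NOT the class shipped with the script: the function name split_file_by_lines, the part_NNNN.rdf file names and the line-boundary rule are all assumptions made for illustration. A real splitter for RDF data would probably also have to avoid cutting a record in half. The sketch uses type declarations, so it needs PHP 7 or newer.

```php
<?php
/**
 * Split $source into numbered part files of roughly $maxBytes each,
 * cutting only at line boundaries. Returns the list of part file paths.
 * (Hypothetical helper, not part of dmoz2mysql.)
 */
function split_file_by_lines(string $source, string $destDir, int $maxBytes): array
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $parts = [];    // paths of the part files created so far
    $out = null;    // handle of the part currently being written
    $written = 0;   // bytes written to the current part

    // fgets() reads one line at a time, so memory use stays small.
    while (($line = fgets($in)) !== false) {
        // Start a new part when none is open or the current one is full.
        if ($out === null || $written >= $maxBytes) {
            if ($out !== null) {
                fclose($out);
            }
            $path = sprintf('%s/part_%04d.rdf', rtrim($destDir, '/'), count($parts) + 1);
            $out = fopen($path, 'wb');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot create $path");
            }
            $parts[] = $path;
            $written = 0;
        }
        fwrite($out, $line);
        $written += strlen($line);
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $parts;
}
```

To get parts of about 25 MB you would call it with 25 * 1024 * 1024 as the last argument. A part can end up slightly bigger than the limit, because the size check happens before each line is written, not after.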