
Originally Posted by
jmcc
The key to handling large datasets such as zonefiles is preparing the data before inserting it into the database rather than throwing it all into the database and then sorting it out.
I ran a simple test on the .com domain list: breaking it down, loading it into a set of tables and then generating stats subtables. Generating the schema, loading the data and generating the stats data took approximately two hours. That was on an old Sempron 3G box running MySQL with a barely tweaked configuration. The whole thing, including parsing and formatting the zonefile data, shouldn't take more than three hours. Breaking it down into smaller tables should only take about an hour and a half, and less if the breakdown is done first and the stats second. Preprocessing the zonefile data will remove the bottleneck that causes your process to take 12 to 15 hours to complete.
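As a rough illustration of that preprocessing step, here is a minimal Python sketch. It is not the script used for the test above. It assumes the zonefile uses the usual master-file layout (owner name, optional TTL and class, record type, target), that names without a trailing dot are relative to the origin, and that only NS records are of interest. The function names `absolute`, `parse_ns_records` and `write_tsv` are made up for this example. The output is a deduplicated, sorted tab-separated file of the kind a bulk loader such as MySQL's LOAD DATA INFILE can read.

```python
import csv
import io


def absolute(name, origin):
    """Turn a zonefile name into a lowercase absolute name without the trailing dot."""
    name = name.lower()
    if name == "@":                 # "@" stands for the origin itself
        return origin
    if name.endswith("."):          # already fully qualified
        return name[:-1]
    return f"{name}.{origin}"       # relative name: append the origin


def parse_ns_records(lines, origin="com"):
    """Yield (domain, nameserver) pairs for every NS line; skip everything else."""
    for line in lines:
        line = line.split(";", 1)[0].strip()    # drop comments and padding
        if not line or line.startswith("$"):    # skip blanks and $ORIGIN/$TTL
            continue
        fields = line.split()
        # The record type is the second-to-last field, with or without TTL/class.
        if len(fields) < 3 or fields[-2].upper() != "NS":
            continue
        yield absolute(fields[0], origin), absolute(fields[-1], origin)


def write_tsv(rows, out):
    """Write deduplicated, sorted rows as tab-separated text; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    unique_rows = sorted(set(rows))             # in memory: fine for a sample only
    writer.writerows(unique_rows)
    return len(unique_rows)


if __name__ == "__main__":
    sample = [
        "; sample data, not real zone contents",
        "$ORIGIN COM.",
        "EXAMPLE NS NS1.EXAMPLE.COM.",
        "EXAMPLE NS NS2.EXAMPLE",
        "example ns ns1.example.com.",
    ]
    buffer = io.StringIO()
    write_tsv(parse_ns_records(sample), buffer)
    print(buffer.getvalue(), end="")
```

The set-and-sort in `write_tsv` holds every row in memory, so for a full .com zone the deduplication and sorting would more likely be done with an external sort on the intermediate file. The point is the same either way: the database only ever sees clean, ordered rows.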
Regards...jmcc