DNForum - Domain Sales, Domain Forum, Domain Appraisals, Domain Registrars

Old 02-18-2009, 03:07 PM   #1 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


.com Zone File Query - ZFbot.com

OK - I wanted to share my latest concoction with you all. I just finished building an app/website that lets you search the entire .com zone file (I had to get approval from Verisign to access it). The file contains over 185 million records (including duplicates for domains with multiple name servers)... It's such a huge file that I had to build the app so that it continually loops through and updates two-character combination sets (you can see which set it's currently updating on the lower left of the app/website). Once it finishes the entire loop (0 through zz), it downloads a new zone file (around 7 GB) and starts the whole process again.

Searching is pretty quick considering the magnitude of the data it's got to crunch through. I suggest you type at least 3 characters or it will take longer than you want.

You can download the results to a spreadsheet as well...

There are some interesting stats in the grid on the right - domain counts, percentages, and changes up or down for each two-character set (for names that start with a digit, I just made one set per digit). Keep in mind that the change values are not yet accurate since it hasn't looped through twice yet... some have been updated, some not yet. The change column is interesting in that you'll be able to see which two-character combinations are being dropped and which ones are being picked up...

You can sort any of the columns too...

If you are in the business of selling domains, it works great for searching on your own domains - I found a whole bunch of similar domains that I wasn't even aware were registered - and they turned out to belong to companies with live websites... sooooo... they become potential customers for my domain portfolio. Just one use for the app.

Let me know what you think - Enjoy!

http://www.ZFBot.com

Ken
__________________
KJG
OneWorldMedia, ZFBot

Last edited by kengreenwood; 02-18-2009 at 03:37 PM..
kengreenwood is offline   Reply With Quote
Old 02-18-2009, 03:11 PM   #2 (permalink)
Platinum Lifetime Member
No Avatar
 
Last Online: 11-06-2009 06:34 PM
iTrader: (0)
Join Date: Oct 2002
Posts: 67
DNF$: 221
Location: Audubon, PA
Country:


Ken, sounds great -- thanks for sharing; will check it out. One question: when I applied for zone file access from Verisign a while back, I remember there being a clause in the agreement that stated you were only allowed to download the zone file once in a 24-hour period. Is this not the case any longer?
audubon is offline   Reply With Quote
Old 02-18-2009, 03:22 PM   #3 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


Audubon - Yes, that is still the case. But there really isn't a need to download it more than once a day. Unless you have a monster of a server, you won't be able to process the data more than once a day. My app takes about 12 to 15 hours to loop through all of the combinations and split the master file up (686 unique tables get built... this is for performance. I couldn't build an index on a 185 million record table... I don't have the disk space and it would take 3 days to build the index!). You couldn't possibly be 100% accurate on the data unless you had direct access to the source. But once a day gives you probably 99% accuracy...
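
For readers wondering where 686 comes from: it matches one table per leading digit plus one per two-letter combination (10 + 26 x 26 = 686). That reading is an assumption pieced together from the descriptions in this thread, and the thread does not say how names with a digit or hyphen in the second position are bucketed. A minimal Python sketch of that key set, with a hypothetical function name:

```python
import string


def partition_keys():
    """Return one key per leading digit plus one per two-letter pair.

    Assumption: this is how the 686 tables break down; the thread
    only gives the total and says digits get one set each.
    """
    digit_keys = list(string.digits)  # "0" .. "9"
    letter_keys = [
        first + second
        for first in string.ascii_lowercase
        for second in string.ascii_lowercase
    ]  # "aa" .. "zz"
    return digit_keys + letter_keys


keys = partition_keys()
print(len(keys))  # 686
```

The list runs from "0" to "zz", which lines up with the "0 through zz" loop described in the first post.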
__________________
KJG
OneWorldMedia, ZFBot

Last edited by kengreenwood; 02-18-2009 at 03:30 PM..
kengreenwood is offline   Reply With Quote
Old 02-18-2009, 09:19 PM   #4 (permalink)
Bloody lovely
 
Acro's Avatar
 
Last Online: Yesterday 10:43 PM
iTrader: (393)
Join Date: Feb 2004
Posts: 23,730
DNF$: 3,407
Location: USA
Country:




A great idea and some programming ingenuity at work (the 2-letter table matrix). Now I can cut down on my research time when looking for potential buyers.
__________________

DomainGang.com - Domainers' Most Awesome News Source
Acroplex - Web & Graphics
Acro.net - My Blog
Acro is offline   Reply With Quote
Old 02-23-2009, 05:08 PM   #5 (permalink)
Platinum Lifetime Member
 
Coward's Avatar
 
Last Online: Today 05:56 AM
iTrader: (4)
Join Date: Oct 2007
Posts: 137
DNF$: 310
Location: europe


Wonderful!
Coward is offline   Reply With Quote
Old 02-23-2009, 05:39 PM   #6 (permalink)
 
aZooZa's Avatar
 
Name: Dale Hubbard
Last Online: Yesterday 12:09 PM
iTrader: (45)
Join Date: Jan 2003
Posts: 5,868
DNF$: 5,845
Location: Exeter, England
Country:


Ken, you might find that MySQL is too resource hungry for this particular application. Did you consider stripping out all the duplicate NS data from the zone? You can use awk and grep with flat files, and that takes minutes instead of 12-15 hours. If you look at the zone file structure, you can see the distinct points in each line (A, NS) where awk and grep will work to separate out the individual domains. I wrote a bash script that does the main donkey work if you'd like me to dig it out.

Anyway, just a thought. I have to say it's a very well presented site indeed!
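
Dale's bash script is not shown in the thread, so the following is only a sketch of the flat-file idea, written in Python rather than awk/grep. It assumes delegation lines of the form `EXAMPLE NS NS1.EXAMPLE.COM.`, with the owner name relative to the zone origin, and treats any owner containing a dot (glue hosts, the zone apex) as something to skip. The function name and the `tld` parameter are hypothetical.

```python
def distinct_domains(lines, tld="com"):
    """Yield each delegated domain once, ignoring duplicate NS rows.

    Assumed line format: "<OWNER> NS <NAMESERVER>", whitespace separated.
    Anything else (directives, comments, A records) is skipped.
    """
    seen = set()  # fine for a sketch; the full zone holds tens of millions of names
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[1].upper() != "NS":
            continue
        owner = fields[0]
        if "." in owner:
            # Fully qualified or multi-label owner, not a plain delegation.
            continue
        name = owner.lower()
        if name not in seen:
            seen.add(name)
            yield f"{name}.{tld}"


sample = [
    "$ORIGIN COM.",
    "EXAMPLE NS NS1.EXAMPLE.COM.",
    "EXAMPLE NS NS2.EXAMPLE.COM.",
    "NS1.EXAMPLE A 192.0.2.1",
    "OTHER NS NS1.EXAMPLE.COM.",
]
for domain in distinct_domains(sample):
    print(domain)
```

For a real zone file you would pass an open file handle as `lines` and write the results to a flat file for bulk loading. If the file is grouped by owner name, comparing against the previous name instead of keeping a set would save a lot of memory.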
__________________
UK Drop Catching Services: Dropsystem.co.uk
New! Canada TBR Drop Catching: Dropping.ca
New! QUALITY MiniSites: NOTsoMINI.com
aZooZa is offline   Reply With Quote
Old 02-28-2009, 04:46 PM   #7 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


Quote:
Originally Posted by aZooZa View Post
Ken, you might find that MySQL is too resource hungry for this particular application. Did you consider stripping out all the duplicate NS data from the zone? You can use awk and grep with flat files, and that takes minutes instead of 12-15 hours. If you look at the zone file structure, you can see the distinct points in each line (A, NS) where awk and grep will work to separate out the individual domains. I wrote a bash script that does the main donkey work if you'd like me to dig it out.

Anyway, just a thought. I have to say it's a very well presented site indeed!
Dale - if you have the script that greps out the domains to a file, that would help a bit... What I'm currently doing is bulk loading the entire file into a table and then deleting any record where it's not a domain... but I still have the ns field and the actual name server name... so it's making the table much bigger than it needs to be... resulting in the rest of the process slowing down. I'd love to use Oracle for this but I don't have the time to install Oracle on my server right now and I also don't want any Oracle cronies jumping down my throat about license issues. Could use the express version I guess... anyway, I'd appreciate that script if you can find it... thanks...

FYI - I am currently loading the .net domains in as I'm typing this... total between the .com and .net domains will be around 92 million. Once I load all of the .net domains in and split them off for the first time, I'll upload the new .swf front end to the app that allows you to select .com, .net or both in the query.
__________________
KJG
OneWorldMedia, ZFBot
kengreenwood is offline   Reply With Quote
Old 02-28-2009, 05:00 PM   #8 (permalink)
Administrator
 
DotComGod's Avatar
 
Name: Adam Dicker
Last Online: Yesterday 09:27 PM
iTrader: (38)
Join Date: Feb 2003
Posts: 10,677
DNF$: 4,588,257
Location: Toronto, Canada
Country:


Excellent Tool!

-=DCG=-
DotComGod is offline   Reply With Quote
Old 02-28-2009, 05:19 PM   #9 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


thanks Adam...

You'll also notice that any of the two letter/digit combos that have yet to be updated today won't have an extension on them... that's because I now have both .com and .net domains in the database... once it's finished today (and going forward), you'll see the extension.
__________________
KJG
OneWorldMedia, ZFBot

Last edited by kengreenwood; 02-28-2009 at 06:04 PM..
kengreenwood is offline   Reply With Quote
Old 02-28-2009, 06:57 PM   #10 (permalink)
Bloody lovely
 
Acro's Avatar
 
Last Online: Yesterday 10:43 PM
iTrader: (393)
Join Date: Feb 2004
Posts: 23,730
DNF$: 3,407
Location: USA
Country:




Awesome. I miss the days of the unified com/net/org Registry though. Nowadays don't expect to extract similar data from PIR :(
__________________

DomainGang.com - Domainers' Most Awesome News Source
Acroplex - Web & Graphics
Acro.net - My Blog
Acro is offline   Reply With Quote
Old 03-01-2009, 04:54 AM   #11 (permalink)
Dn Guru©
 
-ET-'s Avatar
 
Name: 3ldo Thomas
Last Online: 11-06-2009 02:48 PM
iTrader: (41)
Join Date: Nov 2006
Posts: 551
DNF$: 534
Location: Neighbourhood


Awesome tool! will use it for sure in coming days.
-ET- is offline   Reply With Quote
Old 03-01-2009, 12:24 PM   #12 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


FYI - I loaded up the new front end with the drop down for selecting .com, .net or both in your query.

http://www.zfbot.com

I also added a column with a link to the archive.org information for the domain - if there is any, that is... (the wayback machine). Not a major deal, but a nice little feature.
__________________
KJG
OneWorldMedia, ZFBot

Last edited by kengreenwood; 03-01-2009 at 06:40 PM.. Reason: Automerged Doublepost
kengreenwood is offline   Reply With Quote
Old 03-02-2009, 04:12 AM   #13 (permalink)
Platinum Lifetime Member
 
jmcc's Avatar
 
Name: John McCormac
Last Online: 11-06-2009 10:19 AM
iTrader: (0)
Join Date: Oct 2006
Posts: 30
DNF$: 1,110
Location: Ireland
Country:


Quote:
Originally Posted by kengreenwood View Post
Dale - if you have the script that greps out the domains to a file, that would help a bit... What I'm currently doing is bulk loading the entire file into a table and then deleting any record where it's not a domain... but I still have the ns field and the actual name server name... so it's making the table much bigger than it needs to be... resulting in the rest of the process slowing down.
It is a very messy way of doing it. As Dale suggested, it is far quicker to parse the zonefile using scripts. Doing it this way is essential if you are going to mechanise or automate the process. It only takes a few minutes to parse the domains from the zonefile.

Quote:
I'd love to use Oracle for this but I don't have the time to install Oracle on my server right now and I also don't want any Oracle cronies jumping down my throat about license issues.
You don't need to use Oracle. MySQL can handle this kind of thing easily. Crunchwise, you are doing too much too early.

MySQL could handle the total .com list. The number of distinct .com domains (as of yesterday's zone) was only around 79.5 million. Loading it into a single table on a desktop PC took 1 hour 8 min 56.55 sec. The query time for a two-character count was 38.17 seconds with a simple domain-based index. A single table is a very inefficient method of doing this kind of work. It all comes down to computability. It is far more efficient to run a set of queries on smaller tables (alphanumerical) and use these results to build your stats table. You can use various tricks such as limiting the number of characters used to build the index, or even using a number of indexes.

It would then be simply a case of running the stats query on each smaller table to update your stats table. I don't know how far back historically you are running your stats table but you are effectively creating a spreadsheet with it. If it is a simple two set historical (today's and yesterday's figure) then it is a lot simpler.
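
The per-table stats pass described here might look roughly like the sketch below. SQLite from Python's standard library stands in for MySQL so that it runs anywhere; the `domains_<prefix>` table names, the `stats` layout and both function names are hypothetical, not taken from ZFBot.

```python
import sqlite3


def refresh_stats(conn, prefixes):
    """Count each small table and roll the old count into 'yesterday'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats ("
        "prefix TEXT PRIMARY KEY, today INTEGER NOT NULL, yesterday INTEGER)"
    )
    for prefix in prefixes:
        if not prefix.isalnum():
            # The prefix is spliced into a table name, so keep it strict.
            raise ValueError(f"unexpected prefix: {prefix!r}")
        (count,) = conn.execute(
            f'SELECT COUNT(*) FROM "domains_{prefix}"'
        ).fetchone()
        row = conn.execute(
            "SELECT today FROM stats WHERE prefix = ?", (prefix,)
        ).fetchone()
        previous = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO stats (prefix, today, yesterday) "
            "VALUES (?, ?, ?)",
            (prefix, count, previous),
        )
    conn.commit()


def change(conn, prefix):
    """Return today's count minus yesterday's, or None on the first pass."""
    today, yesterday = conn.execute(
        "SELECT today, yesterday FROM stats WHERE prefix = ?", (prefix,)
    ).fetchone()
    return None if yesterday is None else today - yesterday
```

This keeps only the simple two-set history (today's and yesterday's figure). In MySQL itself, the trick of limiting the number of characters used to build the index corresponds to a prefix index, for example `INDEX (domain(8))`, where the length 8 is purely illustrative.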

You did a good job getting this far.

Regards...jmcc
__________________
http://www.hosterstats.com
Hoster Stats on 2.9M+ hosters and Domain DNS History Database.
Tracks over 236 Million active and deleted domains.
jmcc is offline   Reply With Quote
Old 03-04-2009, 09:47 AM   #14 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


Quote:
Originally Posted by jmcc View Post
It is far more efficient to run a set of queries on smaller tables (alphanumerical) and use these results to build your stats table. You can use various tricks such as limiting the number of characters used to build the index or even a number of indexes.
I'm already doing all of what you stated above. The queries are not occurring against a single table. That would be foolish. I've been working with Oracle/MySQL databases for over 15 years... trust me, I know what I'm doing. But I appreciate the input!

I got a suggestion from Acro to perhaps add a warning on the www link when it may be "adult" in nature... which was a good suggestion... but determining if it's an "adult" site would be difficult so now if you mouse over the www button of any domains, you'll see a snapshot of the website, if there is one. And you can pretty easily see if the domain is parked or if there is a legitimate site up and running...
__________________
KJG
OneWorldMedia, ZFBot

Last edited by kengreenwood; 03-04-2009 at 10:24 AM..
kengreenwood is offline   Reply With Quote
Old 03-05-2009, 05:58 AM   #15 (permalink)
Platinum Lifetime Member
 
jmcc's Avatar
 
Name: John McCormac
Last Online: 11-06-2009 10:19 AM
iTrader: (0)
Join Date: Oct 2006
Posts: 30
DNF$: 1,110
Location: Ireland
Country:


Quote:
Originally Posted by kengreenwood View Post
I'm already doing all of what you stated above. The queries are not occurring against a single table. That would be foolish. I've been working with Oracle/MySQL databases for over 15 years... trust me, I know what I'm doing. But I appreciate the input!
The key to handling large datasets such as zonefiles is preparing the data before inserting it into the database rather than throwing it all into the database and then sorting it out.

I ran a simple test on the .com domain list, breaking it down, loading it into a set of tables and then generating stats subtables. The process of generating the schema, loading the data and generating the stats data took approximately two hours. That was on an old Sempron 3G box running MySQL with a barely tweaked configuration. The whole thing, including parsing and formatting the zonefile data, shouldn't take more than three hours. Breaking it down into smaller tables should only take about an hour and a half - faster if the breakdown was done first and the stats second. Preprocessing the zonefile data will remove the bottleneck that causes your process to take 12 to 15 hours to complete.
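
As a rough illustration of doing the breakdown before the database sees any data, the sketch below splits an already parsed list of names into one flat file per prefix, ready for bulk loading. The keying rule (first character for names starting with a digit, first two characters otherwise) is an assumption, as are the function names and the `.txt` file layout; it is not the script used for the test above.

```python
import itertools
import os


def prefix_key(name):
    """Bucket key: leading digit, or the first two characters otherwise."""
    name = name.lower()
    return name[0] if name[0].isdigit() else name[:2]


def split_into_files(names, out_dir):
    """Append each name to <out_dir>/<key>.txt and return per-key counts.

    groupby only merges consecutive names, so sorted input means each file
    is opened once; unsorted input still works because files are appended to.
    """
    counts = {}
    for key, group in itertools.groupby(names, key=prefix_key):
        path = os.path.join(out_dir, f"{key}.txt")
        with open(path, "a", encoding="ascii") as handle:
            for name in group:
                handle.write(name + "\n")
                counts[key] = counts.get(key, 0) + 1
    return counts
```

The returned counts can feed the stats table directly, so the database only has to bulk load each small file and index it.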

Regards...jmcc
__________________
http://www.hosterstats.com
Hoster Stats on 2.9M+ hosters and Domain DNS History Database.
Tracks over 236 Million active and deleted domains.
jmcc is offline   Reply With Quote
Old 03-05-2009, 10:07 AM   #16 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


Quote:
Originally Posted by jmcc View Post
The key to handling large datasets such as zonefiles is preparing the data before inserting it into the database rather than throwing it all into the database and then sorting it out.

I ran a simple test on the .com domain list, breaking it down, loading it into a set of tables and then generating stats subtables. The process of generating the schema, loading the data and generating the stats data took approximately two hours. That was on an old Sempron 3G box running MySQL with a barely tweaked configuration. The whole thing, including parsing and formatting the zonefile data, shouldn't take more than three hours. Breaking it down into smaller tables should only take about an hour and a half - faster if the breakdown was done first and the stats second. Preprocessing the zonefile data will remove the bottleneck that causes your process to take 12 to 15 hours to complete.

Regards...jmcc
Couldn't agree with you more - I've been chatting with Dale about this as well. My forte is the database work - the pre-processing of the data within Unix, using command line stuff is not my forte. Soooo, if either you or Dale have a script that cleans up the data first, it would speed up my process.
__________________
KJG
OneWorldMedia, ZFBot
kengreenwood is offline   Reply With Quote
Old 03-11-2009, 10:03 AM   #17 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


Line chart trend...

Okay - I've just added a neat little feature to the ZFBot. If you click on any of the rows in the right-hand grid, a line chart will pop up showing the trend. Obviously it doesn't mean much with only a few days of trend stored but it's gonna look pretty cool after 30, 60, 90 days... or a year or more. I'm going to add either a radio button or drop down at the bottom of the chart that will allow you to select different date ranges like last 3 months, last year, etc...

check out the 'ju' chart for an example (fictitious history right now)

I'll be adding a couple pie charts as well that show the breakdown of domains...

Also - I'm working on adding the ability to search on name server as well. If you wanted to see all of the domains on your name server for example, you could find them all easily.

http://www.zfbot.com
__________________
KJG
OneWorldMedia, ZFBot
kengreenwood is offline   Reply With Quote
Old 03-11-2009, 10:13 AM   #18 (permalink)
Success Is My Only Option
 
Carter's Avatar
 
Last Online: 11-06-2009 11:14 AM
iTrader: (43)
Join Date: Jul 2008
Posts: 4,229
DNF$: 27,095
Location: Italy
Country:


Fantastic tool congrats!!
Carter is offline   Reply With Quote
Old 03-11-2009, 01:09 PM   #19 (permalink)
Platinum Lifetime Member
 
kengreenwood's Avatar
 
Name: That shouldn't be too hard to figure out...
Last Online: 10-29-2009 08:46 AM
iTrader: (2)
Join Date: May 2006
Posts: 377
DNF$: 4,437
Location: Tampa
Country:


One tip for the charts - once the chart is displayed, you can just click or hold down any letter and it will automatically scroll through and find the appropriate value in the grid to display the associated chart.
__________________
KJG
OneWorldMedia, ZFBot
kengreenwood is offline   Reply With Quote
Old 03-11-2009, 02:03 PM   #20 (permalink)
CrossLogix.com
 
copper's Avatar
 
Last Online: Yesterday 04:44 PM
iTrader: (65)
Join Date: Mar 2006
Posts: 2,237
DNF$: 2,163
Location: Matthews, NC. U


Quote:
Originally Posted by kengreenwood View Post
One tip for the charts - once the chart is displayed, you can just click or hold down any letter and it will automatically scroll through and find the appropriate value in the grid to display the associated chart.
Thanks for the Great Tool.
I already used it many times.
Didn't know it was yours

But...
What did you just say
__________________

Domain Names For Sale
copper is offline   Reply With Quote
All times are GMT -5. The time now is 09:28 AM.
Copyright @2001-2009 DNForum.com