Yes - Theo is correct. Everyone needs to remember that there are 100 million records being searched through, and the searches, for the most part, are pretty quick. That's because I have many, many tables in the database (thousands of them) and point the search to the correct table based on what the search term starts with or ends with. Making it work with "contains" is extremely difficult on a dataset this large. It works on the dropping domains because that dataset is much smaller... on the active domains - probably not gonna happen.
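A minimal sketch of the table-routing idea in this post, assuming hypothetical table names such as `starts_ca` and `ends_op` and a two-character bucket. The post does not say how the tables are actually named or keyed, so treat every name here as illustrative:

```python
def table_for(term: str, mode: str, width: int = 2) -> str:
    """Pick the table to search from the first or last characters of the term.

    The "starts_xx" / "ends_xx" naming and the bucket width are
    assumptions made for illustration only.
    """
    term = term.lower()
    if len(term) < width:
        raise ValueError(f"need at least {width} characters to pick a table")
    if mode == "starts":
        return f"starts_{term[:width]}"
    if mode == "ends":
        return f"ends_{term[-width:]}"
    # A "contains" search has no fixed prefix or suffix, so no single
    # table can be chosen and every table would have to be scanned.
    raise ValueError("contains searches cannot be routed to one table")
```

This also suggests why "contains" is the hard case: the term can sit anywhere in the name, so there is nothing to route on.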
Last edited by kengreenwood; 05-02-2009 at 11:05 PM.
Update - I've overhauled the entire zfbot process... I'm now just loading a single, VERY large and indexed, MySQL table as opposed to 1500 or so tables. The indexes take care of what the multiple tables were handling.
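As a rough illustration of how indexes on one table can stand in for many per-prefix tables, here is a sketch using Python's built-in `sqlite3` in place of MySQL. The `domains` table, the `name` and `rname` columns, and the trick of indexing a reversed copy of the name for "ends with" searches are all assumptions, not details from the post:

```python
import sqlite3

def build_demo(names):
    """Load names into one table with indexes on the name and its reverse."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE domains (name TEXT, rname TEXT)")
    db.executemany(
        "INSERT INTO domains VALUES (?, ?)",
        [(n, n[::-1]) for n in names],
    )
    db.execute("CREATE INDEX idx_name ON domains (name)")
    db.execute("CREATE INDEX idx_rname ON domains (rname)")
    return db

def _upper_bound(prefix):
    # Smallest string greater than every string starting with prefix
    # (prefix must be non-empty).
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def starts_with(db, prefix):
    """Prefix search written as a range scan over the indexed name column."""
    rows = db.execute(
        "SELECT name FROM domains WHERE name >= ? AND name < ? ORDER BY name",
        (prefix, _upper_bound(prefix)),
    )
    return [r[0] for r in rows]

def ends_with(db, suffix):
    """Suffix search written as a prefix range scan over the reversed name."""
    rev = suffix[::-1]
    rows = db.execute(
        "SELECT name FROM domains WHERE rname >= ? AND rname < ? ORDER BY name",
        (rev, _upper_bound(rev)),
    )
    return [r[0] for r in rows]
```

Both searches become range conditions on an indexed column, which is the kind of lookup an ordered index handles well. A "contains" search still has no such range to scan.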
Since I'm no longer processing data all day long, I've changed the scroll from a status to the DNJournal RSS Feed
Still working on further functionality and updates...
http://www.zfbot.com
KJG
Great Service,
thank you!
Finally got the scripts down so that I can cron everything up. The entire process of downloading the com, net and org zonefiles, cleaning up the data in unix, bulk loading it into the indexed MySQL tables and doing updates takes about 4 hours. Not bad for 100 million records.
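The cleanup step is done with unix tools in the post; the sketch below shows the same idea in Python. It assumes a simplified zone file where each delegation line reads `<label> NS <nameserver>` relative to the TLD, and `domains_from_zone` is a hypothetical helper name:

```python
def domains_from_zone(lines, tld):
    """Return the sorted, de-duplicated domains that have an NS record."""
    seen = set()
    for line in lines:
        parts = line.split()
        # Keep only "<label> NS <nameserver>" records; skip everything else.
        if len(parts) < 3 or parts[1].upper() != "NS":
            continue
        label = parts[0].lower().rstrip(".")
        if not label or "." in label:
            continue  # skip anything that is not a plain second-level label
        seen.add(f"{label}.{tld}")
    return sorted(seen)
```

A domain normally has several NS lines, so de-duplicating before the bulk load keeps the table at one row per domain.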
Awesome upgrades. Any time now, it will also brew coffee!
All - you should see a marked improvement in the performance of the ZFBot application. I've added a bunch of indexes on the large zone file table (over 100 million records on that table) and the searches are coming back quicker. I've noted that any search that yields a large number of domains tends to never come back - and that large number of records value is dependent on the memory on your local machine. There is no limitation on the number of records the datagrid can hold - it's just the memory on your machine. For example, on my laptop the queries that yield over 150,000 domains never come back for me... it may be higher or lower for each of you. I'm working on adding paging functionality to the datagrid that displays the resulting domains so that I can avoid the local machine memory issue and return an unlimited number of domains. For now, keep this in mind and just narrow down your searches by using the options of domain length, excluding dashes, numbers etc... and you'll be fine.
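One possible shape for the paging mentioned above is to have the query return a single page at a time, so the datagrid never holds the full result set. The sketch uses `sqlite3` as a stand-in for MySQL, and the `domains` table, `name` column and `fetch_page` helper are assumed names:

```python
import sqlite3

def fetch_page(db, prefix, page, page_size=1000):
    """Return one page of names starting with prefix (page numbers start at 0).

    The prefix is passed to LIKE as-is, so % and _ in it act as wildcards.
    """
    rows = db.execute(
        "SELECT name FROM domains WHERE name LIKE ? "
        "ORDER BY name LIMIT ? OFFSET ?",
        (prefix + "%", page_size, page * page_size),
    )
    return [r[0] for r in rows]
```

The `ORDER BY` matters here: without a stable order, consecutive pages could overlap or skip rows.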
On a side note - I apologize for being out of pocket for the last couple weeks - I just sold my house and moved to a new one... hired a moving company for the big stuff but I moved a lot of stuff myself ... every bone/muscle in my body is aching... including my (typing) fingers. ouch.
5/29/09 11:10 AM - optimizing the large zfbot table right now ... queries will run endlessly until it's complete.. sorry!
OK - everything is kosher again. I removed the auto-search on load and replaced it with a welcome swf...
check it out... http://www.ZFBot.com
Last edited by kengreenwood; 05-29-2009 at 03:18 PM. Reason: Automerged Doublepost
I added the ability to search for CVCV and CVVC or both within the "dropping domain" search. You'll see that the "Parking Breakdown" button turns into a dropdown when you check the "It's Dropping" checkbox. And if you select CVCV, CVVC or CVCV/CVVC, most of the options on the top will become disabled (the starts/end/contains dropdown, the domain text input box, the dash/number filters and the length filters). Once you either uncheck the "It's Dropping" checkbox or change the dropdown value away from CVCV, CVVC or CVCV/CVVC, all of those filters will be re-enabled.
Somehow, the day that it *will* be making coffee too is near!
by the way, the main table is being updated as i type this... queries will have to wait for a few minutes...
ok - all done... and Theo - it's gonna grind the coffee beans too ...
Might as well add a few more to the dropping domains filter...
For those of you who like looking for 4 letter domains - now there is a selection for cvcv, cvvc, vcvc, vccv and cvcv/cvvc/vcvc/vccv all together.
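For anyone unfamiliar with the notation, C stands for consonant and V for vowel. A minimal sketch of how such a filter could classify a name, assuming "y" counts as a consonant and only a-z labels qualify (the post does not spell out either rule):

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant

def cv_pattern(label):
    """Map each letter to C or V; return None if any character is not a-z."""
    label = label.lower()
    if not label.isascii() or not label.isalpha():
        return None
    return "".join("V" if ch in VOWELS else "C" for ch in label)

def matches_any(domain, patterns):
    """True if the part before the first dot fits one of the given patterns."""
    label = domain.split(".", 1)[0]
    return cv_pattern(label) in {p.upper() for p in patterns}
```

Because the pattern fixes both the length and the character classes, it makes sense that the length, dash and number filters are disabled while one of these options is selected.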
Last edited by kengreenwood; 05-30-2009 at 10:03 PM. Reason: Automerged Doublepost
I hope adding more functions won't slow it down too much
Like splitting it up into a-z
If I start seeing a significant performance drop off, I'll certainly remove some functionality. But for now, it's full speed ahead...
I'm almost done with the portfolio search - It's going to be a major upgrade and it will allow users to load up their entire portfolio of domains and get a complete result set of matches - begins with, ends with and exact.
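A sketch of the three match types, assuming the comparison is made on the part of each name before the first dot and that "exact" means the same label under a different extension. `match_type` is a hypothetical helper, not ZFBot's actual code:

```python
def match_type(own, candidate):
    """Classify how candidate relates to own: exact, begin, end or None."""
    own_label = own.lower().split(".", 1)[0]
    cand_label = candidate.lower().split(".", 1)[0]
    if cand_label == own_label:
        return "exact"
    if cand_label.startswith(own_label):
        return "begin"
    if cand_label.endswith(own_label):
        return "end"
    return None  # the label only appears mid-name, or not at all
```

For example, with `cars.com` in the portfolio, `cars.net` would come back as "exact", `carsales.com` as "begin" and `usedcars.org` as "end".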
Here are a couple of the screen shots - still a work in progress but getting close:
First column is the user's domain (in this case, mine), second column is the matched domain, then you've got Google PageRank, number of Google indexed pages, number of Google backlinks, the Alexa rank, whether it's in DMOZ and the match type. All of the columns are sortable and I'm currently working on integrating the same filters that you use on the main search into this view. (In this view I happened to sort it by the Alexa rank). Every column except the first (the user's domain) is related to the matched domain. This gives you a quick way to see any matched domains that are likely to have full blown websites associated with them versus parked sites.
This view shows you the domains that matched that are also about to drop. This can be helpful if you're looking to pick up any domains similar to yours... The date it's dropping becomes visible when you click the "dropping" checkbox in this view.
The process of getting the matches for a set of 300 of my domains took about 5 minutes, so it'll be something that I'll give users the ability to queue up and run on a regular basis.... and I'm going to add a column to the view that displays the date each domain was last updated so you can see any newly matched domains. Next steps beyond this are to send email alerts for specific saved queries.
If you want to submit functionality requests - just PM me.
I must say that you shouldn't give Ken any ideas, even as a joke; he will implement them. So now ZFBot makes coffee as well.
The output is also impressive. It's an expandable Excel document that simplifies the human (visual) scanning of the results.
That's really impressive stuff!
How about giving users options of choosing to display or
no display items like PR, GIP, GBL, Alexa, DMOZ and Drop Date?
I am thinking it may speed up the processing speed if
users choose not to display some items and also reduce server load?
BTW, I prefer ESPRESSO!
Yeah, the performance is going to be an issue with the stats, I'm afraid. Here's the deal (and if anyone has any ideas for workarounds, I'm all ears)... Getting the matches is no problem - it takes a reasonable amount of time even on large data sets. I ran through a list of around 1,100 domains from one of the domainers here and it resulted in a matched list of 450,000 in around 10 minutes. But getting the PR, GIP, GBL, Alexa, and DMOZ values for those domains takes around 5 seconds per domain (via a PHP script I am using). That may not sound like a lot of time but on a record set of 450,000 - it would take nearly an entire month to complete. The match list is that large because in this set of domains, there were a handful that each had well over 10,000 matches and a few had nearly 50,000 matches. Nice domain names to say the least. Even on a small set of data - the 200 domains I ran through produced around 10,000 matches and the processing time for the stats would be almost 6 days. Not acceptable obviously. I'm going to have to tinker around with which stats are most important and how I want to display them... it may need to be a button that you press on each domain and then it retrieves those stats. Unless I get some script that is able to run SIGNIFICANTLY faster. Something like processing 5 domains a second would be acceptable - that would result in the 450,000 matches being done in a day and a more reasonable set of around 50,000 in a couple hours. Anyway, I'll still be getting the portfolio match with the dropping domains completed regardless of the stats... stay tuned.
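The timing estimates in this post come down to one multiplication. A small sketch for checking them, where `processing_days` is a hypothetical helper and the lookups are assumed to run strictly one after another:

```python
SECONDS_PER_DAY = 86_400

def processing_days(matches, seconds_per_domain):
    """Days needed to fetch stats for every match, one lookup at a time."""
    return matches * seconds_per_domain / SECONDS_PER_DAY

# 450,000 matches at 5 seconds each: about 26 days.
slow = processing_days(450_000, 5)

# The same set at 5 domains per second (0.2 seconds each): about 1 day.
fast = processing_days(450_000, 0.2)

# 50,000 matches at 5 domains per second: under 3 hours.
small_hours = processing_days(50_000, 0.2) * 24
```

At the same 5 seconds per domain, 10,000 matches works out to roughly 14 hours, while a run of almost 6 days corresponds to roughly 100,000 matches.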
Just finished doing some research - technically, it's against Google's TOS to run an automated PR checker on your server. It takes up Google's bandwidth and they frown upon it. There are obviously tons of scripts out on the net you can find to get the PR - many of them paid for scripts and programs such as the domainpunch.com tool. However, you can probably get around the "automated" terminology because the user is technically kicking off the request... therefore it's manual. The process I'm currently running would probably be deemed automated since it's a single script going out and grabbing tens of thousands of PR's. Sooooo, I'll probably go the route of getting the domain matches for the portfolio and allowing a user to click the domain to get the PR, GIP, GBL, etc... once the data has been requested, I can store it locally and display it from that point forward without having to recall it through Google. There would need to be a "refresh" button the user can click to update the data again at some point...
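The click-to-fetch approach with local storage and a refresh button could look roughly like the sketch below. `StatsCache` and the `fetch` callable are hypothetical: the real lookup (PR, GIP, GBL and so on) is left as a function supplied by the caller, and an in-memory dict stands in for whatever local storage is actually used:

```python
import time

class StatsCache:
    """Fetch stats for a domain on first request, then serve the stored copy."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # hypothetical callable: domain -> dict of stats
        self._clock = clock
        self._store = {}       # domain -> (fetched_at, stats)

    def get(self, domain, refresh=False):
        # Only go out to the remote service when the user asks for a domain
        # that is not stored yet, or explicitly clicks "refresh".
        if refresh or domain not in self._store:
            self._store[domain] = (self._clock(), self._fetch(domain))
        return self._store[domain][1]

    def fetched_at(self, domain):
        """Timestamp of the stored copy, or None if never fetched."""
        entry = self._store.get(domain)
        return entry[0] if entry else None
```

Keeping the fetch time alongside the stats makes it easy to show users how old the stored values are before they decide to refresh.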
Last edited by kengreenwood; 06-09-2009 at 09:38 PM. Reason: Automerged Doublepost
I added a chat room on ZFbot.com ... you'll see a radio button to access it above the stats grid on the right...
Chat room?
Where is my espresso :(
I made some changes to the zfbot indexes and query code... domain searches are coming back MUCH more quickly now.
Some minor updates and issues:
1. I added the ability to search for CVCVC and VCVCV dropping domains.
2. I added code to restrict all special characters in the search (~`!@#$%^&*()_+=[]{}|\:;"',.<>/? and space)
3. Nearly done with the registration and portfolio upload/match. Just need to create the form to allow users to upload and then maintain their domains. Once domains are uploaded, there is a script that runs every half hour that will check for matching domains for any domain that hasn't yet been matched. All already matched domains will be re-checked for changes nightly. Users will have the ability to see matching domain changes over the last 24 hours, 7 days and 30 days or can just pull up all matching domains. The nice thing about having the ability to see changes to matches is that you can see any NEW matches or any matches that DROPPED (meaning that there was a matching domain but that the matching domain has dropped - this could be because the domain is actually dropping or because the name server was removed from the domain temporarily). The current columns in the matching domains datagrid in the portfolio view are: the 3 image links for website, archive.org and snapnames backorder, your domain name, the matching domain name, the type of match (begin, end or exact), the change type (New Match or Match Dropped) and the last column is whether or not the domain is actually dropping (yes/no).
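The "New Match" and "Match Dropped" change types in item 3 amount to comparing two snapshots of the matched set. A minimal sketch, assuming each nightly run yields a plain collection of matched domain names (`match_changes` is a hypothetical helper):

```python
def match_changes(previous, current):
    """Compare two snapshots of matches and label what changed."""
    previous, current = set(previous), set(current)
    changes = [(d, "New Match") for d in sorted(current - previous)]
    changes += [(d, "Match Dropped") for d in sorted(previous - current)]
    return changes
```

A "Match Dropped" entry only says the name left the zone file snapshot; as noted above, that can also happen when a name server is removed temporarily.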
The matches can be exported to Excel, and I've got a macro that I can send to users: once the spreadsheet opens, you just press ctrl-x and it cleans up and formats the imported data with html links, subtotals, etc.
Here's a screenshot of the portfolio matches form (this example just shows changes over the last 24 hours):
Issues I've found in testing so far:
1. The deeplinking isn't working for some reason in Firefox and IE. It's working in Safari and Chrome.
2. I've noticed a weird issue with the search box in Chrome (and it may be related to the deep linking) where you end up not being able to remove or add characters sporadically...
3. If any tables are locked due to updates running, some of the browsers return an error pop up... not a big deal ...
If anyone finds any other issues, please let me know about them...
Tuned up the queries even more... quicker and quicker now. And I added logic within the query and app that fires off an error when the query results exceed 150,000 records. You'll see a message that says "Over 150,000 results match this query... " etc...
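One common way to enforce a cap like this without counting every matching row is to ask for one row more than the limit and treat the extra row as the overflow signal. The sketch uses `sqlite3` as a stand-in for MySQL; the `domains` table, `capped_search` and `TooManyResults` are assumed names, and the post does not say how the check is actually implemented:

```python
import sqlite3

MAX_RESULTS = 150_000

class TooManyResults(Exception):
    """Raised when a search would return more rows than the cap allows."""

def capped_search(db, prefix, cap=MAX_RESULTS):
    """Return matching names, or raise if more than cap rows match."""
    rows = db.execute(
        "SELECT name FROM domains WHERE name LIKE ? LIMIT ?",
        (prefix + "%", cap + 1),  # one extra row reveals an overflow
    ).fetchall()
    if len(rows) > cap:
        raise TooManyResults(f"Over {cap:,} results match this query")
    return [r[0] for r in rows]
```

The query stops after `cap + 1` rows, so an over-broad search fails quickly instead of shipping a huge result set to the browser.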
Last edited by kengreenwood; 06-26-2009 at 11:26 PM. Reason: Automerged Doublepost