6/25/07 UPDATE: I am obligated to point out that this little script has graduated from interesting to useless — thanks to the new Google Analytics, which is hands down the best tool for understanding web traffic. And it’s bloody free. You probably knew this already. But, just in case, here’s a great new tutorial. That is all.
There is a pretty basic trick for getting an idea of who is linking to your site. Just google:
link:http://mysite.com
But that is an extremely rudimentary technique for several reasons.
- You will probably get a bunch of *internal* links, which are pretty useless.
- You will not get a sense of the total number of links from each referrer; they are not tallied or ranked.
- You only get referrers for the individual page you type in, not your entire site, which means the numbers you see are largely underreported. (Even http://www.yoursite.com is different from http://yoursite.com.)
You asked, and we listened: We’ve extended our support for querying links to your site to much beyond the link: operator you might have used in the past. Now you can use webmaster tools to view a much larger sample of links to pages on your site that we found on the web. Unlike the link: operator, this data is much more comprehensive and can be classified, filtered, and downloaded. All you need to do is verify site ownership to see this information.
Peeyush, Google Webmaster Central Blog
So yesterday I was super happy to discover via the trusty Google Webmaster Central Blog that there is a new “links view” in the Webmaster Toolkit.
The Webmaster Toolkit is a service from Google that you really should be using. It takes just a few minutes to get started, and then you get lots of data, including the new link data. If you haven’t already (and, uh, you run a website), check it out and you will be happy to pick up a bunch of free statistics about your site. Notably, you can also create an XML sitemap (not a graphical HTML sitemap, though!) of your site to make sure Google is indexing the whole thing. And you can test your robots.txt file (important for keeping those pictures of the last drunken staff party out of images.google.com).
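If you would like to sanity-check a robots.txt rule locally as well, Python's standard `urllib.robotparser` module can evaluate rules against a URL. This is only a small sketch, not a substitute for Google's own tester: the `/staff-party/` directory, the `mysite.com` URLs, and the helper name `is_allowed` are made up for illustration, and Google's crawlers may interpret some rules differently from this module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep every crawler out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff-party/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blocked image and an ordinary page (both URLs are placeholders).
print(is_allowed(ROBOTS_TXT, "Googlebot-Image", "http://mysite.com/staff-party/keg.jpg"))
print(is_allowed(ROBOTS_TXT, "Googlebot-Image", "http://mysite.com/about.html"))
```

With these example rules the first check should come back as blocked and the second as allowed.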
I did have a couple of problems with the data, though — there still is no way to get a good ranking of your referrers, or a ranking of your most popular pages. Luckily, you can download the entire file and do whatever you want with it. (hooray for openness!)
Since we have a bunch of clients I wanted to send this new data to, I took the time to write a simple Perl script. And I figured a few other people could use it.
It’s here:
[syntax,unique_addresses.pl.txt,perl]
Instructions for unique_addresses.pl
Prerequisites: Using this script requires that you know how to execute a file from the command line (and that you have Perl installed). This will only work for Mac/Linux folks (requires Perl and the *nix commands for sorting). … If you are a progressive blogger or organization and can’t get this to work, email me your stats and I will process them for you.
- Download your entire external links file from the webmaster toolkit.
- Use Excel or something to pull out that column of external links, and save this as something like “referrers.txt”
- Repeat the above for your “pages” column, but name it something like “pages.txt”
- Download script.
- Make it executable in the same directory as your “pages.txt” and “referrers.txt” files
- Run “./unique_addresses.pl” and it will prompt you through the rest.
Again, if you are working for a good cause but run into trouble, just email chris at blast dot com or leave a comment.
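For anyone without Perl or the *nix sort commands, the tallying idea can be sketched in Python. To be clear, this is not the unique_addresses.pl script, just a minimal illustration under a few assumptions: `referrers.txt` holds one linking URL per line (as in the steps above), `mysite.com` is a placeholder for your own domain, and the function names are invented for this example. It folds the www and non-www forms of a host together, drops internal links, and ranks the remaining referrers by link count.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assume http when the scheme is missing
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def rank_referrers(urls, own_domain):
    """Count links per referring host, skipping links from our own site."""
    own = host_of(own_domain)
    counts = Counter()
    for url in urls:
        host = host_of(url)
        if host and host != own:  # ignore blank lines and internal links
            counts[host] += 1
    return counts.most_common()  # most frequent referrers first


if __name__ == "__main__":
    try:
        with open("referrers.txt") as handle:
            lines = handle.readlines()
    except FileNotFoundError:
        lines = []  # nothing to report if the file is not there
    for host, total in rank_referrers(lines, "mysite.com"):
        print(f"{total}\t{host}")
```

Pointing the same function at “pages.txt” with an empty `own_domain` would not be meaningful, since that column holds your own pages; for a ranking of popular pages you would count the full URLs instead of the hosts.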
Finally, some useful help from the “celebrity engineer” Matt Cutts, one of the few people in the world who have had intimate relations with the Google PageRank algorithm. (EDIT: He also happens to use WordPress, not Blogger. Hmmm.)
This is a description of how best to reference your urls so that Google understands them clearly. (The corresponding clarity is designed, clearly, to increase your rank.)
From the blog of Cutts:
Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:
* www.example.com
* www.example.com/
* example.com
* example.com/
* www.example.com/index.html
* example.com/home.asp
Roger Johansson at 456 Berea St. (web design) says he adds
RewriteCond %{HTTP_HOST} ^456bereastreet\.com [NC]
RewriteRule ^(.*) http://www.456bereastreet.com/$1 [R=301,L]
to his .htaccess file to ensure that “www” is added to every request for a page on his server. Great work. Great post. Thanks, Matt and Roger.
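To make the effect of those two lines concrete, here is a rough Python model of what the rule does to an incoming request. It is only a sketch of the matching logic, not how Apache actually implements mod_rewrite; the function name `www_redirect` is invented for this example, and it assumes the Host header carries no port number.

```python
import re

# Mirrors: RewriteCond %{HTTP_HOST} ^456bereastreet\.com [NC]
BARE_HOST = re.compile(r"^456bereastreet\.com", re.IGNORECASE)


def www_redirect(host, path):
    """Return (status, location) for a bare-host request, or None if no redirect applies."""
    if not BARE_HOST.match(host):
        return None  # condition failed: the request already uses www (or another host)
    # In .htaccess context the pattern sees the path without its leading slash,
    # so ^(.*) captures something like "archive/post.html", which becomes $1.
    captured = path[1:] if path.startswith("/") else path
    # Mirrors: RewriteRule ^(.*) http://www.456bereastreet.com/$1 [R=301,L]
    return 301, "http://www.456bereastreet.com/" + captured


print(www_redirect("456bereastreet.com", "/archive/"))
print(www_redirect("www.456bereastreet.com", "/archive/"))
```

The permanent (301) status is the important part: it tells search engines that the www address is the one to keep, so the two forms of the url stop being counted separately.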
Google has a new librarian’s newsletter that offers a nice clear (and brief) look at how their rankings work.
Nice to see a little bit of transparency from the behemoth Google, which is known to be more than a little secretive about its algorithms.
This doesn’t actually clear things up entirely, but it is Google’s clearest, simplest explanation yet of how things work.
Being visible on the web is essential for anyone with good information to share. If you’re a nonprofit, activist, artist, or otherwise a force for good, you should know how to make your website prominent.
From the introduction:
One of the most common questions we hear from librarians is “How does Google decide what result goes at the top of the list?” Here, from quality engineer Matt Cutts, is a quick primer on how we crawl and index the web and then rank search results. Matt also suggests exercises school librarians can do to help students.
Read the newsletter: Google Librarian Center
This kind of stuff (Search Engine Optimization, or SEO) is relevant to every website, but I think that nonprofits and do-gooders can use it the most. Here’s a chance to brush up, as the rules change slightly every day.
Yaro Starak has written an in-depth review of The 8 Essential Things You REALLY Need to Know About Search Engine Optimization, a CD where Brad Fallon talks about ways of improving search engine rankings. The review is split into two parts: The 80/20 Of Search Engine Marketing - Part 1, which contains on-page SEO techniques, and The 80/20 Of Search Engine Marketing - Part 2, where off-page techniques are discussed.
The point is this:
- Title Tags
- Keyword Density
- Site Structure
- Internal Links
- Links and PageRank
- Page Reputation
- Anchor Text
- Link Popularity
Via: 8 essential search engine marketing techniques | 456 Berea Street
There is no shortage of webmasters desperate to get their hard-won site noticed. After spending many sleepless nights coding and debugging a site for a nonprofit, it only makes sense that you would want it to actually show up when someone is searching for your organization’s keywords. But, as with much of web authoring, you won’t find a perfectly simple solution.
The science and art of ranking highly on Google has become a major industry, populated (not surprisingly) by some folks willing to do some very unsavory things to get your site listed. Here’s how it works: you make the site and want to see it rank higher, so you pay a “Search Engine Optimization” company to get you listed on the first page of Google for certain keywords. What happens after this point will entail either (1) rewriting most of your content and all of your title/meta/alt tags, or (2) the SEO company setting up false domains that feed into your site (or a number of other dirty tricks). Both will get you higher listings, but only rewriting your content (a major project in most cases) will keep you there. Some unsavory SEO techniques will actually get you banned from Google et al., making your site effectively useless. This was a major point of discussion on a recent thread at techsoup.org, a site that is highly recommended in general. Here’s a snippet of the conversation:
While there’s a lot you can do to improve your site’s ranking, nobody in the world can guarantee that you’ll come up with a top ranking on the first or second page of the search results. A statement like this implies that humans have complete control over the search engine ranking process, when in fact this is not true at all. However … SEO can lead to excellent results without it having to be a complicated process.
You can find the thread under the “web building” forum, but you’ll have to browse for the search engine-related threads. The above conversation touches on some very good, basic techniques of search engine optimization, such as the use of the title tags and keyword-rich copy. There are other guides available online for sure, but the best resource I’ve found is the 2004 Search Engine Optimization for Dummies. If you’re new to search engine optimization, just sit down with that and apply a little of what you learn to your site. Even just an afternoon’s work can make your site several times more visible on search engines and result in greatly increased traffic on the web. If you’re really willing to work at it, you can get an incredible amount of publicity for your organization through your website.