Dave Hoernig

Director of Software Engineering

What is a Canonical URL and Why Should I Care?

A canonical URL or “canonical link” is an HTML element that helps search engines avoid the appearance of duplicate content. It does this by identifying a preferred version of a web page. Using canonical URLs improves your site’s SEO and makes searching the site easier for your visitors. The canonical link appears in the head section of a web page and looks like this:

<link rel="canonical" href="http://www.yoursite.com/page-path/page-title/" />

How it works

Imagine you’re throwing a party at your home and you provide directions to your guests. (I recognize that nowadays people will just plug your address into their navigator, but my father refuses to use such technology and still prefers written directions and paper maps.) Knowing that your guests will be coming from different starting points, you provide a different set of directions depending on whether they are coming from the north, east, south, or west. Each set of directions presents a different route, but each ends up at your house.

Now consider that you publish a news story to your website, and your website allows your visitors different paths to get to news stories. One path may be to navigate to a menu choice “News” and click the link to your story. Another might be to click a link from a section titled “Latest News” on your home page. A third might be that your visitor navigated to some other page and saw the link to your news story in a side bar of related content. This could result in three different URLs:

No matter how visitors navigate to your news story, they will end up reading the same content, even if the URL and the appearance of the web page around the story are different based on how they got there. Likewise, the different directions you offer your party guests will result in them all arriving at your home regardless of which route they took. The directions you provided your guests are like your web pages and your home address is like the canonical URL! There are different ways to get there, but only one home. Following through with the news story example, each of the pages above should have the same canonical URL. It might look like this:

<link rel="canonical" href="http://www.yoursite.com/news/archives/story-title/" />

Search engines crawl through the links on your site just like humans do, only much faster. That means that Google will find all three paths to your news story just as visitors will. Should it show all three results? No. Instead, when it sees the canonical URL, common to all three pages, Google presents that one. In doing so, Google avoids the appearance of duplicate content and your website visitors are not confused by multiple links to the same story. That’s why canonical URLs are important.
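
To make the idea concrete, here is a minimal Python sketch of the grouping step described above. It is not how Google's crawler actually works; it only shows that pages reached by different paths collapse to one entry once you key them by their canonical link. The three page URLs are made up for illustration, and the names `CanonicalFinder`, `find_canonical`, and `group_by_canonical` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def group_by_canonical(pages):
    """Map each canonical URL to the list of page URLs that declare it."""
    groups = defaultdict(list)
    for url, html in pages.items():
        # Assumption: a page without a canonical link counts as its own canonical.
        groups[find_canonical(html) or url].append(url)
    return dict(groups)


# Hypothetical data: three invented paths that all lead to the same story.
CANONICAL = "http://www.yoursite.com/news/archives/story-title/"
PAGE_HTML = (
    '<html><head><link rel="canonical" href="%s" /></head>'
    "<body>Story</body></html>" % CANONICAL
)
PAGES = {
    "http://www.yoursite.com/news/story-title/": PAGE_HTML,
    "http://www.yoursite.com/home/latest-news/story-title/": PAGE_HTML,
    "http://www.yoursite.com/about/related/story-title/": PAGE_HTML,
}

if __name__ == "__main__":
    for canonical, urls in group_by_canonical(PAGES).items():
        print(canonical, "<-", len(urls), "page(s)")
```

Three addresses go in, one search result comes out, which is the whole point of the canonical link.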

Historical footnote

The canonical link element was introduced in 2009 by consensus among the major search engines Google, Yahoo! and Bing. It was formally added as an HTML standard in 2012 and is now an expected feature of all modern content management systems.

 

Dave Hoernig

Director of Software Engineering

Farewell Google Site Search, Hello Google Custom Search Engine

Google wouldn’t be Google if it wasn’t shaking things up with its products and offerings. The latest shake up? Sunsetting the Google Site Search.

As you may have heard, over the course of the next year, Google Site Search will be discontinued, leaving in place Google’s Custom Search Engine (CSE), which will continue to be ad-supported. As of April 1, 2017, Google has stopped selling licenses and renewals for Google Site Search, and will completely phase it out by April 1, 2018.

What are the differences between the old Site Search and the Custom Search Engine? The most notable ones are:

  • Ads are required. Google will, however, make exceptions for 501(c)(3) organizations.
  • Google branding is required with the new search version, and cannot be disabled, even for 501(c)(3) organizations.
  • There are monthly search query limits, so if you are running a high-traffic website there is a chance that the search will stop working once you hit your limit.

Wondering what this means for your organization and your website’s site search if you are a Google Site Search user?

Nothing, until your current Google Site Search license expires. You will continue to have access to Google Site Search, and your implementation and settings will stay the same until your license expires. At that time, Google will automatically convert your site search to the ad-supported CSE version and the changes mentioned above will take effect.

If you are a 501(c)(3), are okay with the Google branding on your site search, and have a relatively low site search usage on your website, the transition to CSE should continue to meet your needs. Once you are converted, you will simply need to disable the ads and should also be prepared to provide Google’s legal team with proof of your 501(c)(3) status, if requested. Pretty simple.

If you’re concerned that Google CSE won’t meet your needs, always keep in mind that there are other options on the market. For example, we’ve implemented the SearchBlox and Solr site searches for our clients with excellent results. In fact, I recently spoke with CEO Joanna Pineda about why we love the SearchBlox site search so much. If you’re interested in what other options are available to you, please reach out! We’d love to work with you to find the perfect solution for your organization.

Have you been switched to the Custom Search Engine yet? What are your thoughts?

Sherrie Bakshi

Director of Marketing and Social Media

Google’s New Mobile-Friendly Algorithm Rolls Out Today

Google rolled out its NEW mobile-friendly algorithm today.

Why is Google doing this? The world is mobile. With an estimated 63% of American adults using their phones to go online (Pew Research Center), Google, the world’s largest search engine, wants companies to get serious about their mobile strategies.

What does this mean? When you search on a mobile device, Google will highlight mobile-friendly websites in the search results. More importantly, mobile-friendly websites will rank higher in mobile searches.

How does this impact your organization’s web strategy? For one thing, if you don’t have a mobile strategy, it’s time to get serious about one. It’s also a good time for those with mobile strategies to assess them.

Here’s how:

  • Start with your own analytics. In Google Analytics, go to Audience > Mobile > Overview to see what types of devices visitors use to access your website. To get a sense of whether mobile traffic has climbed over a specific period, select “compare to” in the calendar box (top right-hand corner) and adjust the dates. You can customize the dates based on specific periods.
  • Use the mobile-friendly test tool. This tool crawls individual pages on your website and lets you know if a page is not mobile-friendly and why.
  • Make your current website responsive. These days many content management systems, including Sitefinity and WordPress, offer responsive templates to help web developers convert desktop sites to responsive.
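
If you want a quick first pass before running pages through the mobile-friendly test tool, one thing you can check yourself is whether a page declares a responsive viewport. The Python sketch below does only that. It is not a reimplementation of Google's test, which looks at much more than this one tag, and the names `ViewportFinder` and `has_responsive_viewport` are hypothetical.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Records the content of the <meta name="viewport"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.viewport = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "viewport":
            self.viewport = attributes.get("content") or ""


def has_responsive_viewport(html):
    """True if the page sets its viewport width to the device width."""
    finder = ViewportFinder()
    finder.feed(html)
    if finder.viewport is None:
        return False
    return "width=device-width" in finder.viewport.replace(" ", "").lower()


if __name__ == "__main__":
    # Hypothetical page markup, for illustration only.
    responsive = (
        '<html><head><meta name="viewport" '
        'content="width=device-width, initial-scale=1"></head></html>'
    )
    desktop_only = "<html><head><title>Old site</title></head></html>"
    print(has_responsive_viewport(responsive))
    print(has_responsive_viewport(desktop_only))
```

A page that fails this check is very likely to fail the real test too; a page that passes still needs the real test.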

Finally, if you haven’t redesigned your website in a few years, make sure that you budget for a responsive website.

Liz Norton

Programmer

Using Google’s PageRank Algorithm to Detect Cancer

Using algorithms to determine relevant related content by ranking content and linking? Pretty brilliant. Using those same algorithms to predict cancer growth? Really brilliant.

Researchers in Switzerland and Germany recently published an article in Public Library of Science Computational Biology about using Google’s PageRank algorithm to predict cancer growth: Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes. (Now, in the interest of full disclosure, I have to admit that I didn’t read the whole paper in PLoS Computational Biology. I did, however, read a fascinating review and digest of the paper on Txchnologist: Googling Cancer: Search Algorithms Can Scan Disease for Patient Risk.)
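
For readers who have never seen the ranking idea spelled out, here is a minimal Python sketch of the textbook PageRank calculation on a tiny link graph. It is the plain web-page version, not the researchers' gene-network method, and the four-node graph, the function name `pagerank`, and the damping value of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes by repeatedly passing each node's score along its links.

    links maps each node to the list of nodes it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with a small baseline share.
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its score evenly.
                share = damping * rank[node] / count
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: A, B and C all link to D, and D links back to A.
    graph = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
    for node, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
        print(node, round(score, 3))
```

The node that everyone links to ends up on top, and the node it links to inherits much of that importance.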


Wesley Harris

Software Tester

Troubleshooting Google Analytics Tracking Code: There’s a Chrome Extension for That

ga_debug.js is a pretty rad tool for troubleshooting the Google Analytics tracking code in your development or testing environment. But Google slaps this big fat caveat onto the documentation:

Important: You should not modify your production site to use this version of the JavaScript. The ga_debug.js script is larger than the ga.js tracking code and it is not typically cached. So, using it in across your production site will slow down your site for all of your users. Again, this is only for your own testing purposes.

A key aspect of our quality practice at Matrix Group is some final testing in our production environment (non-destructive, duh). Developers and SysOps will SWEAR that a deployment was flawless, but I don’t believe it until I see it (sorry fellas, I’m a skeptic).

Browsing the site and waiting for analytics data to show up is a suboptimal solution to this dilemma. The data won’t appear immediately, and at Matrix, we don’t track traffic from our IP range anyway.

So how do we verify that Google Analytics tracking is working properly on the live site? Enter the Google Analytics Tracking Code Debugger (*trumpets*).

This extension for Google Chrome (we’ll call it GA Debug) enables ga_debug.js without having to serve it from your production environment.

Installation is simple. Using Google Chrome, navigate to the extension’s entry in the Chrome Web Store and click Add To Chrome.

Now you’ve got the GA Debug button in your Chrome toolbar. Congratulations.

Click it to enable ga_debug.js. Open up the web developer tools (on my Mac it’s View > Developer > Developer Tools), and click the Console tab to get the JavaScript console.

Let’s see what they’re tracking over at I Can Has Cheezburger:

[Screenshot: debug sample]

The extension has very helpfully parsed out and decoded the otherwise nigh-inscrutable parameters to the GET request to __utm.gif, which is the actual tracking beacon. Looks like they are tracking a custom variable PageType, which in this case has the value Index. Hopefully that’s what they were expecting.
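
If you are curious what that decoding amounts to, the Python sketch below pulls the query string off a beacon URL and URL-decodes each parameter. It is an illustration of the idea, not the extension's own code. The beacon URL is a made-up sample with placeholder values; the parameter names it uses (utmwv for the tracking code version, utmhn for the host name, utmp for the page path, utmdt for the page title, utmac for the account ID) are real ga.js parameters, and `decode_beacon` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical beacon request, shortened to a handful of parameters.
BEACON = (
    "http://www.google-analytics.com/__utm.gif"
    "?utmwv=5.4.1&utmhn=www.example.com&utmp=%2Fnews%2Fstory-title%2F"
    "&utmdt=Story%20Title&utmac=UA-XXXXX-Y"
)


def decode_beacon(url):
    """Return the beacon's query parameters as a dict of decoded strings."""
    query = urlsplit(url).query
    # parse_qs returns a list per name; the beacon sends each name once.
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    for name, value in decode_beacon(BEACON).items():
        print(name, "=", value)
```

The extension goes further than this and also unpacks things like custom variables into readable form, which is what makes it worth installing.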

Grrr, but now we have another problem. For one of our clients, we do some fine-grained testing of page layouts, which in practice means tracking clicks on items in sidebars and other content blocks as events. If those clicks open a new page (ours do), you’ll need to be Mr. Speedy to read the message on the console before the new page clobbers it.

The solution is to persist the console log on navigation, which helpfully is an option in the developer tools settings. On my Mac, I found these settings by clicking the little gear in the bottom-right corner of the developer tools. In the Console section, tick the box for “Preserve log upon navigation.”

If you aren’t using Google Chrome, WHY AREN’T YOU USING GOOGLE CHROME??? OK fine; looks like there’s a Firefox extension to accomplish something similar.