SEO and Analytics – Sreekanth Narayanan
SEO and Analytics – Agenda
SEO: SEO Introduction, Search Engine basics, Technology Considerations, Tweaking your Content, Promoting Web Pages, Tools for Web Masters
Analytics: Analytics Introduction, Analytics – methods, Tools for Analytics, Some Key Terminologies
SEO and Analytics: SEO Introduction
SEO – what's that? Search Engine Optimization (SEO) has been a buzzword since the advent of the major search engines. SEO deals with the best practices that make it easier for search engines to crawl, index and understand the content on your web pages.
SEO and Analytics: Search Engine basics
How do search engines work? Spiders (also called robots) comb the web by following links. The search engine formats the data it finds and stores it in its database. All the major search engines maintain extensive, highly indexed databases.
SEO – what's that? All trademarks belong to their respective owners.
SEO – what's that? Indexing and ranking of results are driven by complex algorithms that take a large number of parameters into account. Thanks to years of experience analyzing the behavior of the major search engines, webmasters have built up a considerable knowledge base on what makes pages more search-engine friendly.
Paid and Organic Search Results. Many search engines have launched paid services such as Google AdWords. Organic search results are the ones that are not influenced by paid or sponsored programs. SEO applies to the organic results; it normally has no impact on the results shown as sponsored links.
SEO and Analytics: Technology Considerations
User-Agent HTTP Header. Most websites make heavy use of the User-Agent HTTP header to determine who is requesting the page, and the site's behavior is often altered depending on what is passed in the User-Agent field. Typical applications are changing the CSS for IE versus Firefox (the (in)famous browser incompatibility issues) and forwarding the user to a mobile version of the website if the user agent happens to be a mobile device (see the sketch below).
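A minimal sketch of user-agent based branching, written in the same JSP style as the redirect example later in the deck. The /m/ mobile path and the simple substring check are assumptions made for illustration, not something the slides prescribe.

<%
    // Hypothetical sketch: branch on the User-Agent request header.
    String ua = request.getHeader("User-Agent");
    if (ua != null && ua.toLowerCase().contains("mobile")) {
        // Assumed layout: the mobile version of the site lives under /m/
        response.sendRedirect("/m/index.jsp");
    }
%>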
The common robot user agents. The following are the best-known robot user-agent strings.
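A few widely published examples (not taken from the original slide, listed here for reference):

Googlebot:     Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Bingbot:       Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Yahoo! Slurp:  Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Baiduspider:   Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)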
Cloaking. Cloaking was a very popular SEO technique in the early days. It means serving a different, text-heavy version of your website (with keywords sprinkled all over) whenever the request comes from a web robot (spider); most spiders are identifiable by their User-Agent headers, e.g. Google's robot is called "Googlebot". As search engines strengthened their spam-detection technologies, they started penalizing cloaked websites by removing them from their indices altogether. Today cloaking is not a recommended practice and should be avoided in all scenarios.
URL Structure. Simple-to-understand URLs convey content information easily and are easier for both users and crawlers to organize; crawlers typically lower the indexing priority of URLs containing arbitrary numbers and characters. The PageRank (TM, Google Inc.) algorithm gives a lot of weight to the number of pages that link to your page, and simpler URLs are easier for users to link to. If your URL contains relevant words, it gives users and search engines more information about the page than an ID or an oddly named parameter would.
URL best practices. Avoid lengthy URLs with unnecessary parameters and session IDs. Avoid generic page names like "page1.html". Keep directory nesting as shallow as possible. Keep directory names relevant to the content they hold, and avoid using numbers for directory names. Do not mix capitalization in URLs (e.g. CreateOrder.html); users prefer a single case, ideally all lower case.
URL best practices. Websites should be as flat as possible, with content targeting highly competitive keywords placed on pages high in the hierarchy. Rewrite URLs on the server side to make them simpler and less nested (see the sketch below). Note that search engines assign a lower relevance score to content buried deep inside the website; content in the top-level folders is considered much more relevant.
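A minimal server-side rewrite sketch using Apache mod_rewrite in an .htaccess-style configuration (assuming Apache with mod_rewrite enabled; the friendly URL and the target JSP page are hypothetical). Visitors and crawlers see a short, flat, keyword-bearing URL while the real page stays parameterized.

RewriteEngine On
# Map the friendly URL /international-calling-plans to the real parameterized page
RewriteRule ^international-calling-plans$ /catalog/plans.jsp?category=intl [L]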
Canonical URL. More often than not, there are multiple ways to reach the same page on a website. Canonicalization is the process of picking the best URL when there are several choices, usually referring to the homepage of a website. For example, http://www.google.com and http://google.com serve the same content; another example is "domain.com/aboutus.htm" versus "blog.domain.com/aboutus.htm". Search engines are usually intelligent enough to recognize that the content on the pages is the same and will pick one of the URLs, which might not be our preferred one.
Canonical URL – best practices. There are a few ways to ensure that the proper URL is indexed: when linking to your homepage, always point to the same URL; when requesting links from other sites, always point to the same URL; and redirect the non-www homepage to the www version of the homepage using a 301 Permanent redirect. A 301 redirect example (JSP) is shown below.
<%
    response.setStatus(301);
    response.setHeader("Location", "http://www.new-url.com/");
    response.setHeader("Connection", "close");
%>
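Not covered on the slide, but a related option: the major engines also accept a rel="canonical" hint placed in the <head> of duplicate pages, pointing at the preferred URL. A sketch with a hypothetical domain:

<link rel="canonical" href="http://www.example.com/aboutus.htm" />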
HTTP 301 and HTTP 302. 302 is a temporary redirect; 301 is the permanent redirect. As far as possible, use only 301 for redirection (see the JSP sample earlier), and always redirect from the server. A 302 redirect indicates that the content is temporary and will change in the near future, so the popularity attained by the previous site or page is not passed on to the new one. 301 Permanent redirects should be used when the change is long-term or permanent, which allows PageRank and link popularity to transfer; the indexing engines of all major search engines take care of this.
Name-value pairs in URLs. Name-value pairs are used in URLs to carry the information needed to produce dynamic content, but URLs tend to become lengthy with them. They contain numbers, which search engines typically treat as junk, and a key like "prod_code" means nothing to a common user; a product name would have been better (see the hypothetical example below). Use meaningful keywords in the name-value pairs whenever possible, and keep the number of pairs to no more than three.
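A hypothetical before/after illustrating the point (the domain, parameters and paths are made up):

Hard to read:  http://www.example.com/shop/item.jsp?prod_code=4432&cat=17&sessionid=8A6F2C1B
Better:        http://www.example.com/shop/international-calling-plans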
User input fronting screens. Many sites have a front page where you must enter your location or other details before they will show any product information. Search engines cannot fill in forms or make selections from drop-downs, so spiders are effectively locked out of the relevant content and cannot index or rank it. A related problem is a splash screen with a country chooser that does not let people past it until they pick a locale. It is better to land visitors on a default locale and then offer an option to change it; with such a design the robot will be able to index your pages.
Using mostly text for navigation. A lot of sites use Flash or JavaScript for navigation. Search engine spiders cannot follow JavaScript or Flash navigation and therefore cannot find pages reachable only through it. Flash may also not be supported in every browser, and users may not have installed the plug-in or may have disabled JavaScript. Prefer plain HTML-based navigation. You may have noticed that most Web 2.0 sites include a full sitemap in the footer; this is done to ensure that all the Flash/script navigation links are replicated in HTML form for the spiders to use.
The Web 2.0 footer (slide shows the mint.com page footer; page copyright mint.com).
Provide an alternative to Flash content. Spiders cannot read Flash content, and links embedded in Flash are never followed or indexed. If you cannot do away with Flash for usability reasons, implement a version of the site with the same links in HTML, and use user-agent detection to deliver the HTML site to spiders and the Flash version to human visitors.
Excessive in-page scripting. All web crawlers limit the amount of content they index from a page, typically to about 100 KB of data. If you have too much in-page scripting, the only thing the search engine may see is the script on your page, and some of the real content will be ignored once the limit is reached: crawlers ignore the contents of the <script> tag, but the total content read (the 100 KB) includes the scripts as well. It is always sensible to keep your scripts in separate files and include them in the page; that way you are not at risk of hitting the crawler's content limit and can still write plenty of code for dynamic behavior.
Excessive in-page scripting. The following example shows the right way of doing this: the stylesheet and scripts are pulled in as external files, and page-specific logic such as setDefaults() goes into the external page.mypage.js file rather than inline in the HTML.
<link href="${ctx}/content/css/style.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="${ctx}/js/jquery-1.4.2.js"></script>
<script type="text/javascript" src="${ctx}/js/jquery.ui.core.js"></script>
<script type="text/javascript" src="${ctx}/js/jquery.dataTables.js"></script>
<script type="text/javascript" src="${ctx}/js/highcharts.js"></script>
<script type="text/javascript" src="${ctx}/content/js/page.mypage.js"></script>

// In page.mypage.js:
function setDefaults() {
    $('#genericError').hide();
    $('#catgErr').hide();
    $('#allCatgs').attr('checked', false);
    $.ajax({
        url: "../callsomething",
        type: "POST",
        async: false,
        success: function(data) {
            var len = data.map.entry.length;
            for (var i = 0; i < len; i++) {
                // do something with data.map.entry[i]
            }
        }
    });
}
Session IDs in the URL. A web server may assign a unique session ID within the URL on each visit for tracking purposes. A search engine spider revisiting such a URL is assigned a different session ID every time, so each visit to the page appears as a unique URL, causing indexing inconsistencies and possibly duplicate-content penalties. Implement user-agent detection to strip the session IDs for search engine visits.
"nofollow" settings. Setting the "rel" attribute of a link to "nofollow" tells search engine robots that the link should not be followed and that your page's reputation should not be passed to the linked page. This is especially relevant for pages that allow user comments: say you are a well-known company and allow people to post feedback on your blog; always set "nofollow" on those links to avoid scenarios like the following. Sample: <a href="http://www.cheapdrugs123.com" rel="nofollow">Comment by a spammer</a>
404 pages. Pages or content that is moved, removed, or changed can result in errors such as a 404 Page Not Found. A custom 404 page that kindly guides users back to a working page on your site can greatly improve the user's experience; it should probably link back to your root page and could also link to popular or related content. Never allow your 404 pages to be indexed by search engines, do not give your 404 pages a design inconsistent with the rest of your site, and repair all broken links as soon as possible (a configuration sketch follows this slide).
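A minimal sketch of a custom 404 setup for a Java web application (the JSP path is hypothetical). The servlet container still returns the HTTP 404 status, so the friendly page is shown to users without being treated as a normal, indexable page; a noindex hint on the page itself is an extra safeguard.

<!-- web.xml: map 404 errors to a friendly page -->
<error-page>
    <error-code>404</error-code>
    <location>/not-found.jsp</location>
</error-page>

<!-- inside not-found.jsp, in the <head> section -->
<meta name="robots" content="noindex" />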
SEO and Analytics: Tweaking your Content
The <title> tag. Most search engines give a lot of weight to the content of the <title> HTML tag: a title tag tells both users and search engines what the topic of a particular page is. The <title> tag should be placed within the <head> tag of the HTML document, and ideally you should create a unique title for each page on your site.
<title> tag tips. Always give every page a sensible title, and do not repeat the same text across all pages or a group of pages unless it genuinely makes sense. Make sure all your important business terms are reflected in the title. Never choose a title that has no relation to the content of the page, and never use default or vague titles like "Untitled" or "New Page 1". Google displays 63 characters of the page title in its search results, so the first 63 characters should contain all the relevant detail you need (see the hypothetical example below).
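A hypothetical example of a descriptive, page-specific title (the brand and wording are made up); the most important detail sits within the first 63 characters:

<head>
    <title>Acme Telecom – cheap international calling plans and rates</title>
</head>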
<meta> tags. A page's description meta tag gives search engines a summary of what the page is about. Limit descriptions to 250 characters, include all targeted key phrases, write the copy with users in mind (the description copy appears in search results), and create a unique meta description for every page.
<meta> keywords tag. Keywords are listed in the head section of the HTML. Google gives this tag very little importance, while Bing and Yahoo give it some weight, so it still makes sense to specify it. Search engines normally do not display this content in the results. Use only relevant phrases in this tag, and use distinct phrases for different pages (a combined example follows).
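Hypothetical description and keywords meta tags for the same made-up page, placed in the <head> alongside the title:

<meta name="description" content="Compare Acme Telecom's international calling plans, per-minute rates and coverage for over 200 countries." />
<meta name="keywords" content="international calling plans, cheap international calls, calling rates" />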
Header tags <h1>, <h2>, <h3>. Search engines give a lot of importance to the content that appears inside the header tags. Use strictly one <h1> tag per page, for the most important heading on the page; <h2> and <h3> tags should likewise be used for the next most relevant headings. Always keep the natural hierarchy: first h1, then h2, then h3 (see the sketch below).
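A short sketch of the natural heading hierarchy described above (the content is hypothetical):

<h1>International calling plans</h1>
  <h2>Plans for Europe</h2>
    <h3>France</h3>
    <h3>Germany</h3>
  <h2>Plans for Asia</h2>
    <h3>India</h3>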
Importance of anchor text. Anchor text is the clickable text that users see as the result of a link, placed within the anchor tag <a href="..."></a>, e.g. <a href="http://www.mydomain.com/articles/our-prices.htm">Lowest prices on earth for international calls</a>. This text tells search engines something about the page you are linking to. Avoid generic anchor text like "page", "article", or "click here"; avoid text that is off-topic or has no relation to the content of the linked page; and avoid CSS or text styling that makes links look just like regular text.
Duplication of content. Duplicate content exists when two or more pages within a website, or on different domains, share identical content; different domain names do not create distinct content (e.g. company.com/aboutus.html and blog.company.com/aboutus.html). Major search engines consider duplicate content to be spam and are continually improving their spam-filtering processes to penalize and remove offenders. Avoid duplicating content as far as possible, and use 301 permanent redirects to tell search engines which URL to use.
Optimizing image content. Images form an integral part of any website. The "alt" attribute lets you specify alternative text for an image in case it cannot be displayed. This is an important usability feature, since screen-reader programs used by blind users identify and read out the alt text. Also, if you use an image as a link, the alt text for that image is treated much like the anchor text of a text link. Optimizing your image filenames and alt text makes it easier for image search products like Google Image Search to understand and rank the images on your website (see the example below).
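A hypothetical image link with a descriptive filename and alt text; because the image acts as the link, the alt text plays the role the anchor text would:

<a href="/plans/europe.htm">
    <img src="/images/europe-calling-plans.png" alt="International calling plans for Europe" />
</a>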
The robots.txt file. Website owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol. A sample can be seen here: http://www.robotstxt.org/robotstxt.html. All major search engine robots read this file to see which pages they are allowed to crawl; the Disallow entries specify which pages should be ignored by the crawler. A robots.txt typically contains entries such as the following (a complete example follows this slide):
Disallow: /residential/customerService/
Disallow: /residential/customerService/contacts.html
Disallow: /residential/customerService/contactus/billing.html
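Putting the Disallow lines from the slide into a complete file: a robots.txt needs a User-agent line ahead of the Disallow rules, and most major crawlers also honor a Sitemap directive (the sitemap URL here is a made-up example):

User-agent: *
Disallow: /residential/customerService/
Disallow: /residential/customerService/contacts.html
Disallow: /residential/customerService/contactus/billing.html
Sitemap: http://www.example.com/sitemap.xml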
The robots.txt file. There are some important considerations when using /robots.txt. Robots can ignore it: malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it. The /robots.txt file is also publicly available, so anyone can see which sections of your server you do not want robots to visit. You could put all the files you do not want robots to visit in a separate subdirectory, make that directory unlistable on the web (by configuring your server), and list only the directory name in /robots.txt. An ill-willed robot then will not traverse that directory unless there is a direct link somewhere on the web to one of your files, and in that case it is not the fault of /robots.txt.
SEO and Analytics: Promoting Web Pages
Linking your websites. Internal linking between pages within a website, such as navigational elements or a site map, plays an important role in how search engines perceive the relevancy and theme of the pages. Proper intra-site linking helps facilitate effective spidering and increases the relevancy of pages. Maintain a sitemap, keep sitemap pages to fewer than 100 links per page, and link the sitemap directly from the homepage and other major pages throughout the website (an XML sitemap example follows).
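Besides the HTML sitemap pages described above, an XML sitemap (the format the webmaster tools on a later slide accept) can be submitted to the engines. A minimal sketch with hypothetical URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/</loc>
        <changefreq>weekly</changefreq>
    </url>
    <url>
        <loc>http://www.example.com/aboutus.htm</loc>
    </url>
</urlset>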
Promotion through external channels. Effectively promoting your new content leads to faster discovery by people interested in the same subject. Increasing back-links to your site is one option, but it should be done properly: social media signals (e.g. the Facebook Like) add to your link count, but it is typically not advisable to link every small update in this fashion, as search engines nowadays understand those patterns. You can also publish your updates in an RSS feed, or have them linked from blogs of people in the related community. Today's search engines do not go by PageRank alone when determining relevance; traffic and content also matter.
SEO and Analytics: Tools for Web Masters
Webmaster tools. Every major search engine has launched its own set of webmaster tools. Google: http://www.google.com/webmasters/ Yahoo: http://siteexplorer.search.yahoo.com/ Bing: http://www.bing.com/toolbox/webmasters/ We will examine some of the most important tools that Google provides.
Webmaster tools. Google provides the following services:
see which parts of a site Googlebot had problems crawling
notify Google of an XML Sitemap file
analyze and generate robots.txt files
remove URLs already crawled by Googlebot
specify your preferred domain
identify issues with title and description meta tags
understand the top searches used to reach a site
get a glimpse at how Googlebot sees pages
remove unwanted site links that Google may use in results
receive notification of quality-guideline violations and request a site reconsideration
SEO and Analytics: Analytics Introduction
Web Analytics - Introduction. Web analytics is the measurement, collection, analysis and reporting of internet data for the purpose of understanding and optimizing web usage. It is a very important tool for business and market research: web analytics provides data on the number of visitors, page views, etc., to gauge traffic and popularity trends. There are predominantly two types: off-site and on-site.
Web Analytics - IntroductionOff-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website's potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a wholeOn-site web analytics measure a visitor's journey once on your website. This includes its drivers and conversions; for example, which pages encourage people to make a purchase. On-site web analytics measures the performance of your website in a commercial context.
SEO and Analytics: Analytics – methods
Methods for measuring: log file analysis. All web servers record most of their transactions in a log file (the access log, for Apache). This was the most prominent method when the web evolved in the late 90s: a tool is run over the log file to identify the hits to each page and derive statistics from them. It became very inaccurate in later times because there are thousands of "non-human" actors on the web today, Googlebot being one example.
Methods for measuring: log file analysis, continued. The tools adapted to the robots by measuring hits based on cookie tracking and ignoring the known robots, but this is not really practical, as robots are written not only by search engines but also by spammers. Log file analysis also broke down when users enabled their browser caches: pages were cached in the browser, so when the user requested the same pages again, no hit reached the web server and the content was served from the cache (a sample log entry is shown below for reference).
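For reference, a typical Apache combined-format access-log entry looks roughly like this (host, identity, user, timestamp, request line, status, bytes, referrer, user agent; all values here are made up):

127.0.0.1 - - [10/Oct/2011:13:55:36 +0530] "GET /products/index.html HTTP/1.1" 200 2326 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"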
Methods for measuring: page tagging. Developed during later stages of the web, page tagging embeds a JavaScript code segment in the page. When a tracking operation is triggered, the script collects data from the HTTP request, browser/system information and cookies, and submits it as parameters attached to an image request sent to the analytics server (a single-pixel image). For example, a Google Analytics data collection request looks like this: http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&utmhn=example.com&utmcs=ISO-8859-1&utmsr=1280x1024&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=9.0%20%20r115&utmcn=1&utmdt=GATC012%20setting%20variables&utmhid=2059107202&utmr=0&utmp=/auto/GATC012.html?utm_source=www.gatc012.org&utm_campaign=campaign+gatc012&utm_term=keywords+gatc012& ...
Methods for measuring: page tagging, continued. After the advent of XHR (XMLHttpRequest), some page tagging scripts tried submitting user data to the collection server via AJAX, but this is often bound to fail because of the same-origin restrictions most modern browsers place on XHR. And because the page tagging approach involves downloading a one-pixel image from another domain (such as Google), it adds an extra DNS (Domain Name System) lookup to your page, which is sometimes seen as an obstruction to page loading.
Page tagging is the new analytics. Page tagging is the de-facto standard today. It has the significant advantage that it works even for pages hosted in the cloud, meaning you do not need dedicated web servers whose logs you monitor. Analytics today is mostly an outsourced service, with specialist providers such as Google and Adobe, and page tagging is the only method they support.
SEO and Analytics: Tools for Analytics
Major tools – Web Analytics: Google Analytics. Free from Google (with a 5M page view cap per month for non-AdWords advertisers). It uses page tagging as the analytics method: the user embeds a script in the page, the script collects information about page actions and submits it to the analytics server as parameters on an image fetch, and detailed reports are presented after logging into your Google account (a snippet of the classic page tag is shown below).
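For context, the asynchronous ga.js page tag of that era looked roughly like the snippet below (UA-XXXXX-X is the placeholder for the account ID); it queues the tracking calls and then loads the collection script without blocking the page:

<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);   // your Google Analytics account ID
  _gaq.push(['_trackPageview']);              // record a page view for the current page

  (function() {
    // Load ga.js asynchronously so it does not block page rendering
    var ga = document.createElement('script');
    ga.type = 'text/javascript';
    ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(ga, s);
  })();
</script>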
Google Analytics Results
Major tools – Web Analytics: Omniture Fusion (Adobe). It also uses page tagging for information collection: you include a script snippet on all the pages to be tracked, and the information is submitted through a script call, in much the same way as Google, as parameters on a 1px x 1px transparent image request.
<body>
<script language="javascript" src="INSERT-DOMAIN-AND-PATH-TO-CODE/s_code.js" type="text/javascript"></script>
<script language="javascript" type="text/javascript"><!--
/* Copyright 1997-2004 Omniture, Inc. */
s.pageName=""
var s_code=s.t();if(s_code)document.write(s_code)//--></script>
</body>
</html>
Omniture Reports
SEO and Analytics: Some Key Terminologies
Web Analytics KPIs. KPIs (Key Performance Indicators) are the metrics that tell you which changes could make your website more effective. All KPIs are metrics, but not all metrics are KPIs. In web analytics it is critical to measure the right things.
First and Third Party Cookies. First-party cookies are cookies associated with the host domain; third-party cookies are cookies from any other domain. Say you go to http://yahoo.com, there is a banner ad on the page for http://youbuy.com, and both yahoo.com and youbuy.com place cookies in your browser. For you, the cookie from yahoo.com is a first-party cookie and the one from youbuy.com is a third-party cookie.
First and Third Party Cookies. So if we had placed the Google Analytics script on our page http://mozvo.com and it had set a cookie for the domain "google.com", that would have been a third-party cookie. Third-party cookies are widely discouraged because quite a few sites plant tracker cookies this way, and a lot of users (about 40%) disable third-party cookies. All the analytics providers have therefore switched to using first-party cookies to track information, which means the user will see only cookies from mozvo.com even though the Google Analytics code is embedded in the page.
Bounce Rate and Click Through Rate. Bounce rate: the bounce rate for the homepage, or any other page through which visitors enter your site, tells you how many people 'bounce' away (leave) from your site after viewing just one page; a low bounce rate is therefore preferred. Click-through rate (or click-thru rate): tells you how many people click through to your site from a third party, for example from a link, search engine, banner, advertisement or email campaign; a higher click-through rate is preferred.
Click Stream Analysis. Clickstreams, also known as clickpaths, are the routes that visitors take when clicking or navigating through a site. A clickstream is a list of all the pages viewed by a visitor, presented in the order they were viewed; it can also be defined as the 'succession of mouse clicks' each visitor makes. A clickstream will show you when and where a person came into a site, all the pages viewed, the time spent on each page, and when and where they left. The most obvious reason for examining clickstreams is to extract specific information about what people are doing on your site.
References
http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf
http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/09/03/search-engine-optimization-for-bing.aspx
http://help.yahoo.com/l/us/yahoo/search/indexing/ranking-02.html;_ylt=AiB.kJ7SxMRMNktmvnsyomX.YHhG
http://www.bivings.com/thelab/presentations/SEO_Basics.pdf
Thank you!
http://nsreekanth.blogspot.com/
