SEO Series 5. How Search Engines Work: Crawling, Indexing & Ranking (2025 Edition)

What are search engines?

At heart, a search engine is

(1) the search index – a huge digital library of information about web pages and

(2) the search algorithms – the librarian that matches your query to the most relevant items in that library.

The goal is simple: deliver the most relevant, most useful answers, fast enough to feel instant (Google for Developers, 2025). Because users return to the platforms that give them the most relevant results, relevance is money.

How do search engines make money?
They show two result types:

  • Organic results (from the index) – you can’t pay to appear here.
  • Paid results (ads), or pay-per-click (PPC) – you can pay to appear here. Each time someone clicks a paid search result, the advertiser pays the search engine. That’s why market share matters: more users → more searches → more ad clicks → more revenue.

Case study from our client (OTINGA): Why “being seen” starts with crawlability

“I’ve got 500 products. Why can’t anyone find me on Google?”

Our e-commerce client had 500 products but almost no Google traffic. The issue? Google couldn’t see most of their site. Their product pages weren’t crawled, so they weren’t indexed, which meant ranking was impossible. Once we fixed crawl paths, sitemaps, internal links, and rendering issues, the change was almost instant. Indexation rose → impressions rose → sales followed.

Another client struggled to rank until we advised adding an XML sitemap and fixing canonical tags. Within six months, her pages went from “Discovered – not indexed” to driving 15,000 monthly organic visits.

If Google can’t see (crawl) your pages, it can’t index them. If it can’t index them, it can’t rank them. You don’t exist.

How Search Works – The 3 core stages

Google, founded by Larry Page and Sergey Brin back in ’98, still does one thing better than anyone else: help people find what they’re looking for, fast. Every single day, billions of searches roll in — from “best fish and chips near me” to “how to fix GA4 spam.” If you want to be found, your site needs to be discoverable, understandable, and genuinely valuable.

As Google notes, ‘users often treat Search as the front door to the internet’, and some of the top queries are literally “YouTube” and “Google” themselves (Google Search Console, 2024). That’s why getting your website indexed and ranked properly isn’t just an SEO win; it’s how you show up where your audience already is. The real question is: do you know how a Google Search result actually comes together, and how to make sure your website is fully Googleable? Each search engine has its own process for building its search index, but search engines like Google, Bing, and even TikTok search all follow the same three main steps: crawling → indexing → ranking. Below, we go a bit deeper and show how each step actually works.

For each stage below, you’ll see the library analogy, what happens, what Google does, your job as an SEO, and the key SEO factors.

Crawl

Analogy: the librarian walks the aisles to discover books (links & sitemaps).

What happens: Googlebot (the crawler) scans links and sitemaps and continuously discovers URLs from:

  • Links/backlinks on already-known pages
  • XML sitemaps you publish
  • Manual URL submission via Search Console (useful, but not required) (Google for Developers)

What Google does: finds URLs via links, sitemaps, and hints.

Your job (as SEO):

  • Make paths obvious: keep links as native HTML <a href>, build solid internal linking, ship XML sitemaps, fix broken paths, and avoid accidental blocking in robots.txt. Keep navigation clean and don’t hide essential assets.
  • Make your site design easy to navigate and read, and make your content (the keywords and phrases people search for, and how you promote it) easy to discover, so search engines can match your pages to the right users and queries.
  • Use sensible folders and internal links so bots can hop from key pages to deeper content.

Key SEO factors: internal linking, robots.txt, crawl budget, XML sitemaps, unblocked assets, sensible structure.

Render & Process

Analogy: open the book to see what’s inside.

What happens: fetched pages are rendered (HTML/CSS/JS executed) so content and links become visible.

What Google does: uses headless Chrome to “see” what users see; extracts content and links.

Your job (as SEO): don’t hide key content behind blocked JS/CSS; expose primary content in HTML where possible. If your stack is JS-heavy, consider SSR/SSG or hydration that exposes primary content in HTML.

Key SEO factors: clear primary content; accessible assets; renderability.

Index

Analogy: catalogue the book.

What happens: processed content (text, images, video, structured data) is stored in the Google index, the library that searchers query. Indexing is not guaranteed; Google selects what it believes is worthwhile. Quality, uniqueness, clarity, and accessibility matter. (Google for Developers)

What Google does: understands and stores page content; de-duplicates similar pages and chooses a canonical.

Your job (as SEO):

  • Give every indexable page a clear purpose: a focused topic, unique value, clean signals (titles, headings, canonical tags), and, where relevant, structured data (e.g. Article, Product, FAQ) that truthfully describes what’s on the page.
  • Focus on one clear topic per page and provide unique value (not just rewritten text).
  • Use clean, descriptive titles and headings.
  • Set correct canonicals (so Google knows which version to prioritise).
  • Add structured data (like Article, Product, or FAQ schema) to help Google understand your content type.

Key SEO factors: topic focus; uniqueness; schema; canonicals; overall quality.

Rank & serve

Analogy: deciding which books go on the front display for this question.

What happens: the algorithm orders results, deciding which pages deserve the click and in what order. When someone searches, Google evaluates many signals to pick and order results. Google hasn’t published the full list (and changes its systems regularly), but it does document fundamentals you can rely on:

  • Relevance & intent match (does the page satisfy the task behind the query?)
  • Quality & usefulness (people-first content; evidence of expertise)
  • Experience (fast, stable, mobile-friendly pages; safe/HTTPS)
  • Authority (earned links/mentions, no schemes) (Google for Developers)

Think of links as votes, but not all votes weigh the same: authoritative sites carry more weight. (Correlational studies continue to show strong relationships between unique linking domains and organic traffic.)

What Google does: chooses the best results for the query.

Your job (as SEO): match intent; demonstrate E-E-A-T; be fast, stable, and mobile-friendly; earn relevant mentions/links; craft compelling titles and snippets.

Key SEO factors: intent match, backlinks/mentions, E-E-A-T, relevance, UX signals / Core Web Vitals (CWV).

Feedback

Analogy: reader reviews and circulation stats.

What happens: users click, read, return, convert; content changes over time.

What Google does: learns from interactions and freshness signals.

Your job (as SEO): refresh, improve, and interlink; tune snippets for CTR; monitor Search Console and iterate.

Key SEO factors: freshness, engagement, snippet quality, continuous improvement.

How Search Works (and what you can do): the overview above summarises how each search step works, what Google does behind the scenes, and what you can do to optimise for every stage.

A closer look at the stages

Every time you search for something, let’s say “best sushi near me”, Google checks billions of web pages in less than a second before showing you an answer. But how does it decide what to show? It follows a pipeline. Think library → catalogue → recommendations.

Here we explain each stage of how search engines work in detail:

1. Queries (what the user asks)   

When someone searches, the engine scans its index (its catalogue of known URLs) and returns the best match. If search engines are libraries, SEO is how you prove your page is the right book (the best answer for that query).

2. Crawling (how pages are found)

Crawling is when a computer bot called a spider or crawler (Googlebot, in Google’s case) discovers and fetches web pages. It visits known URLs, downloads them, and follows links to uncover new ones. This chain reaction builds and refreshes the index. Think of it as a robot librarian roaming the internet, scanning every “book” (web page) it can find, taking notes, and adding the good ones to its digital catalogue.
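To make that “chain reaction” concrete, here is a minimal, hypothetical discovery loop in Python (standard library only). It starts from a seed URL, downloads each page, extracts the <a href> links, and queues any same-site URLs it hasn’t seen before. Real crawlers such as Googlebot are vastly more sophisticated (politeness rules, scheduling, rendering), but the basic discovery idea is the same.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from collections import deque

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags: the paths a crawler can follow."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    """Toy breadth-first crawl: discover URLs by following links from a seed."""
    seen, queue, discovered = {seed_url}, deque([seed_url]), []
    while queue and len(discovered) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except OSError:
            continue  # unreachable pages are simply never discovered
        discovered.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # stay on the same site and skip URLs we already know about
            if urlparse(absolute).netloc == urlparse(seed_url).netloc and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return discovered

# Example (placeholder domain): print(crawl("https://www.example.com/"))
```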

How Googlebot decides what to fetch: Googlebot uses algorithms to decide:

  • which websites to crawl,
  • how often to visit them, and
  • how many pages to fetch.

It also adjusts its crawling speed automatically to avoid overwhelming your server. The faster and healthier your site responds, the more efficiently it can be crawled.

Not every page discovered gets crawled or indexed. Some might be blocked by a robots.txt file, require a login, or simply not meet Google’s quality threshold. Googlebot only crawls publicly accessible URLs. The process usually starts from a list of seed URLs (known web addresses) and expands by following hyperlinks from one page to another, discovering new content continuously.
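If you want a quick way to check whether a given URL is blocked for a crawler, Python’s standard library ships a robots.txt parser. A minimal sketch, with placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site; swap in your own domain.
robots = RobotFileParser("https://www.example.com/robots.txt")
robots.read()  # fetch and parse the live robots.txt

for url in ["https://www.example.com/products/blue-widget",
            "https://www.example.com/cart"]:
    allowed = robots.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked by robots.txt'}")
```

Search Console can also tell you whether Google itself is being blocked from a specific URL.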

How Googlebot discovers URLs (your entry points)

Google builds and refreshes its index by finding URLs from:

  • From backlinks (links from pages it already knows). Google has an index of hundreds of billions of webpages, so if someone links to a new page from a known page, Google can find it from there.
  • From sitemaps (your XML “table of contents” of important URLs). Sitemaps tell Google which pages and files you think are important on your site (see the sitemap sketch below).
  • From manual URL submissions. Google lets site owners request crawling of individual URLs in Google Search Console.

The first step in the crawling process is called URL discovery. Before Google can surface a web page in its Search results, it has to know the page actually exists. Most new URLs are discovered when Google follows links from known hubs (e.g. a category page that links out to new articles) or when it revisits previously crawled pages to find fresh content. However, with trillions of URLs out there, some will never be discovered, which is why site structure and internal linking matter.
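As a concrete example of the sitemap entry point mentioned above, here is a minimal sketch that writes a sitemap.xml with Python’s standard library. The URLs and dates are placeholders; most sites generate this file from their CMS or framework instead.

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls, path="sitemap.xml"):
    """Write a minimal XML sitemap listing the URLs you want Google to know about."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # an ISO date helps freshness signals
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

# Hypothetical pages; list your important, canonical URLs only.
build_sitemap([
    ("https://www.example.com/", "2025-01-15"),
    ("https://www.example.com/products/blue-widget", "2025-01-10"),
])
```

Point Google at the file by adding a Sitemap: line to robots.txt or by submitting it in Search Console.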

What to know:

  • Google is fully automated and crawls the web constantly. You usually just need to publish.
  • Crawlers should see pages the same way users do. If you block key files (CSS/JS) or hide important content behind scripts, Google may not understand your page → weaker visibility.
  • Not every discovered URL gets crawled. Crawl rate depends on server responsiveness, content quality, and other signals.
  • Private or login-only content isn’t crawled; crawlers only fetch publicly accessible URLs.

2a. Rendering & Processing (understanding the page)

After fetching (downloading the data served from a URL, usually a mix of HTML, CSS, and JavaScript files), Google renders the page to “see” content and links the way users do, turning those files into a visual representation of the page. In doing so, it runs any JavaScript it finds using a recent version of Chrome. If important content appears only after JS executes, make sure Google can fetch your JS/CSS and render it successfully. For heavy JS stacks, consider SSR/SSG or hydration that exposes primary content in HTML. Rendering matters because websites often rely on JavaScript to bring content into the page and make it livelier; without rendering, Google wouldn’t see that content at all.
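A quick sanity check you can run yourself: fetch the raw HTML (what a crawler receives before any JavaScript executes) and see whether your primary content is already in it. A minimal sketch, assuming a placeholder URL and phrase:

```python
from urllib.request import Request, urlopen

# Hypothetical page and a phrase that should appear in its primary content.
url = "https://www.example.com/products/blue-widget"
key_phrase = "Blue Widget 3000"

req = Request(url, headers={"User-Agent": "render-check/0.1"})
raw_html = urlopen(req, timeout=10).read().decode("utf-8", errors="ignore")

if key_phrase in raw_html:
    print("Primary content is present in the initial HTML: good for crawlers.")
else:
    print("Phrase only appears after JavaScript runs: consider SSR/SSG or pre-rendering.")
```

If the phrase shows up in your browser but not in the raw HTML, the content is being injected by JavaScript; the URL Inspection tool in Search Console shows the page as Google actually rendered it.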

3. Indexing (Adding Pages to Google’s Library Catalogue)

Once Google finishes crawling and rendering your page, it needs to understand and store that information so it can show it later when someone searches for it.
That’s where indexing comes in. Think of indexing like adding books to a library catalogue. Each crawled web page is processed, analysed, and then stored in Google’s massive search index — a huge database that holds billions of pages.
When you search on Google, you’re not searching the live web — you’re searching Google’s index. If your page isn’t indexed, it’s invisible.
No matter how great your content is, users can’t find it unless Google has filed it in the right “shelf” of its catalogue.

How Google Decides What to Store

After a page is fetched and rendered, Google:

  1. Parses the HTML – cleans up broken or messy code, reads your text, images, videos, and structured data.
  2. Understands context – figures out what your page is about, how it relates to others, and what kind of content it contains (e.g. an article, product, recipe, or video).
  3. Handles duplicates – groups similar or identical pages and picks one “canonical version” that best represents the group.
  4. Calculates signals – measures things like quality, clarity, uniqueness, intent match, and accessibility.
  5. Stores searchable data – if your page passes the quality threshold, it’s added to Google’s searchable index.

Not every page gets in. Google only indexes content that it believes is useful, unique, and trustworthy.

The Canonical Page (Avoiding Duplicates)

When Google finds multiple versions of the same or similar content, it selects a canonical page — the version that best represents that group.

It uses signals like:

  • rel="canonical" tags you define
  • internal and external linking
  • page authority and content uniqueness

Only this canonical version appears in search results, while other duplicates are kept as alternates for specific contexts.
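To see which canonical a page declares, you can read the rel="canonical" link element out of its HTML. A minimal sketch with Python’s standard-library parser (the URL is a placeholder):

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class CanonicalFinder(HTMLParser):
    """Finds the href of <link rel="canonical" href="..."> in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.canonical = None
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonical = attrs.get("href")

# Hypothetical URL: a parameterised duplicate of a product page.
html = urlopen("https://www.example.com/products/blue-widget?colour=red",
               timeout=10).read().decode("utf-8", errors="ignore")
finder = CanonicalFinder()
finder.feed(html)
print("Declared canonical:", finder.canonical)
```

Remember that your declared canonical is a strong hint, not a command; Google may still choose a different canonical based on the other signals listed above.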

Index Selection

Once duplicates are filtered and quality signals analysed, Google decides whether a page deserves a place in its index – a process called index selection.
This decision depends on:

  • Page quality and originality
  • How well it matches user intent
  • Technical accessibility (mobile-friendly, HTTPS, etc.)

If a page makes the cut, its content is stored across thousands of Google’s servers – ready to be fetched when a matching query appears.

4. Ranking (Who Gets Shown and where)

So, your page has been crawled, rendered, and indexed – great.
But now comes the part every marketer cares about: ranking.
This is where Google decides who gets shown where on the results page. Ranking is Google’s way of sorting all the pages in its index and showing the best ones first – the ones it believes are the most useful, trustworthy, and relevant for that specific query.

How Google Decides Rankings

Google uses hundreds of signals (known as ranking factors) to decide which results appear at the top. Some of the most important ones include:

  • Relevance – How well your content matches the search intent.
  • Content quality – Depth, originality, and clarity.
  • User context – A user’s location, language, and device type.
  • Authority & trustworthiness – Signals like backlinks, mentions, and brand reputation.
  • User experience – Page speed, mobile friendliness, security (HTTPS), and ease of navigation.

For example:

  • A search for “bicycle repair shops” in Paris will show local businesses nearby.
  • The same search in Hong Kong returns completely different results.
  • Search for “modern bicycle” instead, and you’ll likely see image results — not local shops.

In short, ranking is contextual. What you see depends on who you are, where you are, and what you mean.
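No one outside Google knows the real formula, but the idea of combining many weighted signals into one ordering can be illustrated with a toy scorer. Everything below (the signals, weights, URLs, and scores) is made up purely for illustration; it is not how Google actually ranks pages.

```python
# Toy illustration only: Google's real ranking systems and weights are not public.
SIGNAL_WEIGHTS = {"relevance": 0.40, "quality": 0.25, "authority": 0.20, "experience": 0.15}

def score(page_signals):
    """Combine per-signal scores (0..1) into a single ranking score."""
    return sum(SIGNAL_WEIGHTS[name] * value for name, value in page_signals.items())

candidates = {
    "example.com/bike-repair-guide":  {"relevance": 0.9, "quality": 0.8, "authority": 0.5, "experience": 0.9},
    "example.com/bike-shop-homepage": {"relevance": 0.6, "quality": 0.7, "authority": 0.8, "experience": 0.7},
}

# Order results best-first, the way a results page would.
for url, signals in sorted(candidates.items(), key=lambda kv: score(kv[1]), reverse=True):
    print(f"{score(signals):.2f}  {url}")
```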

What Determines Quality

Quality isn’t just about grammar or visuals. Google looks at:

  • The uniqueness of your content (not copied or spun).
  • How well your page satisfies the user’s intent.
  • The reputation of your website across the web.
  • Whether your content demonstrates E-E-A-T:
    • Experience (first-hand insight)
    • Expertise (accurate information)
    • Authoritativeness (trusted by others)
    • Trustworthiness (secure, transparent site)

As Google explains in their documentation, “Ranking largely depends on how well your content helps users achieve their goal.” In other words, the more helpful you are, the higher you go.

OTINGA Tip

To help Google index and rank your pages properly:

  • Give each page one clear purpose. When Google understands your content, it can show it to the right people — at the right time.
  • Offer high-quality, unique, people-first content and avoid duplicates.
  • Keep titles, headings, and canonicals clean.
  • Add structured data (schema) such as Article, Product, or FAQ markup that helps Google understand your content type (see the JSON-LD sketch after this list).
  • Write relevant, descriptive meta titles and headings.
  • Build trust through backlinks, reviews, and demonstrated expertise, and make it easy for the “catalogue” to file you correctly.
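As an example of the structured-data tip above, here is a minimal sketch that builds FAQPage JSON-LD and prints the script tag you would place in the page’s HTML. The question and answer are placeholders; the markup should always describe content that is genuinely visible on the page.

```python
import json

# Hypothetical FAQ content; schema must match what is actually on the page.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "How long does indexing take?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "It varies: from a few days to several weeks, depending on crawlability and quality.",
        },
    }],
}

print('<script type="application/ld+json">')
print(json.dumps(faq_schema, indent=2))
print("</script>")
```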

SEO isn’t about gaming the algorithm; it’s about being the best answer to your user’s question.

5. Serving Results (SERP): Bringing Indexed Pages to Life

When a user types a query, Google searches its index – not the whole web – to find the most relevant and reliable pages.

The process looks like this:

  • Google interprets the query (removes unnecessary words, recognises entities like “Statue of Liberty”, and expands synonyms like car → automobile); a toy sketch of this step follows this list.
  • The refined query is matched against indexed pages.
  • The system ranks the best matches based on hundreds of signals (relevance, quality, freshness, user experience).
  • Finally, results are displayed on the Search Engine Results Page (SERP) — what you see as Google Search results.
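The query-interpretation step above can be sketched with a toy example. The stop-word list and synonym table below are tiny, made-up stand-ins; Google’s real systems use far richer language understanding.

```python
# Toy illustration of query interpretation; not Google's actual processing.
STOP_WORDS = {"the", "a", "an", "of", "in", "near", "me", "to", "how"}
SYNONYMS = {"car": ["automobile"], "bike": ["bicycle"], "fix": ["repair"]}

def interpret(query):
    """Lowercase, drop filler words, and expand each remaining term with known synonyms."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(interpret("How to fix a car near me"))   # ['fix', 'repair', 'car', 'automobile']
```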

Now that Google knows which pages deserve to rank, it’s time to serve them to users. This is what happens on the Search Engine Results Page (SERP) – the screen you see after hitting “Search”.

The Anatomy of a Search Result

Every result on Google typically includes:

  • Title Link (blue link) – The main clickable title that takes users to your page.
  • Snippet (description) – A short preview of your content, often pulled from your meta description or relevant on-page text.
  • URL / Breadcrumbs – Show where the page lives on your site (e.g., www.otingamarketing.com/blog/seo-basics).
  • Attribution Elements – Your site name, favicon (logo), and sometimes a publication date or author name.

These results were once known as “10 blue links”, but today’s SERP is much more dynamic — with featured snippets, FAQs, video carousels, maps, shopping results, and more.

To stand out on Google’s results page:

  • Craft clear, click-worthy titles (include your main keyword naturally).
  • Write compelling meta descriptions: short, benefit-driven, and user-focused (see the length-check sketch after this list).
  • Use schema markup for rich results (FAQ, Product, Review, etc.).
  • Keep your branding consistent — favicon, site name, and tone.
  • Check how your pages look on both mobile and desktop SERPs.
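There is no single official character limit (Google truncates by pixel width, and rewrites snippets freely), but a rough length check helps catch titles and descriptions likely to be cut off on the SERP. The thresholds below are common rules of thumb, not Google specifications.

```python
# Rough rules of thumb for visible SERP length; not official Google limits.
TITLE_MAX, DESC_MIN, DESC_MAX = 60, 70, 160

def check_snippet(title, description):
    issues = []
    if len(title) > TITLE_MAX:
        issues.append(f"title is {len(title)} chars; may be truncated (> {TITLE_MAX})")
    if not (DESC_MIN <= len(description) <= DESC_MAX):
        issues.append(f"description is {len(description)} chars; aim for {DESC_MIN}-{DESC_MAX}")
    return issues or ["looks reasonable"]

# Hypothetical page metadata.
print(check_snippet(
    "How Search Engines Work: Crawling, Indexing & Ranking (2025)",
    "Learn how Google discovers, indexes and ranks pages, and the fixes that make your site visible.",
))
```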

Checklist – How to Make Sure Your Site is Crawlable, Indexable & Rankable:

✅ Submit XML sitemap to Google Search Console.
✅ Check robots.txt — don’t accidentally block important pages.
✅ Use internal linking to help bots navigate.
✅ Optimise site speed (Core Web Vitals).
✅ Add structured data (schema).
✅ Ensure mobile-first design.
✅ Publish original, helpful content.
✅ Monitor “Coverage” reports in Google Search Console.

SEO is like running a library:

  • Crawling is the librarian walking through shelves to see which books exist.
  • Indexing is writing down details about each book in the catalogue.
  • Ranking is deciding which books go on the front display when visitors ask for “Best cookbooks.”

Fact to know:

  • Google is fully automated, doesn’t accept payment to rank you higher, and never guarantees indexing — even if you follow the rules. Your job is to make value and eligibility obvious. (Google for Developers)
  • Most visibility issues trace back to stage one (bots can’t reach/see) or stage two (thin/duplicate/ambiguous pages aren’t worthy of index slots).
  • Having a Googleable website, especially one that shows up near the top of results, depends heavily on the quality of your pages’ content. For example, large amounts of gibberish text will do poorly in Search results (see the rendering and indexing stages above for how Google reads the information on your pages).
  • Google owns 91.43% of the search engine market. It can send you more traffic than other search engines, as it’s the one most people use.
  • Google processes 8.5 billion searches per day (Statista, 2025).
  • 91% of web pages get zero organic traffic because they aren’t indexed properly (Ahrefs, 2024).
  • Mobile-first indexing is now default — meaning if your site isn’t mobile-friendly, it might not rank at all.

Conclusion

Search visibility starts with the basics. If Google can’t crawl you, it can’t index you. If it can’t index you, it can’t rank you.

Read Previous: 

SEO Series 1: History of SEO – how it started

SEO Series 2: Best practices guide to appear and perform well on Google Search

SEO Series 3: SEO Glossary 2025

SEO Series 4: What is SEO? A Complete Beginner to Advanced Guide (2025 Edition)

Next in the OTINGA SEO Series: Keyword Research — Finding the Terms Your Audience Uses.

Subscribe to OTINGA Marketing for the full 70-part SEO series.
