The basic answer: you don’t need to do anything extra beyond creating useful content and building a clear site architecture that both search engines and users can easily understand.
How does Google discover websites and URLs?
Google crawls the web by following links. Its crawlers move from page to page, taking a snapshot of each page and saving it in Google’s cache.
Google then uses a variety of signals to understand how important, useful and trusted each page is:
- Is the content unique?
- Is the content in-depth?
- Are there external links to that page?
- How are users interacting with that page?
Which tools can I use to find out if my page has been crawled by Google?
Use Google’s operators:
Google Webmaster Tools offers crawl statistics too. It also reports errors when Google was not able to crawl a particular page.
Where do I submit my site to get it ranked?
Don’t take the trouble… Google finds new sites and URLs very quickly, provided they have backlinks pointing to them from elsewhere for a search bot to follow.
If your page has no backlinks, getting it crawled won’t help you. It won’t show up in Google search unless it’s linked.
If you have a page that Google hasn’t found, there’s a problem with that page: it is most likely hidden somewhere deep in the site with few or no backlinks. Submitting that URL to Google won’t help; diagnose and solve the underlying problem instead.
Generally, you want to make your site smaller and flatter, so that every page is only a few clicks from the home page, to get everything crawled well.
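To make "deep" and "flat" concrete, here is a minimal Python sketch that measures how many clicks each page is from the home page and lists pages that no internal link reaches. The link graph, the page paths and the `click_depths` helper are all made up for illustration; on a real site you would build the graph from a crawl of your own pages.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site_links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/blog": [],
    "/old-landing-page": ["/"],  # links out, but nothing links to it
}

depths = click_depths(site_links, "/")

# Pages that cannot be reached by following links from the home page.
orphans = sorted(set(site_links) - set(depths))

# The page that takes the most clicks to reach.
deepest = max(depths, key=depths.get)

print(depths)
print("Unreachable:", orphans)
print("Deepest:", deepest, "at", depths[deepest], "clicks")
```

Pages with a high click count, or pages in the unreachable list, are the ones a link-following crawler is least likely to find, so they are good candidates for extra internal links.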
If I find a page or directory that is not cached by Google, which tools should I use to diagnose the problem?
Here are the tools that will help:
- Look at yoursite.com/robots.txt and see if you are blocking any pages
- Look at your page’s code (Ctrl+I in Firefox on Windows or Cmd+I in Firefox on Mac opens the Page Info window)
- Website Crawler and XML Sitemap Generator: Identify which URLs cannot be accessed by Google and see the error
- Web Page SEO Analysis Tool: Get a quick snapshot of your page and see the basic issues it may have
- Use Google’s cache-only version: with these Greasemonkey scripts (called Google Cache Continue Redux) you can also browse your site through Google’s cache
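The robots.txt check from the list above can also be scripted. The sketch below uses Python’s standard `urllib.robotparser` module on a made-up robots.txt body; the rules, the URLs and the `is_blocked` helper are assumptions for illustration only. Python’s parser does simple prefix matching and may not interpret every rule exactly the way Google does, so treat the result as a first hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, copy the text served at
# yoursite.com/robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_body, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return not parser.can_fetch(user_agent, url)


for url in (
    "http://yoursite.com/private/report.html",
    "http://yoursite.com/products.html",
):
    status = "blocked" if is_blocked(robots_txt, url) else "allowed"
    print(url, "->", status)
```

If a page you expect to rank comes back as blocked, the matching Disallow line in robots.txt is the first thing to fix.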
Bottom line: It’s not about how many pages of your site get crawled, it’s about how many of your site pages rank in Google!