RTILA Web Business Automation

Link Crawler Command

Types of Crawlers & Use Cases

RTILA Studio has three types of Crawler command. They work the same way overall and differ only in the location of the links/pages to be crawled: pages “Internal” to the website you are on, pages “External” to it, or a mix of both.
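To make the Internal/External distinction concrete, here is a minimal Python sketch that classifies links by comparing hostnames. It is an illustration only, not RTILA's actual logic: the function names and the `internal` / `external` / `both` labels are hypothetical, and treating a different subdomain as external is an assumption of this sketch.

```python
from urllib.parse import urljoin, urlparse


def classify_link(page_url: str, href: str) -> str:
    """Return 'internal' or 'external' for a link found on page_url.

    Assumption of this sketch: a link is internal only when its hostname
    is identical to the hostname of the page it was found on.
    """
    absolute = urljoin(page_url, href)  # resolve relative links like "/product/1"
    page_host = urlparse(page_url).hostname or ""
    link_host = urlparse(absolute).hostname or ""
    return "internal" if link_host == page_host else "external"


def select_links(page_url: str, hrefs: list, crawler_type: str) -> list:
    """Keep the links matching a hypothetical crawler type label."""
    selected = []
    for href in hrefs:
        kind = classify_link(page_url, href)
        if crawler_type == "both" or crawler_type == kind:
            selected.append(urljoin(page_url, href))
    return selected


page = "https://example.com/shop"
links = ["/product/1", "https://other.org/partner", "about"]

internal_only = select_links(page, links, "internal")
external_only = select_links(page, links, "external")
```

With this input, `internal_only` holds the two links resolved against `example.com`, and `external_only` holds the `other.org` link.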

The Crawler command is a powerful scraping enabler that automatically recognizes and crawls the web links of a given page, either completely (all links) or selectively (using conditional logic).

In addition, the Crawler has a multi-threading capacity that lets you open and crawl multiple tabs at the same time, which significantly increases the speed of your automation. Assuming the target website imposes no firewall limits, the Crawler can crawl 10 pages per second or even more.
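The idea of crawling several tabs at once can be sketched in Python with a bounded worker pool. This is only an analogy for the concept, not how RTILA Studio is implemented: `fake_open_page` is a hypothetical stand-in that sleeps instead of loading a real page, and `tabs` simply caps how many pages are processed at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_open_page(url: str) -> str:
    """Hypothetical stand-in for opening a page in a tab (no network access)."""
    time.sleep(0.05)  # pretend the page takes 50 ms to load
    return f"loaded {url}"


def crawl_concurrently(urls: list, tabs: int) -> list:
    """Process urls with at most `tabs` pages in flight at any moment."""
    with ThreadPoolExecutor(max_workers=tabs) as pool:
        # pool.map returns results in the same order as the input urls
        return list(pool.map(fake_open_page, urls))


urls = [f"https://example.com/page/{i}" for i in range(20)]

start = time.perf_counter()
results = crawl_concurrently(urls, tabs=5)
elapsed = time.perf_counter() - start

print(f"{len(results)} pages in {elapsed:.2f}s with 5 tabs")
```

Raising `tabs` shortens the total time only as long as the connection and the target website keep up, which is why a lower value is often the more reliable choice.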

Crawler Configuration

A great number of configuration options are available for you to define and fine-tune your crawler automation flow.

  1. Rename your crawler block.
  2. Choose the type of crawler.
  3. Set the depth of crawling. If set to 1, it only crawls the links available on that page. If set to 2, it crawls all the links of that page and also the links inside those secondary pages.
  4. Set the number of tabs that are opened at the same time: up to 10 if your internet connection and the website are fast; otherwise a safer cruising speed is 3 to 5.
  5. Limit the number of pages crawled, or leave it at zero to crawl everything.
  6. Choose the types of file extensions you want to include.
  7. Check this option to crawl in a “human-like” random order instead of sequentially from top to bottom.
  8. Add conditions so the Crawler ignores, or exclusively focuses on, URLs in which a specific keyword appears (or does not appear).
  9. In this example, the Crawler only crawls links that contain “/product/”.
  10. Add a delay before each link is opened.
  11. Set a timeout for when one of the tabs/links does not load properly.
  12. Specify which page loading status you want to wait for.
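To show how the depth, page limit and keyword conditions interact, here is a minimal Python sketch that plans a crawl over a small in-memory link graph. All names (`CrawlerConfig`, `url_allowed`, `plan_crawl`) are hypothetical and only mirror a few of the options listed above; the sketch also assumes that links on a filtered-out page are not followed, which may differ from RTILA's behavior.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlerConfig:
    # Hypothetical field names mirroring options 3, 5 and 8 above.
    depth: int = 1                 # 1 = links of the start page only
    max_pages: int = 0             # 0 = no limit
    include_keywords: tuple = ()   # crawl only URLs containing one of these
    exclude_keywords: tuple = ()   # never crawl URLs containing one of these


def url_allowed(url: str, cfg: CrawlerConfig) -> bool:
    """Apply the keyword conditions to a single URL."""
    if any(keyword in url for keyword in cfg.exclude_keywords):
        return False
    if cfg.include_keywords:
        return any(keyword in url for keyword in cfg.include_keywords)
    return True


def plan_crawl(start_url: str, link_graph: dict, cfg: CrawlerConfig) -> list:
    """Breadth-first walk of link_graph, limited by depth and max_pages."""
    visited = {start_url}
    queue = deque([(start_url, 0)])
    crawled = []
    while queue:
        url, level = queue.popleft()
        if level == cfg.depth:
            continue  # pages at the depth limit are crawled, their links are not
        for link in link_graph.get(url, []):
            if link in visited or not url_allowed(link, cfg):
                continue
            if cfg.max_pages and len(crawled) >= cfg.max_pages:
                return crawled
            visited.add(link)
            crawled.append(link)
            queue.append((link, level + 1))
    return crawled


site = "https://shop.example"
graph = {
    f"{site}/": [f"{site}/product/1", f"{site}/about", f"{site}/product/2"],
    f"{site}/product/1": [f"{site}/product/3"],
    f"{site}/product/2": [f"{site}/product/1"],
}

depth_1 = plan_crawl(f"{site}/", graph, CrawlerConfig(depth=1, include_keywords=("/product/",)))
depth_2 = plan_crawl(f"{site}/", graph, CrawlerConfig(depth=2, include_keywords=("/product/",)))
```

Here `depth_1` contains the two product links found on the start page, while `depth_2` also picks up `/product/3`, which is only linked from a secondary page.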

Once you have set up your Crawler for mass data acquisition, add an “Extract” command inside this block so that the data properties you have set in the inspection panel are captured for each crawled page.
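Conceptually, an extraction step inside the crawler block produces one row of properties per crawled page. The Python sketch below illustrates that idea with the standard library only. It is not the Extract command itself: `PropertyExtractor` and `extract_row` are hypothetical helpers, the pages are hard-coded HTML strings, and the "properties" are simply the text of the first `title` and `h1` tags.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects the text of the first occurrence of each wanted tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # tag currently being captured, if any
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted and tag not in self.values:
            self.current = tag
            self.values[tag] = ""

    def handle_data(self, data):
        if self.current:
            self.values[self.current] += data

    def handle_endtag(self, tag):
        if tag == self.current:
            self.values[tag] = self.values[tag].strip()
            self.current = None


def extract_row(url: str, html: str, wanted=("title", "h1")) -> dict:
    """Return one result row for one crawled page."""
    parser = PropertyExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {"url": url, **parser.values}


# Hard-coded sample pages standing in for the pages opened by the crawler.
pages = {
    "https://shop.example/product/1":
        "<html><head><title>Blue Mug</title></head><body><h1> Blue Mug </h1></body></html>",
    "https://shop.example/product/2":
        "<html><head><title>Red Mug</title></head><body><h1>Red Mug</h1></body></html>",
}

rows = [extract_row(url, html) for url, html in pages.items()]
```

Each entry in `rows` is a dictionary with the page URL plus the captured values, which is the same shape as one line of an exported results file.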

Crawler in action

The screenshot in this section shows our crawler going through the topics pages of GitHub and opening all the links that contain “/topics/” in the URL. Multithreading is set to 7 tabs (including the starting page), as this was the most reliable speed for our internet connection. The same can be achieved for any type of listing or directory website that contains structured internal or external links. Our crawler is able to crawl over 1 million internal links of a specific directory, depending on the security threshold and load capacity of the target website. We advise spreading the crawl over a longer period of time and using a lower number of concurrent tabs for better and more ethical results. RTILA is very stable, so treat it as a marathon, not a sprint, and pace your automation.
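To get a feel for pacing, a rough back-of-the-envelope estimate helps. The Python sketch below assumes every tab spends a fixed time per page plus an optional delay, and that all tabs stay busy; real crawls vary, and the page time and delay used here are invented for illustration. `crawl_duration_hours` is a hypothetical helper, not part of RTILA.

```python
def crawl_duration_hours(total_pages: int, tabs: int,
                         seconds_per_page: float,
                         delay_seconds: float = 0.0) -> float:
    """Estimate crawl time, assuming each tab handles one page every
    (seconds_per_page + delay_seconds) seconds and all tabs stay busy."""
    per_tab_seconds = seconds_per_page + delay_seconds
    pages_per_second = tabs / per_tab_seconds
    return total_pages / pages_per_second / 3600


# 1 million links at full speed: 10 tabs, 1 s per page, no delay.
fast = crawl_duration_hours(1_000_000, tabs=10, seconds_per_page=1.0)

# The same crawl paced gently: 3 tabs, 1 s per page, 2 s delay before each link.
paced = crawl_duration_hours(1_000_000, tabs=3, seconds_per_page=1.0, delay_seconds=2.0)

print(f"fast: {fast:.1f} h, paced: {paced:.1f} h ({paced / 24:.1f} days)")
```

Under these assumptions the fast crawl takes a bit more than a day and the paced one roughly eleven and a half days, which is the kind of marathon schedule to plan for on large directories.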

    Updated on 01/11/2024
    Copyright © RTILA CORPORATION