What Are Custom Collections in RTILA?
Custom collections in RTILA are a feature that helps you organize and manage the data you extract from websites in a structured, personalized way. RTILA automates the gathering of information from websites, and custom collections make that extraction more efficient and its results easier to use by letting you categorize, filter, and store the scraped data according to your preferences.
Overall, custom collections give you an efficient way to manage and make sense of the large amounts of data you collect from the web. They support better organization, personalized workflows, and easier collaboration, making web scraping more productive and user-friendly.
Why do we need to find a custom collection?
In RTILA, a custom collection identifies the specific set of data you want to extract from a website. It lets RTILA focus on one specific element of a web page, so you get exactly the information you need while avoiding the complications and ethical concerns that come with scraping irrelevant data.
Let’s say a website has a list of products, for example the AppSumo all-products page, and we have already chosen the right CSS selectors on the page, as shown in this picture:
You can see that some data is missing in the areas outlined in red. Here we are collecting each product’s data from a list of products, and each product is displayed in its own div element, as the green rectangles in the middle of the page show. So what do we need to do?
You have probably guessed it: we need to tell RTILA to focus only on the elements inside these green rectangles and ignore everything else on the page, since all the data we need is displayed there. How do we do that?
It’s simple: just follow the steps in the next section, How To Find & Set the Right Custom Collection.
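Why does focusing on each product container help? One general way scrapers lose or mismatch data (an assumption for illustration, not a description of RTILA’s internals) is to collect each property as a separate page-wide list and then pair the lists by position. The hypothetical Python sketch below models a page as a list of product containers, with made-up product names and prices, and compares that page-wide approach with looking up every property inside one container at a time.

```python
"""Hypothetical illustration: page-wide lists vs. per-item (collection) extraction.

The page is modeled as a list of product containers; this is not RTILA's code.
"""
from typing import Optional

# Each dict stands for one product container (one green rectangle).
# The second product has no price on the page.
PRODUCTS = [
    {"name": "Tool A", "price": "$49"},
    {"name": "Tool B"},
    {"name": "Tool C", "price": "$79"},
]


def page_wide(products: list) -> list:
    """Collect each property independently across the whole page, then pair by position."""
    names = [p["name"] for p in products if "name" in p]
    prices = [p["price"] for p in products if "price" in p]
    # zip() pairs by position and stops at the shorter list: once one price is
    # missing, later prices attach to the wrong product and the last product is dropped.
    return list(zip(names, prices))


def per_item(products: list) -> list:
    """Look up every property inside one container at a time (the custom-collection idea)."""
    result: list = []
    for p in products:
        price: Optional[str] = p.get("price")  # None marks a genuinely missing value
        result.append((p.get("name"), price))
    return result


if __name__ == "__main__":
    print(page_wide(PRODUCTS))  # [('Tool A', '$49'), ('Tool B', '$79')]
    print(per_item(PRODUCTS))   # [('Tool A', '$49'), ('Tool B', None), ('Tool C', '$79')]
```

With per-item scoping, the product without a price keeps an explicit empty value instead of shifting the remaining data out of place.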
How To Find & Set the Right Custom Collection?
As discussed above, we now need to find the custom collection for the element we want (in this case, the product div element).
Here are the two steps that solve this issue and make all the data show up in our dataset properties:
Find the Right Custom Collection
To find our custom collection, we may need a web browser such as Chrome, Firefox, or Edge if we cannot select each product’s div element with the CSS inspector in RTILA. Open the page in your browser, right-click the element you want, and then click Inspect. You can now see the page’s source code.
As you can see in the right-hand panel, there are multiple div elements with the same class. What do you think these divs represent?
Each div represents one product, and each product’s details are inside its div. In this case, though, we only need to find the custom collection: the class applied to every product element. Here, our custom collection is class="relative h-full" (in the right-hand panel), which corresponds to the selector div.relative.h-full shown on the left side.
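Once you have a candidate class, it is worth checking that it matches exactly one element per product. The Python sketch below does that outside the browser with the standard-library html.parser module. The markup is a simplified, made-up stand-in for a product grid (the real AppSumo page is more complex), and count_matches is a hypothetical helper: it counts div elements whose class attribute contains both the relative and h-full class tokens, which is what div.relative.h-full selects.

```python
from html.parser import HTMLParser

# Simplified, made-up markup: three product cards plus one unrelated banner div.
SAMPLE_HTML = """
<div class="grid">
  <div class="relative h-full"><h3>Tool A</h3><span>$49</span></div>
  <div class="relative h-full"><h3>Tool B</h3></div>
  <div class="relative h-full"><h3>Tool C</h3><span>$79</span></div>
  <div class="banner">Newsletter sign-up</div>
</div>
"""


class ClassCounter(HTMLParser):
    """Count <tag> elements whose class attribute contains every required class token."""

    def __init__(self, tag: str, required: set) -> None:
        super().__init__()
        self.tag = tag
        self.required = required
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if self.required.issubset(classes):
            self.count += 1


def count_matches(html: str, tag: str, *classes: str) -> int:
    """Return how many <tag> elements carry all of the given classes."""
    parser = ClassCounter(tag, set(classes))
    parser.feed(html)
    parser.close()
    return parser.count


print(count_matches(SAMPLE_HTML, "div", "relative", "h-full"))  # 3
```

If the count equals the number of products you see on the page, the class is a good candidate for the custom collection.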
Now that we have found our custom collection, we can move to the next step.
Set the Right Custom Collection
Now that we have found our custom collection, all we need to do is convert the class attribute from the right-hand panel into a valid CSS selector, in this case div[class*="relative h-full"]. You can read more about CSS selector syntax here: https://www.w3schools.com/cssref/css_selectors.php
Alternatively, we can use the CSS selector shown on the left side, div.relative.h-full. In some cases, the first form may work better.
Now we are ready to set the custom collection: click the configure dataset button, then paste the selector into the custom collection field.
As you can see, once we added the custom collection, all the data showed up and no property was lost.
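The two selector forms do not match exactly the same elements. In standard CSS, div.relative.h-full matches any div that has both classes, in any order and alongside other classes, while div[class*="relative h-full"] matches any div whose class attribute contains that exact text as a substring. The Python sketch below (all helper names are hypothetical, and it models only these two selector forms, not full CSS) builds both selectors from a class attribute value and mimics how each one matches. One possible reason to prefer the attribute form: class names containing characters such as : or / (for example md:flex) must be escaped in the class-selector form, but can be written as-is inside the quoted attribute value.

```python
def to_class_selector(tag: str, class_attr: str) -> str:
    """'relative h-full' -> 'div.relative.h-full' (assumes no classes needing CSS escapes)."""
    return tag + "".join("." + c for c in class_attr.split())


def to_substring_selector(tag: str, class_attr: str) -> str:
    """'relative h-full' -> 'div[class*="relative h-full"]'."""
    return f'{tag}[class*="{class_attr}"]'


def matches_class_selector(element_classes: str, class_attr: str) -> bool:
    """Class selector: every listed class must be present as a whole token, in any order."""
    return set(class_attr.split()).issubset(element_classes.split())


def matches_substring_selector(element_classes: str, class_attr: str) -> bool:
    """[class*="..."]: the exact text must appear somewhere in the class attribute."""
    return class_attr in element_classes


print(to_class_selector("div", "relative h-full"))      # div.relative.h-full
print(to_substring_selector("div", "relative h-full"))  # div[class*="relative h-full"]

for candidate in ["relative h-full", "h-full relative", "relative h-full-screen"]:
    print(
        candidate,
        matches_class_selector(candidate, "relative h-full"),
        matches_substring_selector(candidate, "relative h-full"),
    )
# relative h-full True True
# h-full relative True False
# relative h-full-screen False True
```

In practice, the substring form is stricter about order but looser about partial class names, so it is worth checking which elements each form picks up on the actual page.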
Training Demo on How to Find a Custom Collection
Here is another training demo, using the website https://park.io/domains, that shows how to find and set the right custom collection for your project.
The first picture shows how we got the Domain name property by finding its CSS selector manually with the developer tools inside the RTILA browser panel.
The second picture shows how we found the custom collection by looking at the source code of the table element. Here the custom collection is tbody > tr, which matches each row of the table. Because all the data we need is inside each row, this tells the dataset property inspection to ignore everything else and focus only on the table rows.
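The same row-by-row idea can be sketched in Python with the standard-library html.parser module. The markup below is a made-up stand-in, not park.io’s actual page, and the column layout (domain first, then price) and the property names are assumptions. The parser treats each tr inside tbody as one item, which is what tbody > tr selects for a single, non-nested table, and collects that row’s cells into one record, so the header row in thead is ignored.

```python
from html.parser import HTMLParser

# Made-up stand-in for a domain listing table (not park.io's real markup).
SAMPLE_TABLE = """
<table>
  <thead><tr><th>Domain</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>example.io</td><td>$99</td></tr>
    <tr><td>sample.io</td><td>$149</td></tr>
  </tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> rows inside <tbody>.

    Assumes a single, non-nested table, so "tr inside tbody" equals "tbody > tr".
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_tbody = False
        self.in_cell = False
        self.rows: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows.append([])        # each row starts a new item
        elif tag == "td" and self.in_tbody:
            self.in_cell = True
            self.rows[-1].append("")    # a new, empty cell in the current row

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html: str) -> list:
    """Return one dict per table row, using hypothetical property names."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [{"domain": r[0], "price": r[1]} for r in parser.rows if len(r) >= 2]


print(extract_rows(SAMPLE_TABLE))
# [{'domain': 'example.io', 'price': '$99'}, {'domain': 'sample.io', 'price': '$149'}]
```

Because each record is built from a single row, a property can never be paired with a value from a neighboring row.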
Note: Finding a custom collection on some websites can be challenging if you have no basic knowledge of web development or CSS, but don’t worry. If you have purchased RTILA and are not on a free plan, you can always submit a support ticket at https://rtila.com/support/
RTILA’s support team is always happy to help you out.