Scanning a Website


The first step of your Cookie Consent implementation is scanning your domain. The main aim of scanning a domain is to identify first- and third-party cookies, tags, trackers, pixels, beacons, forms and storage.

When a new website is added to the OneTrust application to be scanned, a record is created for the domain and the scan. The scan will remain in the Pending status while records are being created and until the scanner starts.

The scanner is a virtual machine that runs on Mozilla Firefox. It will only scan pages that are within the domain that you have entered. For example, if you scan onetrust.com and there is a link within onetrust.com to zentoso.com, zentoso.com pages will not be scanned. Subdomains are included in the scan only when you scan from the root domain. For example, if you scan onetrust.com, all subdomains of onetrust.com will be included in the scan, but if you scan www.onetrust.com, only this subdomain and its pages and paths will be scanned.
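The scope rule above can be sketched as a small check. This is an illustrative sketch only, not OneTrust's implementation: the function name `host_in_scope` is hypothetical, and the handling of a non-www subdomain entry (treated here like a root domain) is an assumption.

```python
def host_in_scope(entered_domain: str, host: str) -> bool:
    """Return True if `host` would fall inside a scan of `entered_domain`.

    Hypothetical helper that mirrors the rule described in this article.
    """
    entered = entered_domain.strip().lower().rstrip(".")
    candidate = host.strip().lower().rstrip(".")
    if entered.startswith("www."):
        # A www. entry is treated as a single subdomain: exact match only.
        return candidate == entered
    # A root-domain entry covers the root itself and every subdomain.
    # The leading dot prevents "notonetrust.com" from matching "onetrust.com".
    return candidate == entered or candidate.endswith("." + entered)


print(host_in_scope("onetrust.com", "cookies.onetrust.com"))      # root entry, subdomain host
print(host_in_scope("www.onetrust.com", "cookies.onetrust.com"))  # www. entry, other subdomain
print(host_in_scope("onetrust.com", "zentoso.com"))               # unrelated domain
```

Running the same check against the domain where the script is integrated is a quick way to spot the mismatch described in the Caution below.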

When a domain scan is added in the OneTrust tenant, the OneTrust cookies OptanonConsent and OptanonAlertBoxClosed will be added in the Categorizations tab. For more information on these cookies, see OneTrust Cookies.

Caution

If your scanned domain does not align with the domain where the script is integrated, consent will not be correctly captured and the banner will reappear every time you load the page.

The scanner treats any URL starting with www. as a subdomain.

If you scan www.onetrust.com, the OneTrust cookies will only write to this subdomain with base script functionality. 

If you scan onetrust.com, the OneTrust cookies will write to www.onetrust.com, cookies.onetrust.com, consent.onetrust.com, and so on. Similarly, if you scan a path like onetrust.com/company with the Limit scan to this path within site setting enabled, the OneTrust cookies would be saved to that path.

For more information, see OneTrust Cookies.

Any cookies that are dropped on a user action rather than on page load (for example, a form submission other than login, or an add-to-cart action) will not be recorded in the scan results. If you have a login page behind which there are additional cookies and would like OneTrust to identify those cookies, see Scanning Behind Authentication. If the login page redirects to a different domain, both the login URL and the redirect domain must be entered in the login configuration.

While the scanner will attempt to reach all pages it can, scanning every page on your site is not necessary. Scanning most of the pages will record the cookies and allow you to perform the necessary blocking activities efficiently.

If there is a particular page that you would like to include in or exclude from the scan, or scan with priority, add it to the target pages in the scan configuration. You can also enter your sitemap directly in the scanner configuration. If you set a page limit in the scan configuration, the scanner will scan no more than that number of pages, but it is not guaranteed to scan exactly that many.

When the scanning activity has completed, the status changes to Processing Data.

During the data processing activity, the scan data will be migrated to your tenant and categorized based on the OneTrust Cookie database Cookiepedia.

Once the data processing activity is complete, the scan will move into a Completed status.

OneTrust typically scans a website within 6 to 24 hours. However, the scan time is dependent on the number of scans added to your OneTrust environment at one time and the number of pages a website has. For example, if your site has 15,000 pages, it might take longer than the suggested time for the scan to complete. To help improve scanning time, you can limit the number of pages scanned for a domain.

Scanner Statuses

  1. Pending

    Your scan has been submitted and the proper tasks are being created for the scan to begin.

  2. Scanning

    The scanner is now scanning the domain that you entered. The scanner acts like a user clicking from page to page. During this process, the cookies dropped on each page load are recorded.

    Note

    While it is normal for scans to take multiple days, if your scan is taking too long to complete you can click Get Help on the Context menu to retrieve additional scan metadata. This will help support resolve the issue.

  3. Processing Data

    The data collected by the scanner is being processed and migrated from our scanner database to your tenant database. During this process, the cookies recorded are compared to our Cookiepedia database for categorization. If a prior categorization is not found, the cookie will be placed in the "Unknown" category.

    Note

    If you attempt to reprocess data that is still in Processing Data, you will receive a notification. Please allow more time for the reprocess to be completed.

    ReprocInProg.png
  4. Completed

    The scan has completed. In this state, you can click into the domain and view the scan results. For more information on viewing scan results, see Viewing Scan Results.

  5. Failed Login

    Authentication of the scanner failed. Check your login credentials and configuration. For more information on how to configure login, see Scanning Behind Authentication.

  6. Scan Error

    This status indicates the scan has completed, but there was an issue migrating the data.
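As a reading aid, the statuses above can be modeled as a small state machine. This is a minimal sketch: the status names come from this article, but the transition table is inferred from the descriptions (for example, that Failed Login follows Scanning and Scan Error follows Processing Data) and is an assumption, not documented OneTrust behavior. It also ignores manually stopped scans.

```python
from enum import Enum


class ScanStatus(Enum):
    PENDING = "Pending"
    SCANNING = "Scanning"
    PROCESSING_DATA = "Processing Data"
    COMPLETED = "Completed"
    FAILED_LOGIN = "Failed Login"
    SCAN_ERROR = "Scan Error"


# Assumed transitions, inferred from the status descriptions above.
ALLOWED_TRANSITIONS = {
    ScanStatus.PENDING: {ScanStatus.SCANNING},
    ScanStatus.SCANNING: {ScanStatus.PROCESSING_DATA, ScanStatus.FAILED_LOGIN},
    ScanStatus.PROCESSING_DATA: {ScanStatus.COMPLETED, ScanStatus.SCAN_ERROR},
    ScanStatus.COMPLETED: set(),
    ScanStatus.FAILED_LOGIN: set(),
    ScanStatus.SCAN_ERROR: set(),
}


def can_transition(current: ScanStatus, new: ScanStatus) -> bool:
    """Return True if moving from `current` to `new` is allowed in this model."""
    return new in ALLOWED_TRANSITIONS[current]


def is_terminal(status: ScanStatus) -> bool:
    """A status is terminal when no further transition is modeled."""
    return not ALLOWED_TRANSITIONS[status]


print(can_transition(ScanStatus.PENDING, ScanStatus.SCANNING))
print(is_terminal(ScanStatus.COMPLETED))
```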

To add a website to the scanner

  1. On the Cookie Consent menu, select Websites. The Websites screen appears. 

  2. Click the Add Website button on the top right of the screen. The Scan Website screen appears.

  3. Complete the fields. For more information, see Add Website Screen Reference.

  4. Click the Scan and Configure button.

    Note

    You can click the Scan Only button to scan your website without configuring a container. See Add Website Screen Reference.

  5. Select an Experience Kit (geolocation rule group) to which you want to assign the website.

    For information on configuring custom Experience Kits, see To configure a custom Experience Kit.

  6. Click the Scan button. The Review Configurations screen appears.

    review_config_scan_only.png
  7. Review your Banner and Preference Center configurations.

  8. Click the Confirm button.

  9. When the website scan is finished, the status will display as Completed.

Add Website Screen Reference

add_new_website.png

Field

Description

Website URL

Enter the URL of the site you want to scan. The depth of the scan depends on how the URL is entered: the domain entered is the top-level domain scanned, and whether subdomains are scanned depends on the form of the URL, as described below.

Below, find examples of how each site would be scanned depending on the URL entered:

URL Entered

Scanning Result

www.example.com

Only scan the root domain.

Note

This scan would only cover pages that include www.example.com, such as www.example.com/blog. It would not scan blog.example.com, because that host does not include www.example.com.

example.com

Scan the root domain and all subdomains.

Note

This is the OneTrust recommendation for scanning domains. This scan will include pages, paths, and subdomains that include example.com. For instance, it will include abc.example.com and xyz.example.com; however, it will not include abc.example.de.

www.example.com/sub

Scan only the subdomain (www.example.com).

example.com/sub

Scan the subdomain (example.com) and all lower domains.

Note

The scanner text field has a character limit of 1024 characters.

Organization

Select the organization responsible for the website.

Note

Users who have access to this organization will be able to view the scan.

Geolocation

Select the location from which you want the scan to originate.

geolocation_scan_field.png

By default, OneTrust scans the website from Europe. If your website is not accessible from Europe, please whitelist our IP address. For more information on OneTrust IP addresses, see About OneTrust Hosting Options, Backup, and Locations.

Limit scan to [number] pages

Enter the number of pages to which the scan should be limited.

OneTrust recommends limiting scans to 2,000-3,000 pages.

Note

The scanner can scan up to 15,000 pages. OneTrust does not recommend scanning all 15,000 pages. However, if this is required, OneTrust recommends splitting pages with sitemap URLs and scanning them separately.

Slow Scan

Increases the timeout values on scanned pages to detect "lazy loading" cookies.

Note

To enable the Slow Scan setting, contact OneTrust Support.

Limit to this path within site

Enable this setting to limit the scan to a certain path within the domain.

Note

In the Website URL field, enter the URL using the format domain.com/path/.

Enable Unique User Agent

Enable this setting to set the scanning agent to OneTrustBot.

Note

When you disable this toggle and scan a website, the scan displays as a visitor on the Dashboard. This is applicable when you already have OneTrust scripts on the site and are performing a rescan.

To whitelist the OneTrust user agent: 

Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0;OneTrustBot;

Note

You have to whitelist the OneTrust user agent if bot detection is present on the site that could block OneTrust from accessing the site.
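If your bot detection lets you write a custom rule, the check can be as simple as looking for the `OneTrustBot` token in the User-Agent header. The sketch below is an assumption about how such a rule might look, not a OneTrust-provided snippet; the function name `is_onetrust_scanner` is hypothetical. It matches on the token rather than the full string so that a change in the Firefox version does not break the rule.

```python
ONETRUST_BOT_TOKEN = "OneTrustBot"


def is_onetrust_scanner(user_agent: str) -> bool:
    """Return True if the User-Agent carries the OneTrustBot token.

    The scanner's User-Agent shown above separates the token with
    semicolons, so split on ';' and look for an exact token match.
    """
    tokens = [token.strip() for token in user_agent.split(";")]
    return ONETRUST_BOT_TOKEN in tokens


scanner_ua = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:87.0) "
    "Gecko/20100101 Firefox/87.0;OneTrustBot;"
)
print(is_onetrust_scanner(scanner_ua))
```

Because any client can send an arbitrary User-Agent header, a rule like this is best combined with an allowlist of the scanner IP addresses.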

Scan Pages with Query Parameters

If you want to limit the scan to pages with certain query parameters (which appear at the end of a URL in the format ?parameter=value), enter the parameters separated by commas.

Target Pages to Scan

If you want to include, exclude, or target certain pages in the scan, configure the fields and enter the URLs for the pages or paths.

target_pages.png

Field

Description

Page List Name

Enter a name for the list of pages you want to scan. You can configure up to five settings per scan.

Include / Exclude / Target

Select an option to include only, exclude, or target specific pages in your scan.

  • Include: scan only the pages, paths, or subdomains you specify.

  • Exclude: scan your domain except for the pages, paths, or subdomains you specify.

  • Target: scan the pages, paths, or subdomains you specify first before scanning the rest of your domain.

Page / Path / Subdomain

Select an option to define the scope of your inclusion, exclusion, or targeting.

  • Page: limit the scan to specific pages on your domain.

  • Path: limit the scan to all the pages on specific paths within your domain.

  • Subdomain: limit the scan to specific subdomains within your domain.

URLs

Enter the exact URLs for the pages, paths, or subdomains, with each one on its own line.

Note

This field has a limit of 100 domains or 6000 characters, whichever comes first.

Sitemaps URLs

If you want to scan a site in a particular pattern or scan pages that may not be easily accessible through the website interface, enter the URIs for hosted XML sitemaps, each on its own line.
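The difference between the Include, Exclude, and Target options described above can be illustrated with a short sketch. It is a simplified model only: the function `plan_scan_order` is hypothetical, it matches exact page paths (not paths or subdomains), and it assumes the scanner has already discovered the full page list.

```python
def plan_scan_order(discovered, listed, mode):
    """Return the pages to scan, in order, for one page list.

    discovered: pages found on the domain, in discovery order.
    listed:     pages entered in the URLs field.
    mode:       "include", "exclude", or "target".
    """
    listed_set = set(listed)
    if mode == "include":
        # Scan only the listed pages.
        return [page for page in discovered if page in listed_set]
    if mode == "exclude":
        # Scan everything except the listed pages.
        return [page for page in discovered if page not in listed_set]
    if mode == "target":
        # Scan the listed pages first, then the rest of the domain.
        first = [page for page in discovered if page in listed_set]
        rest = [page for page in discovered if page not in listed_set]
        return first + rest
    raise ValueError(f"unknown mode: {mode!r}")


pages = ["/", "/blog", "/shop", "/about"]
for mode in ("include", "exclude", "target"):
    print(mode, plan_scan_order(pages, ["/shop"], mode))
```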

To configure a custom Experience Kit

When scanning a domain, if you click Scan and Configure, you will be prompted to create an Experience Kit after adding a website.

  1. Select Create Custom Experience Kit. The Assign to Container screen appears.

    custom_kit.png
  2. On the Assign to Container screen, click the Create Custom Experience Kit button. The Select Audience screen appears.

    select_framework.png
  3. Enter a name and select the framework(s) for your Experience Kit.

    Configure additional languages by clicking the Manage Languages button.

  4. Click the Next button. The Choose a Cookie Banner Layout screen appears.

    banner_layout_config.png
  5. Configure your banner layout and branding.

  6. Click the Next button. The Choose a Preference Center Layout screen appears.

    pc_layout_config.png
  7. Configure your Preference Center layout and branding.

  8. Click the Next button. The Review Configurations screen appears.

    review_scan_config.png
  9. Review your configurations and make changes as needed.

  10. Click the Confirm button.

To re-scan an existing website

  1. On the Cookie Consent menu, select Websites. The Websites screen appears.

  2. Hover over the row for the website which you want to re-scan until the Context Menu icon appears.

  3. Click the Context Menu icon. The Context menu appears.

  4. Select Re-scan. The Re-audit modal appears.

    rescan_modal.png
  5. Complete the fields. For more information, see Add Website Screen Reference.

  6. Click the Start Scan button.

To reassign a website to a different organization

  1. On the Cookie Consent menu, select Websites.  The Websites screen appears.

  2. Hover over the row for a website you want to reassign until the Context Menu icon appears. 

  3. Click the Context Menu icon. The Context menu appears.

  4. Select Reassign. The Reassign Organization modal appears.

    reassign_org_scan.png
  5. In the Organization field, select the organization to which you want to assign the website.

  6. Click the Reassign button.

To schedule a website scan

Note

Scheduled scans are limited to 20 per day unless overridden at the tenant level. Contact OneTrust Support or your account representative to override the default scan limit.

If there are leftover scans scheduled, they will be picked up the following day.

Example:

Day 1: 30 scans scheduled. 20 scans run.

Day 2: 10 scans scheduled. 10 day-2 scans are run, plus 10 leftover day-1 scans.
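The carry-over arithmetic in the example above can be sketched as follows. This is an illustration under stated assumptions: the function name `simulate_daily_runs` is hypothetical, the default limit of 20 is taken from the note above, and the model only counts scans; it does not say in which order queued scans are picked up.

```python
def simulate_daily_runs(scheduled_per_day, daily_limit=20):
    """Return a list of (scans_run, scans_left_over) tuples, one per day."""
    results = []
    backlog = 0
    for newly_scheduled in scheduled_per_day:
        queued = backlog + newly_scheduled
        ran = min(queued, daily_limit)
        backlog = queued - ran  # leftover scans roll over to the next day
        results.append((ran, backlog))
    return results


# Day 1: 30 scheduled. Day 2: 10 scheduled.
print(simulate_daily_runs([30, 10]))  # [(20, 10), (20, 0)]
```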

  1. On the Cookie Consent menu, select Websites.  The Websites screen appears.

  2. Hover over the row for the website for which you want to schedule a scan until the Context Menu icon appears. 

  3. Click the Context Menu icon. The Context menu appears.

  4. Select Schedule. The Schedule Scan modal appears.

    schedule_scan.png
  5. Configure the fields.

    Field

    Description

    Scan frequency (in months)

    Select the number of months to determine scan frequency.

    Next Scan

    Select a date for the next scan.

    Geo-location

    Select a scanner location.

  6. Click the Save button.

Note

You can schedule scans for your domains based on new content being added and maintaining your Cookie Consent implementation. Read more here.

To stop a website scan

  1. On the Cookie Consent menu, select Websites.  The Websites screen appears.

  2. Hover over the row for an In Progress website scan you want to stop until the Context Menu icon appears.

  3. Click the Context Menu icon. The Context menu appears.

  4. Select Stop. A confirmation modal appears.

  5. Click the Confirm button.

Note

If a scan is Scanning, the scan will stop, and the results will be available as the latest scan result.

If a scan is Pending and there are no previous scans completed on the site, the scan is removed.

If a scan is Pending and there is a history of one or more successful scans, then only the new scan is removed from the pending list, and previous reports will be available as before.

To view a website's activity history

In Cookie Consent, you can access an audit log of activity for each of your domains to review any changes made over time.

  1. On the Cookie Consent menu, select Websites. The Websites list screen appears.

  2. Select a website from the list. The Website Details screen appears.

  3. Go to the Activity tab. The Activity screen appears.

  4. Review the website's activity history.

To export website data

  1. On the Cookie Consent menu, select Websites. The Websites list screen appears.

  2. Click the Column Selector icon. The Website Column Selector modal appears.

    website_column_selector.png
  3. Use the arrow keys to configure column visibility for the Websites list screen. This configuration will be reflected in the export.

    Note

    Column sorting will also be reflected in the export file. Click the header of a column to sort the list screen in ascending or descending order of that column's data.

  4. Click the Save button.

  5. Click the Export button in the header. A confirmation modal appears.

  6. Click the Confirm button.

  7. Click the Notifications icon. A website data export appears in the Notifications popover.

    data_export_notif.png
  8. Click Download. An Excel report downloads to your device.

  9. View the data export.

To delete a website from the scanner

  1. On the Cookie Consent menu, select Websites.  The Websites screen appears.

  2. Hover over the row for a website you want to delete until the Context Menu icon appears.

  3. Click the Context Menu icon. The Context menu appears.

  4. Select Delete. A confirmation modal appears.

  5. Select the data you want to delete.

  6. Click the Confirm button.

Cookie Scanner IP Addresses

Follow the links to find the current scanner IP addresses.

OneTrust

CookiePro

Frequently Asked Questions

1.

What should I do when the scanner returns zero pages and zero cookies?

  • Check if the website is available via Firefox in the region where the scanner is located. For more information, see About OneTrust Hosting Options, Backup, and Locations. Whitelisting the scanner’s IP address will help if the website is not available in the scanner’s region.

  • If the website is behind a firewall, you will need to whitelist the scanner’s IP address.

  • Check if the website has a re-direct. If this re-direct is to a domain other than the one entered for scanning, the scanner will not yield expected results. For instance, if you are trying to scan example.com but it is automatically re-directed to xyz.com, then the scanner will return zero pages and zero cookies and you would have to scan xyz.com.

If all the above fail or are not applicable, OneTrust recommends adding a few pages in Target Pages to Scan. This can direct the scanner to visit all the pages mentioned in the scan configuration.

2.

How do I scan pages that are behind authentication?

See Scanning Behind Authentication for information on configuring a scan to pick up pages behind login.

3.

Why does the scan return different results with every re-scan?

Multiple scans of the same website can produce varying outcomes as the scanner may navigate different paths throughout the site. For instance, in a case where a page becomes unavailable during a scan due to load timeout, and it includes links to other pages that lead to additional links branching out to other pages, there is a possibility that this cluster of pages may not get scanned.

4.

How long does it take for the scanner to complete a website scan?

Scan time varies from a couple of hours to several days. It depends on several factors, including the total number of pages and the number of scans launched at the same time in the tenant.

5.

How do I scan my website if it has more than 15,000 pages?

You can scan your domain in parts by initiating multiple scans of the same website. Each time, add a few subdomains in the Target Pages to Scan section of the scan configuration. You can also add a few sitemaps each time in the Sitemaps URLs section.

6.

Is OneTrust able to perform user actions?

No, our scanner does not have the ability to perform any user actions like checking boxes, accepting existing banners, submitting forms, etc.

7.

What is the difference between the Pages and Total Pages columns on the Websites screen?

Pages is the number of pages scanned for that particular scan.

Total Pages is the sum of pages for all scans ever run for that domain.

 