Duplicates are not removed from JSON files of content blockers #108

@Alex-302

Description

Other filters may include parts of the main filters, or independently contain rules that already exist in them, so the content blocker (CB) JSON can be larger than necessary. As a result, needed rules may be discarded to fit within the allowed CB JSON size.

Actual result

The converted JSON contains duplicate rules. For example, with Base Filter + EasyList enabled, the JSON contains about 25k duplicates.

Total rules: 102 603
Unique rules: 76 668
Duplicates: 25 935
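
For reference, counts like the ones above could be reproduced with a rough sketch along these lines. It is not the script that was actually used. It assumes the converted JSON is a flat array of rule objects, and that the order of entries in `if-domain`, `unless-domain`, `resource-type` and `load-type` carries no meaning. The names `canonical_key` and `duplicate_stats` are made up for illustration.

```python
import json
from collections import Counter

# Trigger fields treated as unordered lists (an assumption of this sketch).
UNORDERED_LIST_FIELDS = {"if-domain", "unless-domain", "resource-type", "load-type"}


def canonical_key(rule: dict) -> str:
    """Return a string that is identical for rules that differ only in
    key order or in the order of entries in unordered list fields."""
    trigger = {}
    for name, value in rule.get("trigger", {}).items():
        if name in UNORDERED_LIST_FIELDS and isinstance(value, list):
            value = sorted(value)
        trigger[name] = value
    normalized = {"trigger": trigger, "action": rule.get("action", {})}
    return json.dumps(normalized, sort_keys=True)


def duplicate_stats(rules: list) -> dict:
    """Count total, unique and duplicate rules in a content blocker rule list."""
    counts = Counter(canonical_key(rule) for rule in rules)
    total = sum(counts.values())
    unique = len(counts)
    return {"total": total, "unique": unique, "duplicates": total - unique}
```

To use it, load the JSON file from the attached archive with `json.load` and pass the resulting list to `duplicate_stats`.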

Expected result

JSON does not contain duplicates.

Duplicate example:

    {
        "trigger": {
            "url-filter": ".*",
            "unless-domain": [
                "*memo.wiki",
                "*addchannel.net",
                "*beasoku.com",
                "*blog.housinkai.com",
                "*kakenhi.net",
                "*seesaa.net"
            ]
        },
        "action": {
            "type": "css-display-none",
            "selector": ".interstitial-ad"
        }
    },

Current General CB JSON.
cb_general.zip

Proposed solution

Before compilation, the filters belonging to the same content blocker should be merged into one file and cleaned of duplicates (taking domain lists into account).
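
A minimal sketch of what this merge step might look like, assuming each filter is available as a list of rule lines. "Taking domain lists into account" is interpreted here as: two element-hiding rules with the same selector and the same domains in a different order count as duplicates. Only plain `##` and `#@#` rules are normalized; every other rule type is compared as literal text. The function names `normalize_rule` and `merge_filters` are hypothetical and not part of the converter.

```python
COSMETIC_MARKERS = ("##", "#@#")


def normalize_rule(line: str) -> str:
    """Return a comparison key for a rule.

    For simple element-hiding rules the domain list is lowercased and
    sorted, so domain order does not affect equality. Other rules are
    returned unchanged (a simplification of this sketch).
    """
    rule = line.strip()
    hits = [(rule.find(marker), marker) for marker in COSMETIC_MARKERS if marker in rule]
    if not hits:
        return rule
    index, marker = min(hits)  # the earliest marker separates domains from selector
    domains = rule[:index]
    selector = rule[index + len(marker):]
    sorted_domains = ",".join(
        sorted(d.strip().lower() for d in domains.split(",") if d.strip())
    )
    return f"{sorted_domains}{marker}{selector}"


def merge_filters(filters: list) -> list:
    """Merge several filters into one rule list, keeping the first
    occurrence of each rule and skipping comments and blank lines."""
    seen = set()
    merged = []
    for lines in filters:
        for line in lines:
            rule = line.strip()
            if not rule or rule.startswith("!"):
                continue
            key = normalize_rule(rule)
            if key not in seen:
                seen.add(key)
                merged.append(rule)
    return merged
```

The merged list would then be passed to the converter once, instead of converting each filter separately. Keeping the first occurrence preserves the rule order of the main filter.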
