Popularity

1.5

Growing

Activity

7.4

Stars 64

Watchers 3

Forks 7

Last Commit 10 days ago

Description

Avoid wasting bandwidth and processing time on web pages that are probably not worth the effort. This library provides an additional brain for web crawling, scraping and management of Internet archives. Specific functionality for crawlers: stay away from pages with little text content, or explicitly target synoptic pages to gather links.

This navigation aid targets text-based documents (currently, web pages expected to be in HTML format) and tries to guess the language of pages to allow for language-focused collection. Additional functions include straightforward domain name extraction and URL sampling.
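To make domain name extraction and URL sampling more concrete, here is a minimal sketch that uses only the Python standard library. It is not the library's own API: the function names `extract_host` and `sample_by_host` and the `per_host` parameter are hypothetical, chosen for illustration. It also works on the host name rather than the registered domain, since the latter would require a public suffix list.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def extract_host(url):
    """Return the lowercased host of a URL without a leading 'www.', or None.

    Hypothetical helper: a simplification of domain extraction that does not
    consult a public suffix list.
    """
    host = urlsplit(url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def sample_by_host(urls, per_host, seed=None):
    """Keep at most `per_host` randomly chosen URLs for each host.

    Hypothetical helper: URLs without a recognizable host are dropped.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for url in urls:
        host = extract_host(url)
        if host is not None:
            groups[host].append(url)

    sampled = []
    for host in sorted(groups):
        candidates = groups[host]
        if len(candidates) > per_host:
            candidates = rng.sample(candidates, per_host)
        sampled.extend(sorted(candidates))
    return sampled
```

Capping the number of URLs per host is one simple way to keep a crawl from being dominated by a few very large sites; other sampling strategies are possible.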

Programming language: Python

License: Apache License 2.0

Tags: Natural Language Processing URL Manipulation WWW Validation

Latest version: v0.6.0

coURLan alternatives and similar packages

Based on the "URL Manipulation" category.

furl

6.4 0.0 L2 coURLan VS furl

🌐 URL parsing and manipulation made easy.

webargs

5.1 8.7 L5 coURLan VS webargs

A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.

yarl

5.0 9.4 coURLan VS yarl

Yet another URL library

pyshorteners

3.1 0.0 L5 coURLan VS pyshorteners

🔌 Generating short urls with python has never been easier

purl

2.9 0.0 L5 coURLan VS purl

A simple, immutable URL class with a clean API for interrogation and manipulation.

short_url

2.5 0.0 L5 coURLan VS short_url

Python implementation for generating Tiny URL- and bit.ly-like URLs.

URL Cleaner

0.7 10.0 coURLan VS URL Cleaner

A package for removing tracking parameters from URLs. This package supports automatically updating filtering rules from AdGuard.