
Notes

This page explains some implementation details and design decisions behind the code.

Optimizations

Imports outside top level

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

From https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead

Because of this, I've added imports inside each function that needs them instead of adding them all at the top of the file. Since the user is only going to run a single command per invocation, the imports required by the other commands are never executed. Doing this saves around 500ms of startup time.
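
Here's a minimal sketch of the pattern (the function and command are illustrative, not the actual source):

```python
# Each command imports what it needs only when it actually runs, so
# running one command never pays the import cost of the others.

def information(package: str) -> None:
    # These imports execute only when the user runs this command.
    import requests
    from rich import print as rprint

    data = requests.get(f"https://pypi.org/pypi/{package}/json").json()
    rprint(data["info"]["summary"])

information("rich")
```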

Another micro-optimization: I compile the regex ahead of time, because it's almost twice as fast. (Python does cache compiled patterns internally, but every re.match call still pays for a cache lookup, which a precompiled pattern avoids.)

```pycon
>>> timeit.timeit(not_compiled, number=5)
6.0103950999999824
>>> timeit.timeit(compiled, number=5)
3.9251674000000207
```

Source code for the non-compiled regex:

```python
import re
import secrets

def not_compiled():
    for i in range(1000000):
        re.match(r".+", secrets.token_hex(32))
```

Source code for the compiled regex:

```python
import re
import secrets

def compiled():
    r = re.compile(r".+")
    for i in range(1000000):
        r.match(secrets.token_hex(32))
```

Speedups

You may have seen that you can do `pip install "pypi-command-line[speedups]"` and wondered what that actually does. If you look at the source code, you can see these lines in setup.cfg:

```ini
[options.extras_require]
speedups =
    lxml
    rapidfuzz
    requests_cache
    shellingham
    ujson
```

Here's a detailed explanation of what each of those packages does.

shellingham

This doesn't really speed anything up on its own; it adds automatic shell detection when installing tab completion via the --install-completion option. That technically lowers the time it takes to set up completions, because you no longer have to tell the tool which terminal/shell you're using.
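
A minimal sketch of what the detection looks like (the fallback prompt is illustrative):

```python
# shellingham.detect_shell() returns a (name, path) tuple for the shell
# that launched the current process.
from shellingham import ShellDetectionFailure, detect_shell

try:
    shell, _path = detect_shell()
except ShellDetectionFailure:
    # Without shellingham (or when detection fails) the user has to
    # provide the shell name themselves.
    shell = input("Could not detect your shell, please type its name: ")

print(f"Installing completions for {shell}")
```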

lxml

This is a faster alternative to the built-in html.parser that comes with Python; html5lib is another alternative.

Parser comparison table from the BeautifulSoup documentation:

| Parser | Typical usage | Advantages | Disadvantages |
|---|---|---|---|
| Python's html.parser | `BeautifulSoup(markup, "html.parser")` | Batteries included; decent speed; lenient (as of Python 2.7.3 and 3.2) | Not as fast as lxml, less lenient than html5lib |
| lxml's HTML parser | `BeautifulSoup(markup, "lxml")` | Very fast; lenient | External C dependency |
| lxml's XML parser | `BeautifulSoup(markup, "lxml-xml")` or `BeautifulSoup(markup, "xml")` | Very fast; the only currently supported XML parser | External C dependency |
| html5lib | `BeautifulSoup(markup, "html5lib")` | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency |

- From https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

If lxml is not installed, the built-in html.parser is used for HTML and xml.etree.ElementTree is used for XML.

  • For XML, lxml is more than 10 times faster than xml.etree.ElementTree.

  • For HTML, lxml is more than 5 times faster than html.parser.
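
A minimal sketch of that fallback (the selection logic is illustrative, not copied from the source):

```python
# Prefer lxml when it's importable, otherwise fall back to the built-in parser.
from bs4 import BeautifulSoup

try:
    import lxml  # noqa: F401 -- imported only to check availability
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup("<p>Hello <b>world</b></p>", parser)
print(soup.b.text)  # world
```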

requests-cache

This speeds up HTTP requests by caching the responses, so repeated requests can be answered from disk instead of the network.
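
A minimal sketch of how it works (the cache name and expiry are illustrative):

```python
# Once the cache is installed, identical requests made within the expiry
# window are served from disk instead of hitting the network.
import requests
import requests_cache

requests_cache.install_cache("pypi-command-line", expire_after=3 * 60 * 60)

requests.get("https://pypi.org/pypi/rich/json")  # fetched from the network
requests.get("https://pypi.org/pypi/rich/json")  # served from the local cache
```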

rapidfuzz

This provides fast fuzzy string matching; it's around 20 times faster than thefuzz.
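
A minimal sketch (the command list is illustrative):

```python
# process.extractOne returns the best (match, score, index) triple
# for a query against a list of choices.
from rapidfuzz import process

commands = ["browse", "description", "information", "new-packages", "wheels"]
match, score, _index = process.extractOne("descriptoin", commands)
print(match)  # description
```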

ujson

This provides much faster JSON encoding and decoding than the built-in json module.
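
A minimal sketch of how it can be swapped in (the try/except pattern is illustrative):

```python
# ujson is largely API-compatible with the standard json module, so it can
# be used as a drop-in replacement when it's installed.
try:
    import ujson as json
except ImportError:
    import json

print(json.loads('{"package": "rich"}')["package"])  # rich
```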

Dependencies

Rich

This provides the beautiful formatting you see when using the commands: colors, tables, panels, progress bars and much more.

Typer

This makes the command line interface itself work. It's a wrapper over click, and the heavy lifting is done by click, but typer adds some extra features on top, such as shell autocompletion. That's why I've chosen typer over click.

Requests

For HTTP web requests

Questionary

For cool prompts.

Humanize

For converting raw values into human-readable text (file sizes, relative dates, and so on)

bs4 (BeautifulSoup)

For parsing HTML and XML data

TheFuzz

For fuzzy string matching

Rich RST

For pretty-printing reStructuredText descriptions

Packaging

For version parsing

Wheel Filename

For parsing wheel filenames

Dependency Installation Notes

| Name | Applicable commands | Note |
|---|---|---|
| typer | meta | |
| rich | all | |
| requests | meta | |
| humanize | all | |
| bs4 | search; new-packages (optional); new-releases (optional) | If both this and lxml are installed, this is used for new-packages and new-releases, which can provide up to a 5x speed improvement and avoids a bug where descriptions are not shown. |
| questionary | rtfd; browse; description (optional) | For the description command: if the PyPI page has no description and mentions a single GitHub repository, this is not required; if it mentions multiple GitHub repositories you'll need to select one, so this becomes required. |
| thefuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available, difflib is used instead. |
| rich-rst | description (sometimes) | Only required by the description command when the description is written in reStructuredText. |
| shellingham (optional) | --install-completion | Only needed to automatically detect the current shell when installing autocompletion; without it you have to pass the shell manually to --install-completion. |
| lxml (optional) | search; new-packages (optional); new-releases (optional) | A must-have for the search command. For new-packages and new-releases it can provide up to a 5x speed improvement and avoids a bug where descriptions are not shown. |
| requests-cache (optional) | all | |
| rapidfuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available, thefuzz is tried next; if both are missing, difflib is used (see the sketch below this table). |
| packaging | wheels; information (optional) | In the information command, if this is not available the buggy-at-times distutils is used instead. |
| wheel-filename | wheels (optional) | Required if the --supported-only flag is passed. |
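
Here's a minimal sketch of the graduated fuzzy-matching fallback described above (illustrative, not the actual source):

```python
# Try rapidfuzz first, then thefuzz, then the standard library's difflib.
def suggest(typo: str, commands: list[str]) -> str | None:
    try:
        from rapidfuzz import process
        best = process.extractOne(typo, commands)
        return best[0] if best else None
    except ImportError:
        pass
    try:
        from thefuzz import process
        best = process.extractOne(typo, commands)
        return best[0] if best else None
    except ImportError:
        from difflib import get_close_matches
        matches = get_close_matches(typo, commands, n=1)
        return matches[0] if matches else None

print(suggest("informtion", ["information", "browse", "wheels"]))  # information
```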

Cache

The library caches the package list and refreshes it once per day. It also caches web requests and responses if requests-cache is installed.

The packages cache means the package list doesn't have to be re-downloaded every time the regex-search command runs. The data is around 2.75 MB (gzipped) when downloaded from the web, and takes around 5 MB when stored locally as cache.
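
A minimal sketch of a once-a-day refresh (the path and URL are illustrative, not the project's actual cache location):

```python
# Re-download the package list only when the cached copy is over a day old.
import time
from pathlib import Path

import requests

CACHE_FILE = Path.home() / ".cache" / "pypi-command-line" / "packages.txt"
ONE_DAY = 24 * 60 * 60

def package_list() -> str:
    if CACHE_FILE.exists() and time.time() - CACHE_FILE.stat().st_mtime < ONE_DAY:
        return CACHE_FILE.read_text()
    text = requests.get("https://pypi.org/simple/").text
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(text)
    return text
```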

The web requests cache stores the last few web requests so that running the same command again doesn't re-fetch the data. Entries are automatically removed from this cache once they expire, and the expiry duration depends on the URL. These are the current cache expiry durations per command:

| Command | Duration |
|---|---|
| browse | 3 hours |
| description | 3 hours if fetched from PyPI, 1 day if fetched from GitHub |
| information | 3 hours |
| largest-files | 1 day |
| new-packages | 1 minute |
| new-releases | 1 minute |
| regex-search | 1 day |
| releases | 3 hours |
| rtfd | No cache needed |
| wheels | 3 hours |
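
This kind of per-URL expiry maps naturally onto requests-cache's urls_expire_after option; here's a minimal sketch (the patterns and durations are illustrative, not the project's actual configuration):

```python
# Different URL patterns get different expiry durations; "*" is the default.
from datetime import timedelta

import requests_cache

requests_cache.install_cache(
    "pypi-command-line",
    urls_expire_after={
        "pypi.org/pypi/*": timedelta(hours=3),  # information, releases, wheels, ...
        "pypi.org/simple/": timedelta(days=1),  # the regex-search package list
        "*": timedelta(hours=1),                # everything else
    },
)
```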