
Notes

This page explains some implementation details and design decisions behind the code.

Optimizations

Imports outside top level

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

From https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead

Because of this, I've added imports inside each function that needs them instead of adding them all at the top of the file. Since the user is only going to run a single command per invocation, the imports required by the other commands are never executed. Doing this saves around 500ms of startup time.
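
Here's a minimal sketch of the pattern (the function and command are illustrative, not the actual source):

```python
# Each command imports what it needs only when it actually runs, so
# running one command never pays the import cost of the others.

def information(package: str) -> None:
    # These imports execute only when the user runs this command.
    import requests
    from rich import print as rprint

    data = requests.get(f"https://pypi.org/pypi/{package}/json").json()
    rprint(data["info"]["summary"])

information("rich")
```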

Another micro-optimization: I compile the regex ahead of time, because it's almost twice as fast. (Python does cache compiled patterns internally, but every re.match call still pays for a cache lookup, which a precompiled pattern avoids.)

```pycon
>>> timeit.timeit(not_compiled, number=5)
6.0103950999999824
>>> timeit.timeit(compiled, number=5)
3.9251674000000207
```

Source code for the non-compiled regex:

```python
import re
import secrets

def not_compiled():
    for i in range(1000000):
        re.match(r".+", secrets.token_hex(32))
```

Source code for the compiled regex:

```python
import re
import secrets

def compiled():
    r = re.compile(r".+")
    for i in range(1000000):
        r.match(secrets.token_hex(32))
```

Speedups

You may have seen that you can do `pip install "pypi-command-line[speedups]"` and wondered what that actually does. If you look at the source code, you can see these lines in setup.cfg:

```ini
[options.extras_require]
speedups =
    lxml
    rapidfuzz
    requests_cache
    shellingham
    ujson
```

Here's a detailed explanation of what each of those packages does.

shellingham

This doesn't really speed anything up on its own; it adds automatic shell detection when installing tab completion via the --install-completion option. That technically lowers the time it takes to set up completions, because you no longer have to tell the tool which terminal/shell you're using.
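
A minimal sketch of what the detection looks like (the fallback prompt is illustrative):

```python
# shellingham.detect_shell() returns a (name, path) tuple for the shell
# that launched the current process.
from shellingham import ShellDetectionFailure, detect_shell

try:
    shell, _path = detect_shell()
except ShellDetectionFailure:
    # Without shellingham (or when detection fails) the user has to
    # provide the shell name themselves.
    shell = input("Could not detect your shell, please type its name: ")

print(f"Installing completions for {shell}")
```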

lxml

This is a faster alternative to the built-in html.parser that comes with Python; html5lib is another alternative.

Parser comparison table from the BeautifulSoup documentation:

| Parser | Typical usage | Advantages | Disadvantages |
|---|---|---|---|
| Python's html.parser | `BeautifulSoup(markup, "html.parser")` | Batteries included; decent speed; lenient (as of Python 2.7.3 and 3.2) | Not as fast as lxml, less lenient than html5lib |
| lxml's HTML parser | `BeautifulSoup(markup, "lxml")` | Very fast; lenient | External C dependency |
| lxml's XML parser | `BeautifulSoup(markup, "lxml-xml")` or `BeautifulSoup(markup, "xml")` | Very fast; the only currently supported XML parser | External C dependency |
| html5lib | `BeautifulSoup(markup, "html5lib")` | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency |

- From https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

If lxml is not installed, the built-in html.parser is used for HTML and xml.etree.ElementTree is used for XML.

  • For XML, lxml is more than 10 times faster than xml.etree.ElementTree.

  • For HTML, lxml is more than 5 times faster than html.parser.
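
A minimal sketch of that fallback (the selection logic is illustrative, not copied from the source):

```python
# Prefer lxml when it's importable, otherwise fall back to the built-in parser.
from bs4 import BeautifulSoup

try:
    import lxml  # noqa: F401 -- imported only to check availability
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup("<p>Hello <b>world</b></p>", parser)
print(soup.b.text)  # world
```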

requests-cache

This speeds up HTTP requests by caching the responses, so repeated requests can be answered from disk instead of the network.
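
A minimal sketch of how it works (the cache name and expiry are illustrative):

```python
# Once the cache is installed, identical requests made within the expiry
# window are served from disk instead of hitting the network.
import requests
import requests_cache

requests_cache.install_cache("pypi-command-line", expire_after=3 * 60 * 60)

requests.get("https://pypi.org/pypi/rich/json")  # fetched from the network
requests.get("https://pypi.org/pypi/rich/json")  # served from the local cache
```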

rapidfuzz

This provides fast fuzzy string matching; it's around 20 times faster than thefuzz.
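
A minimal sketch (the command list is illustrative):

```python
# process.extractOne returns the best (match, score, index) triple
# for a query against a list of choices.
from rapidfuzz import process

commands = ["browse", "description", "information", "new-packages", "wheels"]
match, score, _index = process.extractOne("descriptoin", commands)
print(match)  # description
```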

ujson

This provides much faster JSON encoding and decoding than the built-in json module.
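
A minimal sketch of how it can be swapped in (the try/except pattern is illustrative):

```python
# ujson is largely API-compatible with the standard json module, so it can
# be used as a drop-in replacement when it's installed.
try:
    import ujson as json
except ImportError:
    import json

print(json.loads('{"package": "rich"}')["package"])  # rich
```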

Dependencies

Rich

This provides the beautiful formatting you see when using the commands: colors, tables, panels, progress bars and much more.

Typer

This makes the command line interface itself work. It's a wrapper over click, and the heavy lifting is done by click, but typer adds some extra features on top, such as shell autocompletion. That's why I've chosen typer over click.

Requests

For HTTP web requests

Questionary

For cool prompts.

Humanize

For converting raw values into human-readable text (file sizes, relative dates, and so on)

bs4 (BeautifulSoup)

For parsing HTML and XML data

TheFuzz

For fuzzy string matching

Rich RST

For pretty-printing reStructuredText descriptions

Packaging

For version parsing

Wheel Filename

For parsing wheel filenames

Dependency Installation Notes

| Name | Applicable commands | Note |
|---|---|---|
| typer | meta | |
| rich | all | |
| requests | meta | |
| humanize | all | |
| bs4 | search; new-packages (optional); new-releases (optional) | If both this and lxml are installed, this is used for new-packages and new-releases, which can provide up to a 5x speed improvement and avoids a bug where descriptions are not shown. |
| questionary | rtfd; browse; description (optional) | For the description command: if the PyPI page has no description and mentions a single GitHub repository, this is not required; if it mentions multiple GitHub repositories you'll need to select one, so this becomes required. |
| thefuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available, difflib is used instead. |
| rich-rst | description (sometimes) | Only required by the description command when the description is written in reStructuredText. |
| shellingham (optional) | --install-completion | Only needed to automatically detect the current shell when installing autocompletion; without it you have to pass the shell manually to --install-completion. |
| lxml (optional) | search; new-packages (optional); new-releases (optional) | A must-have for the search command. For new-packages and new-releases it can provide up to a 5x speed improvement and avoids a bug where descriptions are not shown. |
| requests-cache (optional) | all | |
| rapidfuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available, thefuzz is tried next; if both are missing, difflib is used (see the sketch below this table). |
| packaging | wheels; information (optional) | In the information command, if this is not available the buggy-at-times distutils is used instead. |
| wheel-filename | wheels (optional) | Required if the --supported-only flag is passed. |
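
Here's a minimal sketch of the graduated fuzzy-matching fallback described above (illustrative, not the actual source):

```python
# Try rapidfuzz first, then thefuzz, then the standard library's difflib.
def suggest(typo: str, commands: list[str]) -> str | None:
    try:
        from rapidfuzz import process
        best = process.extractOne(typo, commands)
        return best[0] if best else None
    except ImportError:
        pass
    try:
        from thefuzz import process
        best = process.extractOne(typo, commands)
        return best[0] if best else None
    except ImportError:
        from difflib import get_close_matches
        matches = get_close_matches(typo, commands, n=1)
        return matches[0] if matches else None

print(suggest("informtion", ["information", "browse", "wheels"]))  # information
```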

Cache

The library caches the package list and refreshes it once per day. It also caches web requests and responses if requests-cache is installed.

The packages cache means the package list doesn't have to be re-downloaded every time the regex-search command runs. The data is around 2.75 MB (gzipped) when downloaded from the web, and takes around 5 MB when stored locally as cache.
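
A minimal sketch of a once-a-day refresh (the path and URL are illustrative, not the project's actual cache location):

```python
# Re-download the package list only when the cached copy is over a day old.
import time
from pathlib import Path

import requests

CACHE_FILE = Path.home() / ".cache" / "pypi-command-line" / "packages.txt"
ONE_DAY = 24 * 60 * 60

def package_list() -> str:
    if CACHE_FILE.exists() and time.time() - CACHE_FILE.stat().st_mtime < ONE_DAY:
        return CACHE_FILE.read_text()
    text = requests.get("https://pypi.org/simple/").text
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(text)
    return text
```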

The web requests cache stores the last few web requests so that running the same command again doesn't re-fetch the data. Entries are automatically removed from this cache once they expire, and the expiry duration depends on the URL. These are the current cache expiry durations per command:

| Command | Duration |
|---|---|
| browse | 3 hours |
| description | 3 hours if fetched from PyPI, 1 day if fetched from GitHub |
| information | 3 hours |
| largest-files | 1 day |
| new-packages | 1 minute |
| new-releases | 1 minute |
| regex-search | 1 day |
| releases | 3 hours |
| rtfd | No cache needed |
| wheels | 3 hours |
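
This kind of per-URL expiry maps naturally onto requests-cache's urls_expire_after option; here's a minimal sketch (the patterns and durations are illustrative, not the project's actual configuration):

```python
# Different URL patterns get different expiry durations; "*" is the default.
from datetime import timedelta

import requests_cache

requests_cache.install_cache(
    "pypi-command-line",
    urls_expire_after={
        "pypi.org/pypi/*": timedelta(hours=3),  # information, releases, wheels, ...
        "pypi.org/simple/": timedelta(days=1),  # the regex-search package list
        "*": timedelta(hours=1),                # everything else
    },
)
```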