Notes¶
This page explains certain aspects of the code.
Optimizations¶
Imports outside top level¶
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
From https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
Because of this, I've added imports inside each function that needs them instead of putting them all at the top. Since the user only runs a single command per invocation, the imports required for the other commands never get executed. Doing this saves around 500 ms.
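The pattern looks roughly like this (the function and the imported module are illustrative, not the project's actual code):

```python
def regex_search_command(pattern: str) -> list:
    # Imported only when this command actually runs, so the other
    # commands don't pay this import cost at startup.
    import re
    return re.findall(pattern, "pypi-command-line")

# The import executes on the first call instead of at program startup;
# later calls hit Python's module cache (sys.modules) and are cheap.
matches = regex_search_command(r"[a-z]+")
```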
Compiling regex in regex-search¶
I compile the regex because it's almost twice as fast.
```
>>> timeit.timeit(not_compiled, number=5)
6.0103950999999824
>>> timeit.timeit(compiled, number=5)
3.9251674000000207
```
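The original listings didn't make it into this page, but the two timed snippets presumably look something like the following sketch (the pattern and sample data are placeholders of my own, not the project's):

```python
import re
import timeit

names = ["requests", "rich", "typer", "numpy"] * 1000

def not_compiled():
    # re.search re-resolves the pattern (via the module's cache) on every call
    return [n for n in names if re.search(r"^r.+s$", n)]

PATTERN = re.compile(r"^r.+s$")

def compiled():
    # The compiled pattern object is reused directly, skipping that overhead
    return [n for n in names if PATTERN.search(n)]

print(timeit.timeit(not_compiled, number=5))
print(timeit.timeit(compiled, number=5))
```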
Speedups¶
You may have seen that you can run pip install "pypi-command-line[speedups]" and wondered what that actually does. If you look at the source code, you can see that the speedups extra simply pulls in a few optional dependencies. Here's a detailed explanation of what each of those packages does.
shellingham¶
This does not really speed anything up by itself, but it adds automatic shell detection for the autocomplete installation done by the --install-completion command. This technically lowers the time taken to install autocompletion because you no longer have to manually specify which terminal/shell you're using.
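A sketch of what that detection amounts to (guarded, since shellingham is optional; the variable names are mine):

```python
try:
    import shellingham
    try:
        # Walks the process tree to find the shell the user is running in
        shell_name, shell_path = shellingham.detect_shell()
    except shellingham.ShellDetectionFailure:
        shell_name = None  # e.g. running outside an interactive shell
except ImportError:
    shell_name = None  # user has to pass the shell name manually
```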
lxml¶
This is a faster alternative to the built-in html.parser that comes with Python. There is also html5lib. Here's the table of parsers supported by BeautifulSoup:
| Parser | Typical usage | Advantages | Disadvantages |
|---|---|---|---|
| Python's html.parser | BeautifulSoup(markup, "html.parser") | Batteries included; decent speed | Not as fast as lxml, less lenient than html5lib |
| lxml's HTML parser | BeautifulSoup(markup, "lxml") | Very fast; lenient | External C dependency |
| lxml's XML parser | BeautifulSoup(markup, "lxml-xml") or BeautifulSoup(markup, "xml") | Very fast; the only currently supported XML parser | External C dependency |
| html5lib | BeautifulSoup(markup, "html5lib") | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency |
- From https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
If lxml is not installed, html.parser is used for HTML and xml.etree.ElementTree for XML.

- For XML, lxml is more than 10 times faster than xml.etree.ElementTree
- For HTML, lxml is more than 5 times faster than html.parser
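That fallback can be sketched like this (the variable names are mine; the real code may differ):

```python
try:
    import lxml  # noqa: F401 -- only checking that it's importable
    HTML_PARSER = "lxml"   # passed to BeautifulSoup(markup, HTML_PARSER)
    LXML_FOR_XML = True
except ImportError:
    HTML_PARSER = "html.parser"  # stdlib fallback for HTML
    LXML_FOR_XML = False         # XML falls back to xml.etree.ElementTree
```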
requests-cache¶
This makes HTTP requests faster by caching the responses.
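Enabling it is essentially a one-liner; a guarded sketch (the cache name and expiry here are my own choices, not the project's):

```python
try:
    import requests_cache
    # Transparently caches every request made through the requests library
    requests_cache.install_cache("pypi_cache", expire_after=3 * 60 * 60)
    caching_enabled = True
except ImportError:
    caching_enabled = False  # every request goes to the network
```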
rapidfuzz¶
This allows rapid fuzzy string matching, around 20 times faster than thefuzz.
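The command-suggestion code falls back through progressively slower matchers, as described in the dependency table below; a sketch of that chain (the function name is mine):

```python
def best_match(query, choices):
    """Suggest the closest command name, using the fastest available matcher."""
    try:
        from rapidfuzz import process  # fastest option
        match = process.extractOne(query, choices)
        return match[0] if match else None
    except ImportError:
        pass
    try:
        from thefuzz import process  # slower fallback
        match = process.extractOne(query, choices)
        return match[0] if match else None
    except ImportError:
        import difflib  # stdlib, always available
        matches = difflib.get_close_matches(query, choices, n=1, cutoff=0)
        return matches[0] if matches else None

suggestion = best_match("serach", ["search", "browse", "wheels"])
```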
ujson¶
This allows json parsing to be ultra fast
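Because ujson mirrors the stdlib json interface for the common functions, code using it can degrade gracefully when it isn't installed:

```python
# ujson is a drop-in replacement for loads/dumps
try:
    import ujson as json
except ImportError:
    import json  # stdlib fallback, same interface

payload = json.loads('{"name": "pypi-command-line", "downloads": 1234}')
serialized = json.dumps(payload)
```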
Dependencies¶
Rich¶
This allows the beautiful formatting that you can see when using the commands. Colors, Tables, Panels, Progress Bars and much more.
Typer¶
This makes the command line interface work. Although it's just a wrapper over click and the main work is done by click, typer adds some extra features on top, such as shell autocompletion. That's why I've chosen typer over click.
Requests¶
For making HTTP requests
Questionary¶
For cool prompts.
Humanize¶
For human readable data
bs4 (BeautifulSoup)¶
For parsing HTML and XML data
TheFuzz¶
For fuzzy string matching
Rich RST¶
For pretty-printing reStructuredText descriptions
Packaging¶
For version parsing
Wheel Filename¶
For parsing wheel filenames
Dependency Installation Notes¶
| Name | Applicable Commands | Note |
|---|---|---|
| typer | meta | |
| rich | all | |
| requests | meta | |
| humanize | all | |
| bs4 | search, new-packages (optional), new-releases (optional) | If both lxml and this are installed, then this is used for new-packages and new-releases, providing up to a 5x speed improvement and avoiding a bug where the descriptions are not shown. |
| questionary | rtfd, browse, description (optional) | For the description command: if the PyPI page doesn't have a description and mentions a single GitHub repository, this is not required; but if it mentions multiple GitHub repos you'll need to select one, so this will be required. |
| thefuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available, difflib is used instead. |
| rich-rst | description (sometimes) | Only required by the description command when the description is in reStructuredText. |
| shellingham (optional) | --install-completion | Only required to automatically detect the shell when installing autocompletion for the current shell. Without it you'd have to manually pass the shell as an argument to install-completion. |
| lxml (optional) | search, new-packages (optional), new-releases (optional) | For the search command this is a must-have. For new-packages and new-releases it provides up to a 5x speed improvement and avoids a bug where the descriptions are not shown. |
| requests-cache (optional) | all | |
| rapidfuzz (optional) | meta (command suggestions when an invalid command is used) | If this is not available it tries thefuzz, and if neither is installed it falls back to difflib. |
| packaging | wheels, information (optional) | In the information command, if this is not available, distutils is used instead, which is buggy at times. |
| wheel-filename | wheels (optional) | Required if the --supported-only flag is passed. |
Cache¶
The library caches the package list and refreshes it once per day. It also caches web requests and responses if requests-cache is installed.

The package-list cache means the data doesn't have to be re-downloaded every time the regex-search command runs. The data is around 2.75 MB (gzipped) when downloaded from the web, and takes around 5 MB when stored locally as cache.

The web request cache stores the most recent web requests so that running the same command again doesn't re-fetch the data. This cache is automatically pruned by removing specific URLs once they expire; the expiry duration depends on the URL. Currently these are the cache expiry durations for each command:
| Command | Duration |
|---|---|
| browse | 3 hours |
| description | 3 hours if fetched from PyPI, 1 day if fetched from GitHub |
| information | 3 hours |
| largest-files | 1 day |
| new-packages | 1 minute |
| new-releases | 1 minute |
| new-releases | 3 hours |
| regex-search | 1 day |
| releases | 3 hours |
| rtfd | No cache needed |
| wheels | 3 hours |
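With requests-cache, per-URL expiry rules like the ones above can be expressed through its urls_expire_after mapping. The URL patterns below are my own illustration, not the project's actual configuration:

```python
from datetime import timedelta

try:
    import requests_cache

    requests_cache.install_cache(
        "pypi_cache",
        # Glob patterns matched against request URLs; first match wins
        urls_expire_after={
            "pypi.org/rss/*": timedelta(minutes=1),   # new-packages / new-releases feeds
            "pypi.org/pypi/*": timedelta(hours=3),    # package metadata endpoints
            "pypi.org/simple/*": timedelta(days=1),   # full package index
        },
    )
    configured = True
except ImportError:
    configured = False  # no caching without requests-cache
```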