This article was last updated on July 02, 2024, to add new advantages and hard links sections for pnpm
Introduction
A package manager is responsible for installing, updating, and removing software packages and dependencies. NPM has been widely used as the standard package manager for Javascript; however, companies are quickly adopting the pnpm package manager due to its immense benefits.
Steps we'll cover:
- What is pnpm?
- Why not npm or yarn?
- How does pnpm solve these problems?
- Advantages of pnpm
- pnpm CLI
- Migrating from NPM/Yarn to PNPM
- Advanced Features in pnpm
What is pnpm?
pnpm is a drop-in replacement for npm. It is built on top of npm and is much faster and more efficient than its predecessor. It is highly disk efficient and solves inherent issues in npm. In this article, we will discuss pnpm in detail. We will explain how it works and will go through why pnpm is a perfect replacement for npm or yarn.
Let’s start with discussing the issues in existing package managers.
Why not npm or yarn?
Npm uses flattened node_modules
. The flattened dependency results in many issues, such as:
- The algorithm for flattening a dependency tree is complex
- Some of the packages have to be duplicated inside another project’s
node_modules
folder - Modules have access to packages they do not depend on
Let’s explain this with a basic example. One of your projects needs an express module, and express has a dependency package named “debug”. This is how the structure of node modules will look like:
Your code can require('debug')
, even if you do not depend on it explicitly in the package.json
file. Let’s discuss two scenarios that can cause issues:
- Express updates its debug dependency with breaking changes.
- Express does not depend on debug anymore.
In both cases, your code will fail because it has an implicit dependency to debug. That is a big problem. Similarly, if you work with a large monorepo, then it will be much more difficult to track particular dependencies a project uses. Duplicate packages are another issue here. Now yarn is slightly better in terms of optimizing disk space as it makes use of hoistings; however that approach fails in some cases.
How does pnpm solve these problems?
pnpm uses hard links and symlinks to maintain a semistrict node_modules
structure. So what’s the difference between a hard link and a soft link? Well, a hard link is a different reference to the same file. In soft link, you create a new file, and the contents of the file point to another path. Symlinks are symbolic links that pnpm uses to create a nested structure of dependencies.
Let’s see the pnpm version of the express and node modules discussed in the previous section.
The above picture shows the directory structure of your project files. The first thing you will notice is that your code cannot access the “debug” package because it is not directly under the root level node_modules
directory. Pnpm has created a special folder named “.pnpm” that contains hard links to all the modules.
If you look at express/4.0.0
, the express
module is a hard link to the global pnpm-store
and a debug
symbolic link to the debug hard link that also links to the global pnpm-store.
Below is the image of the actual pnpm-store, which contains the packages. This is where all the downloaded dependencies are maintained. When you download a dependency, pnpm first checks whether the dependency is present in this store or not. If it finds the dependency in the store, pnpm fetches it by creating a hard link.
Because of this approach, pnpm re-uses the same packages if they are already installed for another project. This brings us to the many benefits which pnpm provides. Let’s discuss some of those.
## Understand how pnpm deals with hard links and symlinks
I will try to describe pnpm's ways of using hard links and symlinks to give us a basic understanding of how our node_modules
directory structure will be.
Hard Links and Symbolic Links
How pnpm Uses Hard Links and Symlinks
Hard Links vs. Soft Links (Symlinks)
- Hard Links: A hard link refers directly to physical data on the disk. Multiple hard links to one file refer to the same data; it is the same inode. So, any changes made to the file via any hard link are reflected in all hard links.
- Soft Links (Symlinks): A symlink is a pointer to another file or directory path. It is more or less like a shortcut. When the original file is deleted, the symlink becomes broken and doesn't contain the original file's data.
pnpm's Approach:
- pnpm places inside of
node_modules
a.pnpm
directory that includes hard links to all of the module files, held in a global store. What this means is that even if a given module should be utilized by many projects, it will only ever take up one space on disk with pnpm. - Then, in the
node_modules
directory, it makes symlinks to those hard links. This way, each project gets its ownnode_modules
structure, but there is no redundancy in copies of the same packages.
Impact on project structure
Directory Structure
- Though npm and Yarn flatten dependencies in the node_modules directory, pnpm keeps the structure nested. Here is how the structure appears:
node_modules/
.pnpm/
express@4.0.0/node_modules/express/
debug@2.6.9/node_modules/debug/
express/
-> symlink to../.pnpm/express@4.0.0/node_modules/express
debug/
-> symlink to../.pnpm/debug@2.6.9/node_modules/debug
Efficiency:
- This approach makes the usage of disk space very optimal. If several projects share a dependency, pnpm places only one copy in the global store and references it with hard links. This also speeds up the installation because pnpm is capable of quickly linking to the already existing files rather than downloading and extracting them again.
Dependency Isolation
- With this structure, pnpm ensures that dependencies are isolated. A package in your project can only access the dependencies explicitly declared in its
package.json
. This way, the number of conflicts and implicit access issues is brought down to a minimum compared with npm and Yarn.
Hard links and symlinks are being used for leveraging pnpm in optimal disk space and management of dependencies, making it an efficient and effective package manager. If there are any questions or further details needed from this, please do not hesitate to reach out.
Advantages of pnpm
pnpm provides many advantages over npm and yarn. Some of the core features include:
Disk space efficiency
As shown in the previous section, pnpm uses a content-addressable file system to store the packages and dependencies on the disk. That means the same package will not be duplicated. Even with different versions of the same package, pnpm is intelligent enough to reuse the maximum code. If a package version 1 has 500 files and version 2 has just one more file, then pnpm will not write 501 files for version 2; instead, it will create a hard link to the original 500 files and write just the new file. If you compare it with npm, then version 2 will also be loaded with duplicating the original 500 files. For large monorepo projects, it can make a big difference. Image a scenario where a package is needed by hundreds of other packages that will be killing your disk space unless you use pnpm.
Improved speed
The speed of package installation with pnpm is significantly better than npm and yarn. If you look at the below benchmark tests, you can see that pnpm performs better in most cases thano npm and yarn.
Security
Pnpm, like yarn, has a special file with the checksum of all the installed packages. This ensures the integrity of all the installed packages before their code is executed.
In terms of unprivileged access, pnpm also outperforms npm and yarn. In the case of npm and yarn, If package A depends on package B, and B depends on C, then A implicitly gets access to C even though A has not declared C as its dependency. This problem is intensified in a large monorepo setup. Pnpm, on the other hand, uses a different dependency resolution algorithm and different folder structure of node_modules
that prevents illegal access to packages.
Note that pnpm has excellent support for monorepo and offline mode.
pnpm CLI
Pnpm has a cool CLI. Here are some of the basic commands:
pnpm init
: Create a package.json filepnpm install
: Download all the packages listed as dependencies in package.jsonpnpm add [module_name]@[version]
: Download a particular version of a package and add to the list of dependencies in package.jsonpnpm remove [module_name]
: Uninstall and remove a package from the list of dependencies inpackage.json
pnpm list
: Print a tree of locally installed modules
Migrating from NPM/Yarn to PNPM
If your projects use npm or yarn, then migrating to pnpm will not be very difficult. Here is a comparison of commands between npm, yarn, and pnpm.
npm command | Yarn command | pnpm equivalent |
---|---|---|
npm install | yarn | pnpm install |
npm install [pkg] | yarn add [pkg] | pnpm add [pkg] |
npm uninstall [pkg] | yarn remove [pkg] | pnpm remove [pkg] |
npm update | yarn upgrade | pnpm update |
npm list | yarn list | pnpm list |
npm run [scriptName] | yarn [scriptName] | pnpm [scriptName] |
npx [command] | yarn dlx [command] | pnpm dlx [command] |
npm exec | yarn exec [commandName] | pnpm exec [commandName] |
npm init [initializer] | yarn create [initializer] | pnpm create [initializer] |
Advanced Features in pnpm
I would like to share some advanced features of pnpm with you in the hope that we can make our development workflow brilliant. The following are some of the most essential features that demonstrate why pnpm is powerful and efficient:
Workspaces
pnpm Workspaces Introduction Workspaces: pnpm workspaces help one manage multiple packages under a single repository (monorepo). This becomes very useful when working on projects with interdependent packages or microservices.
- How to Use: Define in your
pnpm-workspace.yaml
file apackages
field, listing monorepo paths to the packages.
packages:
- "packages/*"
- "tools/*"
Workspace Benefits:
- Centralized Management: All the dependencies, scripts, and configurations for any package are managed from the root directory.
- Efficient Linking: Local packages are automatically linked to each other. They interoperate in a way that looks almost as if they were published.
Offline Mode
Use pnpm Offline: Offline Mode: pnpm can work entirely in offline mode with the help of its global store. pnpm caches every downloaded package, which makes it possible to install packages without having an Internet connection after the first download.
- How To Enable: Just use pnpm as you always do, and it will automatically pick the cached packages while offline.
CI/CD Pipelines Benefits:
- Speed: Drastically reduces the time taken to install dependencies in CI/CD pipelines as packages are fetched from a local cache.
- Reliability: Should not depend on network availability, and therefore, the failures due to connectivity issues are significantly reduced.
NodeLinker Mode
NodeLinker Mode:
Overview: pnpm provides node-linker
mode, which can be set to one of two modes—hoisted
or isolated
. This is to say, developers will be given control over how pnpm is supposed to link packages.
- Usage: Configure the
node-linker
in.npmrc
.
node-linker=hoisted
Advantages:
- Flexibility: This allows you the flexibility to pick the right linking strategy for the needs of your project. Hoisting may reduce the duplication of packages, whereas in isolated mode, dependency isolation is much more strict.
Hooks and User Scripts
Hooks:
- Overview: pnpm supports lifecycle hooks that allow you to run your scripts at different stages of a package's life cycle.
- How to Use: Define hooks in your
package.json
under thescripts
section.
{
"scripts": {
"preinstall": "echo 'Before install'",
"postinstall": "echo 'After install'"
}
}
Advantages:
- Customization: Tailor the install and build process such that it can be in line with the needs of the project by coming up with custom scripts.
These advanced features will allow pnpm to increase workflow efficiency, decrease build times, and manage dependencies more reliably. Do not hesitate to contact us with questions or more detailed instructions on implementing these features.
Conclusion
Pnpm is “performant” version of npm, hence the name pnpm. This article showed you the issues with existing package managers like npm and yarn. pnpm has brought many improvements, built on top of existing npm features. pnpm has adopted all the good things of npm while removing its weaknesses, making pnpm the best of both worlds.
As shown in the above benchmark results, pnpm has overall performed much better than npm and yarn. No wonder giant tech companies like Vue3, Prism, and Microsoft are quickly adopting pnpm.