Can websites see when you download files?
Executive summary
Yes. Websites can often detect that a file was downloaded, but how reliably they detect it, and how much they learn, varies. Server logs and analytics capture many download events, especially link clicks routed through the site or tracked with JavaScript, while purely offline saves, or direct file access that bypasses any tracking hooks, are much harder for the origin to observe [1] [2] [3].
1. How downloads normally reveal themselves: server logs and HTTP requests
Every time a browser or other client fetches a file from a web server, it sends an HTTP request that is typically recorded in the server's access logs (IP address, timestamp, requested path, user agent), so site operators can see raw download activity if they control the server or have access to the hosting logs [1].
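As a rough illustration of what such a log entry exposes, the sketch below parses one line in the widely used Combined Log Format and keeps it only if it looks like a successful file download. The sample line, the `DOWNLOAD_EXTENSIONS` tuple, and the `parse_download` helper are assumptions made for this example; real servers may log in a different format.

```python
import re
from typing import Optional

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical set of extensions the operator treats as downloads.
DOWNLOAD_EXTENSIONS = (".pdf", ".zip", ".csv", ".docx")


def parse_download(line: str) -> Optional[dict]:
    """Return the logged fields if the line is a successful file download, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Only full (200) or partial (206) responses count as delivered content.
    if entry["status"] not in ("200", "206"):
        return None
    # Ignore any query string when checking the file extension.
    path_only = entry["path"].split("?", 1)[0].lower()
    if not path_only.endswith(DOWNLOAD_EXTENSIONS):
        return None
    return entry


# Made-up sample line using documentation-reserved IP and domain.
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /files/report.pdf HTTP/1.1" 200 48213 '
    '"https://example.com/reports" "Mozilla/5.0"'
)
print(parse_download(sample))
```

Even without any JavaScript on the page, the parsed entry already carries the IP address, timestamp, path, referrer, and user agent described above.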
2. JavaScript, analytics and enhanced measurement make visibility granular
If the page includes analytics scripts or click handlers (Google Analytics/GTM, Plausible, Amplitude, etc.), those scripts can send explicit “file_download” events, tying link clicks and parameters to richer session data and making downloads visible in dashboards and custom reports [2] [4] [5].
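The decision such a click handler makes can be sketched in a few lines. Real analytics libraries run in the browser as JavaScript; the Python below only models the logic, and the extension list, the `build_download_event` helper, and the payload field names are illustrative assumptions rather than any vendor's actual schema.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list; each analytics tool ships its own set of extensions.
TRACKED_EXTENSIONS = {"pdf", "zip", "csv", "docx", "xlsx", "mp4"}


def build_download_event(link_url: str, link_text: str, session_id: str) -> Optional[dict]:
    """Return a "file_download" event for a clicked link, or None if it is not a file."""
    # Use only the URL path so query strings do not hide the extension.
    path = PurePosixPath(urlparse(link_url).path)
    extension = path.suffix.lstrip(".").lower()
    if extension not in TRACKED_EXTENSIONS:
        return None
    return {
        "event": "file_download",
        "session_id": session_id,  # ties the click to the rest of the visit
        "file_name": path.name,
        "file_extension": extension,
        "link_url": link_url,
        "link_text": link_text,
    }


print(build_download_event("https://example.com/docs/guide.pdf?utm=1", "Guide", "s1"))
print(build_download_event("https://example.com/about", "About", "s1"))
```

Because the event carries a session identifier, a dashboard can relate the download to the pages viewed before it, which a raw access log line cannot do on its own.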
3. What can’t be tracked easily: direct opens and offline saves
Files opened directly from an external URL (for example, a PDF linked from elsewhere and opened without first loading a page on the site) and content that is saved and then viewed offline do not execute the site's JavaScript, so analytics systems that rely on page-embedded scripts will not see those uses. Tracking such direct opens is difficult, and often impossible, without server-side instrumentation [5] [3].
4. Workarounds operators use to force tracking: proxies, redirects, and instrumented downloads
Sites that want to guarantee visibility often route downloads through a script or redirect that logs the event (for example, mapping file extensions to a script on the server or using a download endpoint that records the click before delivering the file), or they append tracking parameters to URLs so access can be matched in logs [1] [6].
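A minimal sketch of such a logging download endpoint, written as a plain WSGI application using only the Python standard library, might look like the following. The `FILES` mapping, the `file` and `src` query parameters, and the in-memory `download_log` are hypothetical names chosen for this example; a real site would write to a log file or database.

```python
from urllib.parse import parse_qs

# Hypothetical mapping of public file IDs to their real locations.
FILES = {"report": "/static/files/report.pdf"}

# In-memory stand-in for a log file or database table.
download_log = []


def download_app(environ, start_response):
    """WSGI app: record the request, then redirect the client to the file."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    file_id = query.get("file", [""])[0]
    target = FILES.get(file_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown file"]
    # The event is recorded before the file is delivered, so it is
    # captured even if no JavaScript ran on the referring page.
    download_log.append({
        "file": file_id,
        "ip": environ.get("REMOTE_ADDR", ""),
        "referer": environ.get("HTTP_REFERER", ""),
        "campaign": query.get("src", [""])[0],  # tracking parameter, if any
    })
    start_response("302 Found", [("Location", target)])
    return [b""]
```

Under these assumptions, a link such as `/download?file=report&src=newsletter` would log the file, the client IP, the referrer, and the campaign tag before redirecting. For local experimentation the app can be served with `wsgiref.simple_server.make_server("", 8000, download_app).serve_forever()`.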
5. Limits and blind spots: file formats and external hosting
Some file types (like PDFs) do not run the site's analytics scripts once they have been served, so the origin can only rely on the server log or on the referring page's scripts. Files hosted on third-party CDNs, archives, or proxy caches (including the Wayback Machine) may be retrieved without the origin seeing the final client request, which reduces direct visibility [5] [7].
6. Privacy implications and practical defenses
Because downloads can be correlated with IPs, timestamps, and session identifiers, site owners can often infer who downloaded what when they control the logging or instrumentation. Conversely, using proxies, archives, or VPNs, or fetching content offline, breaks or obscures those signals but does not erase every trace: server logs, intermediate caches, and bespoke tracking routes still create potential telemetry [7] [8].
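The correlation described above can be sketched as a simple join between download log entries and page-view records. The record layout, the 30-minute window, and the `attribute_downloads` helper are assumptions for illustration; real systems may match on cookies or other identifiers instead.

```python
from datetime import datetime, timedelta


def attribute_downloads(downloads, page_views, window=timedelta(minutes=30)):
    """Pair each download with the latest page view from the same IP within the window."""
    results = []
    for download in downloads:
        candidates = [
            view for view in page_views
            if view["ip"] == download["ip"]
            and timedelta(0) <= download["time"] - view["time"] <= window
        ]
        latest = max(candidates, key=lambda view: view["time"], default=None)
        session = latest["session_id"] if latest else None
        results.append((download["path"], session))
    return results


# Made-up records using documentation-reserved IP addresses.
page_views = [
    {"ip": "203.0.113.7", "time": datetime(2023, 10, 10, 13, 50), "session_id": "abc123"},
]
downloads = [
    # Same IP, five minutes after the page view: attributable to the session.
    {"ip": "203.0.113.7", "time": datetime(2023, 10, 10, 13, 55), "path": "/files/report.pdf"},
    # Different IP (for example, fetched through a VPN): no matching session.
    {"ip": "198.51.100.9", "time": datetime(2023, 10, 10, 13, 56), "path": "/files/report.pdf"},
]
print(attribute_downloads(downloads, page_views))
```

The second download is still logged, which is the point made above: changing the IP address removes the link to the session, not the record of the request.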
7. Bottom line for the consumer and the operator
For operators, instrumented pages combined with server-side logging and redirected downloads give reliable visibility into downloads. For consumers, simple page saves and offline viewing reduce what the origin can see, but direct requests to the origin server and clicks on tracked links will almost always be observable by the site [1] [3] [2]. The sources do not describe every edge case, so this summary is limited to documented mechanisms and common industry practices [7] [5].