The ability to get data from websites conveniently, and in the desired format, is an interesting topic that deserves a series of articles. Last time, we discussed exporting data as CSV reports. Now, let’s talk about getting data in PDF format and its benefits, as well as the solution we created for our customers.
The PDF format and what makes it popular
If there is a document format that is ideal for business, it's PDF (Portable Document Format). It’s perfect for cases where it is important to securely transfer documents in their original state.
The word portable in the Portable Document Format (PDF) describes its essence. PDF docs can be viewed on any operating system, from any country, with no need to worry about format, language, font or encoding issues. PDF docs will look exactly the same to any user.
Documents can be protected so that no one can change them without the owner’s consent: extra security measures like watermarks, passwords, electronic certificates or encryption can be applied. If needed, recipients can be allowed to add electronic signatures without being able to change anything else.
You can make the docs searchable by key phrases, so the information in them is easy to find. Text and vector graphics can be magnified many times without loss of quality, and the files still stay compact.
Our solution for getting a website’s data as PDF reports
Here comes the most interesting part of the story. We would like to show you in every detail the functionality for exporting data as a PDF that we created for one of our customers, the website of a large platform in the real estate industry.
So let's see how we managed to tame a huge amount of data and convert it to Portable Document Format.
PDF reports are built on a Search API Solr Based View. A report is generated from the results the view returns, including any filters the user has applied.
A report can also be predefined; we call this kind of report a prepackaged report. A prepackaged report is a preconfigured combination of view filters plus a list of the information needed for a certain content type.
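To make the idea of a prepackaged report more concrete, here is a minimal Python sketch of how such a preset could be modeled. It is not the code used on the customer's site: the class name, the field names, the sample filter keys, and the rule that user filters override preset ones are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PrepackagedReport:
    """A named preset: view filters plus the fields to show (hypothetical model)."""

    name: str
    content_type: str
    filters: Dict[str, str]
    fields: Tuple[str, ...]

    def merged_filters(self, user_filters: Dict[str, str]) -> Dict[str, str]:
        # Assumption: a filter chosen by the user replaces the preset value
        # for the same key, and all other preset filters stay in place.
        return {**self.filters, **user_filters}


# Hypothetical preset for apartment sales.
apartment_sales = PrepackagedReport(
    name="Apartment sales overview",
    content_type="apartment_sale",
    filters={"city": "Boston", "status": "sold"},
    fields=("address", "price", "sold_date"),
)
```

Calling `apartment_sales.merged_filters({"city": "Austin"})` would keep the preset `status` filter and swap only the city, which is one simple way to combine a preset with the filters a user applies in the view.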
Since we use Search API Solr Based View, we can easily add or remove options for the information displayed in the report. For example, records for apartment sales are now displayed in the "table" and "list" formats.
To respond to document generation requests promptly, we wrote our own daemon that monitors the system for new requests.
To generate the documents, we configured a service that uses wkhtmltopdf (https://wkhtmltopdf.org).
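The daemon itself is not shown in this article, but its core is a polling loop. The Python sketch below illustrates that pattern under simplifying assumptions: requests sit in an in-memory queue, and the names `run_daemon`, `handle`, `poll_interval` and `max_idle_polls` are hypothetical. A real daemon would more likely read from a database table or a message queue and run until stopped.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def run_daemon(
    pending: Deque[dict],
    handle: Callable[[dict], None],
    poll_interval: float = 1.0,
    max_idle_polls: Optional[int] = None,
) -> int:
    """Poll for new generation requests and pass each one to `handle`.

    Stops after `max_idle_polls` consecutive empty polls, or runs
    forever when `max_idle_polls` is None. Returns the number of
    requests processed.
    """
    processed = 0
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        if pending:
            # Oldest request first, so users are served in order.
            handle(pending.popleft())
            processed += 1
            idle_polls = 0
        else:
            # Nothing to do: wait a little before checking again.
            idle_polls += 1
            time.sleep(poll_interval)
    return processed
```

A short `poll_interval` keeps the delay between a user's request and the start of generation small, at the cost of more frequent checks.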
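For readers who have not used wkhtmltopdf, the sketch below shows one way a service could call it from Python. The command-line tool takes an HTML input and a PDF output path, and `--quiet` and `--page-size` are among its documented options. The function names, the default page size and the timeout are assumptions for illustration, not the configuration of the service described here.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(html_path: Path, pdf_path: Path, page_size: str = "A4") -> List[str]:
    """Build the wkhtmltopdf command line for one HTML-to-PDF conversion."""
    return [
        "wkhtmltopdf",
        "--quiet",                 # suppress progress output
        "--page-size", page_size,  # e.g. A4 or Letter
        str(html_path),
        str(pdf_path),
    ]


def render_pdf(html_path: Path, pdf_path: Path, timeout: float = 120.0) -> Path:
    """Run wkhtmltopdf; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(build_command(html_path, pdf_path), check=True, timeout=timeout)
    return pdf_path
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when file names contain spaces.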
The generation process works as follows: a website user chooses a PDF report template or filters the results, and then requests a document. The daemon receives the request as rendered HTML in JSON format and passes it on to the IS (the image service, which is responsible for the generation). Once the IS has processed the request, it uses a callback to return a link to the generated file.
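The exchange between the daemon and the IS can be pictured as two small JSON messages: one carrying the rendered HTML to the service, and one coming back on the callback with the link. The Python sketch below is only an illustration of that shape. All field names (`report_id`, `html`, `callback_url`, `file_url`) and both function names are hypothetical, since the real message format is not described here.

```python
import json
from typing import Tuple


def build_generation_request(report_id: str, rendered_html: str, callback_url: str) -> str:
    """Serialize a generation request for the IS (hypothetical field names)."""
    return json.dumps(
        {
            "report_id": report_id,
            "html": rendered_html,
            # Where the IS should report back once the PDF is ready.
            "callback_url": callback_url,
        }
    )


def parse_callback(body: str) -> Tuple[str, str]:
    """Read the callback sent by the IS and return (report_id, file_url)."""
    data = json.loads(body)
    return data["report_id"], data["file_url"]
```

Including the report identifier in both messages lets the site match an incoming callback to the request that triggered it, even when several documents are being generated at once.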
In addition, the generation process is monitored throughout. The user can see the following document generation statuses: in progress (the doc is being generated), failed (document generation failed), and ready (the doc has been generated).
And, of course, safety first: to make sure the information does not fall into the wrong hands, all documents are stored in a private file system.
Users can generate a report at any moment, whenever they need up-to-date information.
PDF reports can display charts, tiles, tables, photos and maps (as images), so the document is as informative as possible.
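The three statuses map naturally onto a small state model. The Python sketch below is one possible way to express it. The status values come from the list above, while the transition rule (a document can only move from "in progress" to "failed" or "ready") and the names `ReportStatus`, `ALLOWED_TRANSITIONS` and `advance` are assumptions.

```python
from enum import Enum


class ReportStatus(Enum):
    IN_PROGRESS = "in progress"
    FAILED = "failed"
    READY = "ready"


# Assumption: "failed" and "ready" are final; a retry would be a new request.
ALLOWED_TRANSITIONS = {
    ReportStatus.IN_PROGRESS: {ReportStatus.FAILED, ReportStatus.READY},
    ReportStatus.FAILED: set(),
    ReportStatus.READY: set(),
}


def advance(current: ReportStatus, new: ReportStatus) -> ReportStatus:
    """Return the new status, or raise ValueError for a disallowed change."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value!r} to {new.value!r}")
    return new
```

Rejecting unexpected transitions makes it harder for a late or duplicated callback to flip a finished document back to "in progress".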
That’s how it works. If you wish, we are ready to build the same kind of functionality for you, generating reports in PDF or another format of your choice. Any website or business deserves a brilliant solution, and we will provide one!