🦀 Learning Rust basics - x509 Certificate Parser

Introduction
This article aims to explain Rust basics and how to create a Rust application. I chose to build a certificate parser because it covers a wide range of topics, such as:
- Control flow ;
- Managing/Parsing I/O ;
- Error handling ;
- and more…
As input, I will use certificates logged in the Certificate Transparency (CT) system.
I am currently learning Rust and I am far from being an expert. I'm writing this article to go deeper into these concepts, and I'm open to any improvements and/or advice.
zzzep
Why Rust ?
Rust is a modern programming language that prioritizes efficiency, security, and concurrency. Rust source code is compiled into native machine code, which makes runtime execution very fast, often comparable to C.
Rust manages memory automatically without needing a garbage collector. Through its ownership rules, the compiler works out at compile time where each value must be freed and generates the corresponding instructions. In safe Rust, this prevents whole classes of memory-corruption bugs (use-after-free, double free, data races), which makes the language attractive from a security point of view.
I chose Rust for this x509 certificate parser because I had a large volume of data to process. I had already written a parser in Python, but its performance was not good enough for that amount of data.
What are x509 certificates and Certificate Transparency ?
An X.509 certificate is a digital certificate that uses the X.509 standard to authenticate the identity of a person, organization, or device in a computer network. X.509 certificates are widely used in web browsers, email clients, and other applications that use secure communication protocols such as HTTPS, SSL, and TLS.
The Certificate Transparency Project is a system developed by Google and designed to improve the security of digital certificates used in the SSL/TLS encryption protocol. It provides a way to publicly log all issued SSL/TLS certificates, making it easier to detect and prevent fraudulent certificates and Certificate Authority (CA) compromises.
Prerequisites & Rust concepts
Linux, macOS and Windows can all be used here: Rust's versatility allows you to build applications on multiple systems.
You may use the editor of your choice, such as Vim, VS Code or CLion with the Rust plugin.
I will try to explain some Rust concepts here. If you want more Rust basics, you can get the free Rust basics lesson from Zero-Point Security, RustForN00bs, which is a great beginner course.
Installing Rust
On Windows, you can download the rustup-init.exe file from the Rust installation page (https://www.rust-lang.org/tools/install).
On Linux, execute the following command to install Rust.
|
|
If Rust is already installed, check for a new version using the following command.
|
|
Creating a new Rust project
To create a new Rust application, use the cargo command. This generates a Cargo.toml file (the Rust configuration file) and a main.rs file.
|
|
From here you can start writing your code in main.rs. By default, the file contains a Hello World! program. You can compile and run the application using cargo run.
|
|
Rust crates and dependencies
In Rust, like in most languages, you can use libraries (called crates in Rust). This allows us to use code written by the community. All crates are referenced in the Rust community's crate registry (crates.io).
You can add dependencies by using the cargo command.
|
|
This adds the dependencies directly to the Cargo.toml file, using the latest version of each crate.
|
|
You can also add a dependency manually if you want to use a specific version. Then, in your code, you can import dependencies and functions using the use statement.
|
|
For this example I use the random() function. When compiling and running, we get the following result:
|
|
Rust modules
During the development of the application I will use modules. When building larger and more complex applications, you will want to use modules: they let you split code across multiple files and make the code base much easier to maintain.
This is where the lib.rs file comes in. When a user imports a library using the use statement, they are importing the contents of the lib.rs file. This means that any modules, structs, functions, or other code defined in the lib.rs file and marked as pub will be available to the user of the library.
Let's pretend you need to use a utility function several times, for example a function that prints something. Let's write the function in a utils.rs file.
|
|
Now, to have access to this code, you have to create a lib.rs file and use the mod statement to declare the utils module.
|
|
Note the pub (public) keyword, which allows another source file to use the item it marks.
You should have the following directory tree:
|
|
Finally, in your main script, you can import the print_name() function and use it in your code.
|
|
You can now compile and run.
|
|
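As a compact illustration of the module mechanics described above, here is a single-file sketch. It assumes an inline mod utils block in place of a separate utils.rs file, and the greeting() helper is a hypothetical addition that makes the output easy to check; only print_name() is named in this article.

```rust
// Inline module standing in for utils.rs (assumption: the article keeps it in its own file).
mod utils {
    /// Hypothetical helper: builds the text that print_name() displays.
    pub fn greeting(name: &str) -> String {
        format!("Hello, {}!", name)
    }

    /// Marked `pub` so that code outside the module can call it.
    pub fn print_name(name: &str) {
        println!("{}", greeting(name));
    }
}

// Bring the public function into scope, as main.rs would do with `use`.
use utils::print_name;

fn main() {
    print_name("Ferris");
}
```

Removing the pub keyword from print_name() makes the use line fail to compile, which is a quick way to see what visibility controls.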
Rust Ownership & Borrowing
Ownership is one of the main concepts that sets Rust apart from other languages. In Rust, every value has exactly one owner: the variable it is bound to. If the value is moved (for example, assigned to another variable), the original variable can no longer be used in the program.
Consider the following program.
|
|
Here, we declare a variable value1 that holds a String; the value is then assigned to another variable, value2. If we try to use the moved value1, we get the following error:
|
|
The variable value1 contains the heap address that points to the value Coucou. This can be explained with the schema below.

Then the value is assigned to _value2, which becomes the new owner, and value1 can no longer reference the value on the heap.
When a value is passed as an argument to a function or returned from a function, it is also moved, and the original variable can no longer be used. This mechanism guarantees memory safety without requiring a garbage collector or manual memory management.
For our example, one solution is to borrow the value by adding the & character.
|
|
Borrowing allows another variable to hold a temporary reference to a value without taking ownership of it. We can confirm that it works by running the program.
|
|
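The move and borrow rules above can be put side by side in a minimal sketch. The function names consume() and length_of() are hypothetical and only serve to show both ways of passing a value.

```rust
/// Takes ownership of the String: the caller's variable is moved into the function.
fn consume(value: String) -> String {
    format!("{}!", value)
}

/// Only borrows the text: the caller keeps ownership.
fn length_of(value: &str) -> usize {
    value.len()
}

fn main() {
    let value1 = String::from("Coucou");

    // Borrow: value1 stays usable afterwards.
    let value2 = &value1;
    println!("{} has {} bytes", value2, length_of(&value1));

    // Move: ownership goes to value3, so value1 can no longer be used.
    let value3 = value1;
    // println!("{}", value1); // would not compile: borrow of moved value (E0382)

    println!("{}", consume(value3));
}
```

Uncommenting the println! line reproduces the compiler error discussed in this section.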
Error handling in Rust
Rust's error handling is implemented using the Result and Option types, which are both enums that can have two possible variants:
- Ok() or Err() for Result ;
- Some() or None for Option.
The Result type is used to represent computations that may fail. It has two variants:
- Ok(T), which holds a value of type T representing a successful computation ;
- Err(E), which holds a value of type E representing an error.
For example, we define a read_file function that takes a file path as input and attempts to read and print the content of the file.
|
|
Here's how the error handling works here:
- The File::open method returns a Result object that can have an Ok(file) variant if the file is successfully opened, or an Err(error) variant if the file cannot be opened. Note the use of the ? operator to propagate any error that may occur to the caller of the function.
- If the file is successfully opened, we create a BufReader object that allows us to read the file efficiently.
- We iterate over each line using a for loop and the reader.lines() method, which produces a Result<String, io::Error> object for each line. Note again the use of ? to propagate any error that may occur.
- If there are no errors, we print the contents of each line to the console.
- Finally, we return an Ok(()) value to indicate that the function completed successfully.
Then, we can use pattern matching with match to handle the result.
|
|
This way, the application does not panic on error when it runs:
|
|
Without this we would have had a panic which stops the execution of the program.
|
|
It would work the same way with the Option type, except that we would have to match on the Some() and None variants.
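The steps listed above can be sketched as follows. This is a minimal version under one assumption: unlike the read_file function described in this section, which returns Ok(()), this one returns the number of lines read so that the result is easy to check.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Prints every line of the file and returns how many lines were read.
fn read_file(path: &str) -> Result<usize, io::Error> {
    // `?` returns early with the error if the file cannot be opened.
    let file = File::open(path)?;
    let reader = BufReader::new(file);

    let mut count = 0;
    for line in reader.lines() {
        // Each item is a Result<String, io::Error>; `?` propagates read errors.
        let line = line?;
        println!("{}", line);
        count += 1;
    }
    Ok(count)
}

fn main() {
    // Pattern matching handles both variants instead of panicking.
    match read_file("does_not_exist.txt") {
        Ok(count) => println!("read {} lines", count),
        Err(error) => eprintln!("could not read file: {}", error),
    }
}
```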
Rust release mode
So far, the compilation has built an unoptimised debug version. The Rust compiler can produce a fully optimised binary when the --release switch is used. This option significantly improves the performance of the application. The optimisations include:
- dead code elimination ;
- inlining ;
- loop unrolling.
This can noticeably increase compilation time and memory usage, but it produces an optimised version of the application.
On bigger projects, there is also the LTO (Link-Time Optimization) setting. By adding it to your Cargo.toml file, you can further improve performance. It allows optimisations to be performed across the entire program, rather than on individual compilation units independently.
|
|
We will compare compilation time and performance later in the article. Now that you know all of this, let's start developing our x509 certificate parser!
🛠️ Developing a Rust x509 Certificates Parser
During the development, I will mainly use the x509-parser crate from the Rusticata project (see the x509_parser docs in the bibliography).
Needs & limitations
Input data format
When requesting CT logs, we get 2 fields, leaf_input & extra_data:
- leaf_input is the encoded Merkle tree leaf of the log entry, which contains the logged X.509 certificate (or the pre-certificate data) ;
- extra_data carries additional data about the entry, in practice the rest of the certificate chain.
|
|
In addition to these fields, my collector adds the index number and the leaf hash of each certificate.
So our input data will have the following structure.
|
|
Data are collected into multiple .csv files.
Output data format
The parser needs to extract all metadata from each certificate, and the output must be split by type: Pre-certificate | Certificate | CA certificate.
Indeed, a CT record contains a certificate chain, which may consist of a pre-certificate or a certificate and one or more CA (Certificate Authority) certificates.
So in the end we want 3 output files:
- precert.scsvh: file that contains all pre-certificates ;
- cert.scsvh: file that contains all end certificates ;
- ca.scsvh: file that contains all CA certificates.
All records from these files will be processed by another technology, here Apache Spark with Scala, which I could present in another article.
Certificate Data Structures
The first thing we want to do is decide which fields to extract from certificates. With the help of the x509 certificate RFC, I chose to extract all of this metadata:
- serial
- fingerprint_sha256
- fingerprint_sha1
- issuer
- subject
  - country
  - common_name
  - locality
  - organization
  - organizational_unit
  - state_province_name
- extensions
  - subject_key_id
  - authority_key_id
  - basic_constraints
  - alternative_name
- not_before
- not_after
To this I add 3 more fields:
- index: index number of the record in the CT log ;
- log: name of the CT log ;
- raw: base64 encoded raw certificate.
Creating data structures
Now that we have our fields, we can start creating structures. In our lib.rs file, we create 3 structures:
- Metadata: contains all fields ;
- Subject: contains all subject fields ;
- Extensions: contains all extensions.
|
|
Implementations
Now that we have our structures, we can implement a new() method for each of them, so that we can create an instance of Metadata for each record. This can be done using the impl keyword.
|
|
Here the new() method takes 5 arguments, which are known before the certificate metadata is extracted. The other fields are initialized to empty String instances using String::new(). We can do the same for both our Subject & Extensions structures.
You can check that it works by creating an instance of Metadata and printing the structure.
|
|
This should provide the following result.
|
|
Note the use of #[derive(Debug)], a Rust attribute that automatically generates an implementation of the Debug trait for a given struct or enum. This allows us to print the Metadata structure.
Note also the use of :?, the Debug formatter, and of the pretty-print # flag.
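A minimal sketch of the three structures and of the new() constructor could look like this. The field types, the choice of which 5 arguments new() receives (index, log, raw and the two fingerprints) and the use of Default for the two sub-structures are assumptions made for illustration; only the field names come from the list given earlier.

```rust
#[derive(Debug, Default)]
pub struct Subject {
    pub country: String,
    pub common_name: String,
    pub locality: String,
    pub organization: String,
    pub organizational_unit: String,
    pub state_province_name: String,
}

#[derive(Debug, Default)]
pub struct Extensions {
    pub subject_key_id: String,
    pub authority_key_id: String,
    pub basic_constraints: String,
    pub alternative_name: String,
}

#[derive(Debug)]
pub struct Metadata {
    pub index: u64, // assumed type
    pub log: String,
    pub raw: String,
    pub fingerprint_sha256: String,
    pub fingerprint_sha1: String,
    pub serial: String,
    pub issuer: String,
    pub subject: Subject,
    pub extensions: Extensions,
    pub not_before: String,
    pub not_after: String,
}

impl Metadata {
    /// Fills the fields known before parsing; everything else starts empty.
    pub fn new(index: u64, log: String, raw: String, sha256: String, sha1: String) -> Self {
        Metadata {
            index,
            log,
            raw,
            fingerprint_sha256: sha256,
            fingerprint_sha1: sha1,
            serial: String::new(),
            issuer: String::new(),
            subject: Subject::default(), // all fields are empty Strings
            extensions: Extensions::default(),
            not_before: String::new(),
            not_after: String::new(),
        }
    }
}

fn main() {
    let metadata = Metadata::new(
        0,
        "example-log".to_string(), // placeholder values
        "BASE64".to_string(),
        "AA".to_string(),
        "BB".to_string(),
    );
    // {:#?} is the Debug formatter with the pretty-print flag.
    println!("{:#?}", metadata);
}
```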
Reading data from csv files
Now that we have our structures, we need a way to get all the records from multiple CSV files. To read CSV files, we can use the csv crate.
But first, as we have multiple CSV files, we need to read the content of the directory that contains them. To do so, we can use the read_dir() function from the Rust standard library.
|
|
Here, we define a prepare_parsing() function that takes the directory path of our files as an argument. First, we check that the given path is a directory, which can be done with the is_dir() method; if it is not, we return an error.
Then, by using the read_dir() function, we get an iterator over all directory entries. We collect all entries into a Vector.
By using filter() & map(), we keep only the successful entries.
|
|
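The directory listing step described above can be sketched with the standard library alone. The function name list_entries() and the use of io::Error for the "not a directory" case are assumptions for this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the path of every readable entry in `dir`.
fn list_entries(dir: &Path) -> io::Result<Vec<PathBuf>> {
    // Refuse anything that is not a directory.
    if !dir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} is not a directory", dir.display()),
        ));
    }

    // read_dir() yields io::Result<DirEntry>: keep the Ok entries, then map to paths.
    let entries: Vec<PathBuf> = fs::read_dir(dir)?
        .filter(|entry| entry.is_ok())
        .map(|entry| entry.unwrap().path())
        .collect();
    Ok(entries)
}

fn main() {
    match list_entries(Path::new(".")) {
        Ok(files) => {
            for file in &files {
                println!("{}", file.display());
            }
        }
        Err(error) => eprintln!("error: {}", error),
    }
}
```

The filter() and map() pair can also be written as a single filter_map(Result::ok) followed by map(), which avoids the unwrap().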
We thus obtain a vector with all the files. Then, we create a read_entry() function that reads all the records from a file.
|
|
We create a ReaderBuilder, which allows us to specify the reader configuration; here we set the header option to true, as each file has a header. Then, we iterate over the lines and print each field of each line.
|
|
Parsing a x509 certificate chain
Now that we can read records, we can start parsing, beginning with the x509 chain of each record.
To do so, I create the get_x509_chain() function, which takes two arguments: the name of the log and a StringRecord, which corresponds to a CSV line.
|
|
The function parses all fields and sends leaf_input and extra_data to the get_leaf() function.
|
|
Here, I use the ctclient crate from Tingmao Wang, which contains the Leaf struct with a method that parses a leaf from leaf_input and extra_data.
But when we try to run the program, we get the following error.
|
|
We can't propagate the error from the Leaf::from_raw() method because it uses a ctclient::Error, which does not implement std::error::Error. To solve this problem, we will implement our own error handling.
Create your own error handling system
To create your own error handling, first define a new enum. I will create it in the lib.rs file.
I will take as an example the first error that we handled. Do you remember the first check, which verifies that the given path is a directory? Let's implement our own error for that first.
We define a new ParserErrors enum. It contains an IsNotDir variant, which expects an input string.
|
|
At the top of the file containing our main function, we declare a new ParserResult type.
Then, we change the function's return type to ParserResult<()>. In the else branch, we can then return our custom error, providing the directory path that was tested.
|
|
At this point, error handling is implemented for that error, but we can't actually print it because our enum does not implement the Display trait. To do so, we can write the following code.
|
|
Now you can easily create a new type of error by adding it to the enum. In your code, if you want to propagate an error with the ? operator, you can use the map_err() method to convert it first.
|
|
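Putting the pieces of this section together, a minimal sketch of the custom error system could look like this. ParserErrors, IsNotDir and ParserResult are the names used above; the InvalidIndex variant and the check_dir() and parse_index() functions are hypothetical and only illustrate map_err().

```rust
use std::fmt;
use std::path::Path;

#[derive(Debug)]
pub enum ParserErrors {
    /// The given path is not a directory.
    IsNotDir(String),
    /// Hypothetical second variant, used to demonstrate map_err().
    InvalidIndex(String),
}

// Display is what lets us print the error with {}.
impl fmt::Display for ParserErrors {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParserErrors::IsNotDir(path) => write!(f, "{} is not a directory", path),
            ParserErrors::InvalidIndex(reason) => write!(f, "invalid index: {}", reason),
        }
    }
}

/// Shorthand for a Result whose error type is always ParserErrors.
pub type ParserResult<T> = Result<T, ParserErrors>;

fn check_dir(path: &str) -> ParserResult<()> {
    if Path::new(path).is_dir() {
        Ok(())
    } else {
        Err(ParserErrors::IsNotDir(path.to_string()))
    }
}

fn parse_index(field: &str) -> ParserResult<u64> {
    // map_err() converts the foreign error into ours so that `?` can propagate it.
    let index = field
        .trim()
        .parse::<u64>()
        .map_err(|error| ParserErrors::InvalidIndex(error.to_string()))?;
    Ok(index)
}

fn main() {
    if let Err(error) = check_dir("/no/such/dir") {
        eprintln!("{}", error);
    }
    match parse_index("not a number") {
        Ok(index) => println!("index = {}", index),
        Err(error) => eprintln!("{}", error),
    }
}
```

The same map_err() pattern applies to any error type that does not implement std::error::Error, as long as it can be turned into a String or wrapped in a variant.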
Going back on our certificate chain parsing
Now that errors can be handled, we can parse the leaf using Leaf::from_raw() and propagate the error.
|
|
Running this, we can print each leaf.
|
|
The Leaf struct defined in the ctclient crate is composed of multiple elements.
|
|
From that struct, we will use both x509_chain & is_pre_cert:
- x509_chain: contains the certification chain ;
- is_pre_cert: a boolean.
The x509_chain is a vector, so we can iterate over the certificates of the chain.
The library defines that the first certificate of the chain is the end entity cert (or pre-cert, if is_pre_cert is true), and the last is the root CA. Knowing that, we can, as wanted, separate each type of certificate (Cert/Precert/CA). But first we will try to extract all metadata from each certificate.
Parsing certificate metadata
Fingerprinting certificates
As our struct expects SHA256 & SHA1 fingerprints, we first need to get the raw certificate as a slice of bytes. To do so, we can use the as_slice() function.
|
|
Here we iterate over the x509 chain and convert each certificate to a slice. Then, we can create the SHA objects like so.
|
|
This produces an array of bytes that you can format using the {:X} formatter, which gives a hexadecimal string.
For greater readability, I created a function for each SHA and put both in a utils.rs file.
Encoding raw certificate
As we want to save the raw certificate, we can encode it in base64 using the base64 library.
|
|
Creating instance of Metadata
Now we can create an instance of Metadata, the structure that we created earlier, as we have all the input arguments.
|
|
Parsing Metadatas from certificates
At this point, we have the following code.
|
|
We can now start parsing the x509 certificates. To do so, I will use the parse_x509_certificate function, which takes a slice of bytes as input and returns a Result wrapping an X509Certificate struct. That struct contains a TbsCertificate object, which holds all the metadata of a certificate as defined in RFC 5280.
|
|
The parse_x509_certificate function returns a Result, so we need to handle it and keep only the Ok() results, like so:
|
|
Then, we create the parse_metadata() function, which takes as input our Metadata struct and the X509Certificate generated by parse_x509_certificate.
In this function we define two variables, subject_meta & x509_ext, that take the return values of two new functions:
- get_subject_meta(): parses subject fields, returns a Subject struct ;
- get_extensions(): parses extensions, returns an Extensions struct.
|
|
Parsing Subject metadata
The function takes as input an X509Name object. This object has an iter_attributes() method that allows us to iterate over each attribute. We filter the attributes by type, then fill in the Subject structure with the elements found. We obtain the following function:
|
|
Here, I use unwrap_or(), which returns the value inside Some() if there is one, or a default value otherwise (here an empty string).
Finally, the Subject struct is returned.
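To illustrate the unwrap_or() pattern without depending on the x509-parser types, here is a sketch where the subject attributes are a plain slice of (short name, value) pairs. That representation and the find_attr() helper are assumptions for the example and do not reflect the X509Name API; the Subject struct is reduced to three fields.

```rust
#[derive(Debug, Default, PartialEq)]
struct Subject {
    country: String,
    common_name: String,
    organization: String,
}

/// Hypothetical lookup: returns the value of the first attribute named `key`.
fn find_attr<'a>(attrs: &[(&str, &'a str)], key: &str) -> Option<&'a str> {
    attrs.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

fn get_subject_meta(attrs: &[(&str, &str)]) -> Subject {
    Subject {
        // unwrap_or("") yields the found value, or an empty string for None.
        country: find_attr(attrs, "C").unwrap_or("").to_string(),
        common_name: find_attr(attrs, "CN").unwrap_or("").to_string(),
        organization: find_attr(attrs, "O").unwrap_or("").to_string(),
    }
}

fn main() {
    let attrs = [("C", "FR"), ("CN", "example.org")];
    println!("{:?}", get_subject_meta(&attrs));
}
```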
Parsing Extensions metadata
The get_extensions() function takes the TBS Certificate as input. It has an extensions() method that can be used to iterate over each extension using into_iter().
|
|
Authority and Subject Key Identifiers are formatted as hexadecimal. By default, the structure gives both as [u8] arrays. I used .map(|b| format!("{:02X}", b)) to iterate over and format each element of the array.
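The hexadecimal formatting mentioned above can be isolated in a small helper. The name to_hex() is an assumption; the closure is the one quoted in the text.

```rust
/// Formats each byte as two uppercase hexadecimal digits and concatenates them.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect()
}

fn main() {
    let key_id = [0x00u8, 0xAB, 0x0F];
    println!("{}", to_hex(&key_id)); // 00AB0F
}
```

The 02 width matters: without it, a byte such as 0x0F would be printed as a single digit and the string would become ambiguous.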
Basic Constraints are simply formatted into a String.
Then, Alternative Names are parsed using the get_alt_name() function, which takes a vector of GeneralName as input. GeneralName is defined as follows.
|
|
I chose to keep only DNSName, IPAddress and RFC822Name (email addresses). This gives the following code:
|
|
We declare a new alt_name vector. Then, we iterate over each GeneralName and push the values into the vector. If the value is an IP address, a small format_ip() function that I wrote formats it.
|
|
As the library gives IP addresses in the form of an array of bytes ([u8]), we can format them using IpAddr & Ipv6Addr from the standard library. If the length of the array is 4, it is an IPv4 address, so we use IpAddr; if the length is 16, we use Ipv6Addr.
I put the function in the utils.rs file.
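A possible shape for format_ip(), following the length check described above, is sketched here. The fallback string for unexpected lengths is an assumption; the article does not say how that case is handled.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Formats a raw IP address: 4 bytes for IPv4, 16 bytes for IPv6.
fn format_ip(bytes: &[u8]) -> String {
    match bytes.len() {
        4 => {
            let mut octets = [0u8; 4];
            octets.copy_from_slice(bytes);
            IpAddr::from(octets).to_string()
        }
        16 => {
            let mut octets = [0u8; 16];
            octets.copy_from_slice(bytes);
            Ipv6Addr::from(octets).to_string()
        }
        // Assumed fallback: report the unexpected length instead of panicking.
        other => format!("invalid IP ({} bytes)", other),
    }
}

fn main() {
    println!("{}", format_ip(&[192, 168, 1, 10]));
    println!("{}", format_ip(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]));
}
```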
Complete our Metadata structure
Now that we have parsed all the metadata, we can store it in the Metadata structure defined earlier.
|
|
I created another utility function, format_date(). It takes an i64 (a timestamp) as input and formats it as %Y-%m-%dT%H:%M:%SZ.
|
|
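The original format_date() most likely relies on a date crate. As a dependency-free stand-in, the sketch below converts a Unix timestamp to the same %Y-%m-%dT%H:%M:%SZ layout using the well-known days-to-civil-date arithmetic for the proleptic Gregorian calendar. Treat it as an illustration of the output format rather than as the article's implementation.

```rust
/// Formats a Unix timestamp (seconds, UTC) as YYYY-MM-DDTHH:MM:SSZ.
fn format_date(timestamp: i64) -> String {
    // Split into whole days since 1970-01-01 and seconds within the day.
    let days = timestamp.div_euclid(86_400);
    let secs = timestamp.rem_euclid(86_400);

    // Shift the epoch to 0000-03-01 so that leap days fall at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        secs / 3_600,
        (secs % 3_600) / 60,
        secs % 60
    )
}

fn main() {
    println!("{}", format_date(1_000_000_000));
}
```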
The parsing is over, you can now print each certificate. It should give the following result:
|
|
Writing results to file
Creating output files
Before writing results, we need to create 3 output paths, one for each certificate type. I created the following function.
|
|
This function simply uses the standard library to write a file for each type of certificate.
Creating an iterator over Metadata
In order to write each element of our Metadata struct, we need to implement the IntoIterator trait for it. This will allow us to iterate over each element of the struct. To do so, in lib.rs, I wrote the following implementation.
|
|
First, we create a MetadataIter struct that will be returned by the into_iter() method. Then, we implement the Iterator trait for the new MetadataIter struct.
Next, we define the next() method, which is required by the Iterator trait. In its definition, we list all our fields, in whatever order we like.
Finally, we implement the IntoIterator trait for Metadata, which returns a MetadataIter iterator. This allows us to iterate over our 3 structs (Metadata, Subject and Extensions).
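The three steps above can be sketched on a reduced Metadata with only four fields. The position counter inside MetadataIter and the choice of String as the item type are assumptions; the real struct has many more fields, including the nested Subject and Extensions.

```rust
pub struct Metadata {
    pub index: u64,
    pub log: String,
    pub serial: String,
    pub issuer: String,
}

/// Iterator returned by into_iter(): owns the struct and tracks the next field.
pub struct MetadataIter {
    metadata: Metadata,
    position: usize,
}

impl Iterator for MetadataIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // The order of the arms defines the order of the columns in the output.
        let field = match self.position {
            0 => self.metadata.index.to_string(),
            1 => self.metadata.log.clone(),
            2 => self.metadata.serial.clone(),
            3 => self.metadata.issuer.clone(),
            _ => return None, // no more fields
        };
        self.position += 1;
        Some(field)
    }
}

impl IntoIterator for Metadata {
    type Item = String;
    type IntoIter = MetadataIter;

    fn into_iter(self) -> MetadataIter {
        MetadataIter { metadata: self, position: 0 }
    }
}

fn main() {
    let metadata = Metadata {
        index: 1,
        log: "example-log".to_string(),
        serial: "0A".to_string(),
        issuer: "CN=Example CA".to_string(),
    };
    for field in metadata {
        println!("{}", field);
    }
}
```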
Writing metadata
Then, in the parse_x509_chain function, I wrote the following code.
|
|
This writes the certificate metadata to the correct file. As defined earlier, the first cert is the end entity cert (or pre-cert, if is_pre_cert is true) and the others are CAs.
The write_cert_metadata() function matches on the type of certificate to build the output file path. Then, we use OpenOptions to append each record to the files without deleting their contents.
Next, we instantiate a new WriterBuilder, providing the writer obtained from OpenOptions.
Finally, we write the record to the file.
|
|
As we implemented into_iter() for Metadata, we can use it with the write_record() method from the csv crate, which takes an iterator as input.
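The routing and append logic of this section can be sketched with the standard library only. The CertType enum and the cert_type(), output_path() and append_record() functions are hypothetical names, and the record is written as a naive comma-joined line with no quoting, which is an assumption standing in for the csv crate's write_record().

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
enum CertType {
    PreCert,
    Cert,
    Ca,
}

/// First certificate of the chain: end entity (or pre-cert); all others: CA.
fn cert_type(position: usize, is_pre_cert: bool) -> CertType {
    match (position, is_pre_cert) {
        (0, true) => CertType::PreCert,
        (0, false) => CertType::Cert,
        _ => CertType::Ca,
    }
}

/// Maps each certificate type to its output file.
fn output_path(dir: &Path, kind: CertType) -> PathBuf {
    let name = match kind {
        CertType::PreCert => "precert.scsvh",
        CertType::Cert => "cert.scsvh",
        CertType::Ca => "ca.scsvh",
    };
    dir.join(name)
}

/// Appends one record to the file, creating the file if needed.
fn append_record(path: &Path, fields: &[String]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Naive serialization: no escaping of commas or quotes.
    writeln!(file, "{}", fields.join(","))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let path = output_path(&dir, cert_type(1, false));
    append_record(&path, &["1".to_string(), "example-log".to_string()])?;
    println!("appended to {}", path.display());
    Ok(())
}
```

Opening with append(true) is what keeps earlier records intact when the function is called once per certificate.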
Let's run the application to see if it works.
|
|
The application successfully parses 10k certificate chains, for a total of 30,700 certificates.
Adding an arguments parser
Now, I will add an argument parser so that the application can be called with different I/O paths. There is a crate that can be used to build one, called clap.
You can instantiate a parser using the Command::new() method and provide useful information such as the authors, version and description of the application.
|
|
For each argument, you can specify the number of values it expects, whether it is required or not, and more. You can get your arguments using args.get_one::<type>("<argument_name>").
Then, when launching the application, you can print the help using the --help switch.
|
|
Performance analysis
At this point, the execution time of the application, for an input directory of 5.4G (970,990 certificate chains), is:
- 229.76s: with release mode
- 224.31s: with release mode and LTO
It produces output files with more than 2,992,000 parsed certificates.
To analyse the performance of the application more deeply, we will use the flamegraph visualization, a technique developed by Brendan Gregg, a cloud computing performance engineer.
What is a flamegraph ?
A flamegraph is a type of visualization used to analyze and understand the performance characteristics of software systems. It provides a way to visualize the stack trace of a program, highlighting the most frequently executed functions and their relationship to each other.
A flamegraph is drawn as stacked horizontal bars, with each bar representing a function in the call stack. The width of each bar represents the share of samples in which that function appeared (roughly, the time spent in it and in the functions it calls), while its vertical position indicates its depth in the call stack. From left to right, the bars are sorted alphabetically, not chronologically; the functions at the top of each stack are the ones that were actually running on the CPU.
Generate a flamegraph
To generate my flamegraph, I will use inferno, a Rust port of the Flamegraph project.
First, build the inferno Rust project. It will generate binaries in the target/release directory.
|
|
When doing performance tests, be sure to drop your cache first, or the results may be inaccurate.
You can do it with this command (as root): echo 3 > /proc/sys/vm/drop_caches
We will use these two binaries: inferno-collapse-perf & inferno-flamegraph.
Now, we have to capture stack samples that will be used to create the flamegraph. Samples are used to determine which functions are being executed and how much time is spent in each of them. To do so, we can use the Linux perf command. Then, using inferno, we will be able to create the graph from the captured samples.
|
|
These commands generate the flamegraph.svg file.
How to read a flamegraph ?
A flamegraph looks like this. Cool, isn't it? :p

It allows us to see which functions take most of the execution time. Here we clearly see that our I/O functions take the largest share.
Our Reader takes 33% of the execution time while the Writer takes 21%.

We also see that fingerprint calculation takes almost 15% of the execution time.
The end…
I tried to explain the Rust concepts that I learned and to demonstrate them with the x509 certificate parser.
Rust's focus on memory safety and thread safety makes it a strong choice for building high-performance, concurrent applications.
In the next article, we'll explore Rust's support for concurrency and parallelism, and show how we can use these features to further improve the performance of our x509 certificate parser.
📖 Bibliography
- Certificate transparency: https://certificate.transparency.dev/
- Maowtm CT blog: https://blog.maowtm.org/ct/en.html
- Rust installation: https://www.rust-lang.org/tools/install
- Rust docs: https://doc.rust-lang.org/stable/
- x509_parser docs: https://docs.rs/x509-parser/latest/x509_parser/
- ctclient docs: https://docs.rs/ctclient/0.4.5/ctclient/
- Flamegraph project:
  - Official article: https://www.brendangregg.com/flamegraphs.html
  - Official Github repository: https://github.com/brendangregg/FlameGraph
  - Rust port of the project: https://github.com/jonhoo/inferno
Other cool projects
- Requestable certificate database: https://crt.sh/
- Real-time certificate parser: https://certstream.calidog.io/