Python Program to Read YAML

Overview
YAML is frequently used to store data in a serialized form. It is a human-friendly data serialization language that can be used with practically any programming language. Although it was designed primarily for data exchange, it is most often used for configuration files. In Python, we can read YAML files with the PyYAML module. YAML is a recursive acronym for YAML Ain't Markup Language.
YAML Files in Python
Let's begin with what YAML stands for. It was originally called Yet Another Markup Language, but its developers realized that it isn't really a markup language. It only describes the tree structure of the data it contains (a textual representation) and has none of the markup instructions that languages such as XML wrap around content (for example, tags saying that a piece of text should be italic or red). This led to the recursive acronym YAML Ain't Markup Language.
Today, YAML is commonly used to store data in a serialized format. Although it was primarily intended for data exchange, it is used even more often for configuration files. A YAML file can hold not only scalar data (single values such as numbers and strings) but also compound data such as lists and dictionaries.
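As a quick illustration of scalar versus compound data, the sketch below parses a small inline YAML document with PyYAML's `yaml.safe_load()` and prints the Python type each value becomes. It assumes PyYAML is installed; the keys and values are made-up sample data.

```python
import yaml

# Hypothetical sample document mixing scalars and compound values.
document = """
name: Alice        # string scalar
age: 30            # integer scalar
height: 1.68       # float scalar
languages:         # a list (YAML sequence)
  - Python
  - Go
address:           # a dictionary (YAML mapping)
  city: Paris
  country: France
"""

data = yaml.safe_load(document)

# Scalars become str/int/float; sequences become lists; mappings become dicts.
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```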
Some of its most prominent features are:
- YAML is human-readable.
- Comments can be used in YAML files.
- Most importantly, a single YAML file can store multiple documents, separated by `---`. This makes it easier to define deployments, and YAML is widely used with Docker and Kubernetes.
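The following minimal sketch shows comments and the `---` separator in action, assuming PyYAML is installed. The two documents are hypothetical, loosely modeled on deployment manifests. `yaml.safe_load_all()` yields one Python object per document, and the parser simply discards the comments.

```python
import yaml

# Hypothetical two-document stream: "---" starts each document,
# and "#" comments are ignored by the parser.
stream = """
# Document 1: a web service
---
kind: Service
name: web
---
# Document 2: a database
kind: Deployment
name: db
"""

documents = list(yaml.safe_load_all(stream))
for doc in documents:
    print(doc["kind"], doc["name"])
```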
Reading YAML Files in Python
Several Python packages can parse YAML data, but PyYAML is the most popular, so we'll use it to read YAML files in our code. PyYAML is not part of the Python standard library, so it has to be installed separately. It converts YAML content into Python objects, typically dictionaries and lists.
JSON, XML, and YAML can all represent the same data, but there are subtle differences in their syntax.
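To see these differences side by side, here is a small sketch that serializes one hypothetical record with Python's built-in `json` and `xml.etree.ElementTree` modules and with PyYAML's `yaml.dump()`. The record's fields are made up, and PyYAML must be installed.

```python
import json
import xml.etree.ElementTree as ET

import yaml

# Hypothetical record used for the comparison.
user = {"name": "Alice", "email": "alice@example.com", "country": "USA"}

# JSON: braces, quoted keys and strings, commas between pairs.
as_json = json.dumps(user, indent=2)

# XML: every value is wrapped in an opening and a closing tag.
root = ET.Element("user")
for key, value in user.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

# YAML: "key: value" lines and indentation, no brackets or closing tags.
as_yaml = yaml.dump(user, sort_keys=False)

for label, text in (("JSON", as_json), ("XML", as_xml), ("YAML", as_yaml)):
    print(f"{label}:\n{text}\n")
```

The YAML version carries the same information with the least punctuation, which is a large part of why it is considered human-friendly.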
Prerequisites
Installing the PyYAML Module
There are two ways to install the PyYAML module on your system:
- Installing via pip command
- Installing from the source code
Of the two, the pip command is the easier option.
Windows and Mac users can install PyYAML with this command: `pip install pyyaml`
On some systems, `pip3` is needed instead of `pip`: `pip3 install pyyaml`
On Linux distributions such as Ubuntu, the same pip command works, and PyYAML can also be installed through the system package manager, for example with `sudo apt install python3-yaml`.
If you use PyCharm on Windows, you can run the same command in PyCharm's built-in terminal.
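To confirm the installation, you can run a minimal check like the one below with the same Python interpreter that pip installed into. Note that the package is installed as `pyyaml` but imported as `yaml`.

```python
import yaml

# If PyYAML is installed, the import succeeds and the version string is available.
print("PyYAML version:", yaml.__version__)

# True when PyYAML was built with the faster C-based LibYAML bindings.
print("LibYAML bindings available:", yaml.__with_libyaml__)
```

An `ImportError` (or `ModuleNotFoundError`) at the first line usually means PyYAML was installed for a different interpreter than the one running the script.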
Create a YAML File
Let's create a YAML file to see the tree structure of the data. We'll reuse this file in the upcoming sections when reading content from a YAML file.
Here's what our `users.yaml` file would look like:
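The original listing of `users.yaml` is not reproduced here. As a stand-in, this sketch writes a hypothetical `users.yaml` into the current directory. The two records and their values are made up; only the keys (name, email, phone, and country) follow the output described later in this article. The phone numbers are quoted to make sure they are read as strings.

```python
from pathlib import Path

# Hypothetical contents for users.yaml (placeholder names, emails, and phones).
# "---" starts the document and "..." optionally ends it.
USERS_YAML = """\
---
- name: Alice Smith
  email: alice@example.com
  phone: "555-0100"
  country: USA
- name: Bob Jones
  email: bob@example.com
  phone: "555-0199"
  country: Canada
...
"""

path = Path("users.yaml")
path.write_text(USERS_YAML, encoding="utf-8")
print(f"Wrote {path.resolve()}")
```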
A YAML document can begin with the `---` marker and end with an optional `...` marker. The `---` separator is required when a single YAML file contains multiple documents.
Reading YAML File after Converting a Python Object
To begin with, we need to import the `yaml` module. Here, we are going to convert a Python object into YAML data.
We'll use the `dump()` function in our example. The `yaml.dump()` function serializes a Python object into YAML content. We provide the content as a Python dict object (dictionary) inside a Python list.
By default, the `dump()` function also sorts the output alphabetically by the dictionary's keys.
Output: The output shows how the items of the dictionary inside the Python list are written as key-value pairs, sorted alphabetically by key. Therefore, the key country comes before email and name, while the key phone comes last.
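A minimal sketch of this step, assuming PyYAML is installed. The user record is hypothetical sample data that uses the same keys as the output described above.

```python
import yaml

# A Python list holding one dictionary (hypothetical sample values).
users = [
    {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "phone": "555-0100",
        "country": "USA",
    }
]

# yaml.dump() serializes the list; keys are sorted alphabetically by default.
yaml_text = yaml.dump(users)
print(yaml_text)
```

The list becomes a YAML sequence (the leading `- `), and the dictionary inside it becomes a mapping whose keys appear as country, email, name, phone.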
Reading YAML Content from YAML File
We previously created a `users.yaml` file, which we use in this example as well.
The `load()` function deserializes YAML into Python objects: it parses the YAML content and converts it into, for example, a Python dict object. If the content holds multiple YAML objects, such as a sequence of mappings, the output is a list of dictionaries.
The `load()` function accepts a byte string, a Unicode string, an open binary file object, or an open text file object. Byte strings and files must be encoded in UTF-8, UTF-16-BE, or UTF-16-LE; PyYAML detects the encoding from the byte order mark and assumes UTF-8 when there is none.
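The sketch below feeds the same content to the loader in each of these forms, assuming PyYAML is installed. It uses `yaml.safe_load()`, which is shorthand for `yaml.load()` with `Loader=yaml.SafeLoader`. `io.BytesIO` and `io.StringIO` stand in for real files opened in binary and text mode.

```python
import io

import yaml

text = "name: Alice\ncountry: USA\n"

# 1. A Unicode (str) string.
from_str = yaml.safe_load(text)

# 2. A UTF-8 byte string (no byte order mark, so UTF-8 is assumed).
from_utf8 = yaml.safe_load(text.encode("utf-8"))

# 3. UTF-16 bytes: Python's "utf-16" codec writes a byte order mark,
#    which PyYAML uses to detect UTF-16-LE or UTF-16-BE.
from_utf16 = yaml.safe_load(text.encode("utf-16"))

# 4. An open binary file object (stand-in for open(path, "rb")).
from_binary_file = yaml.safe_load(io.BytesIO(text.encode("utf-8")))

# 5. An open text file object (stand-in for open(path, encoding="utf-8")).
from_text_file = yaml.safe_load(io.StringIO(text))

print(from_str == from_utf8 == from_utf16 == from_binary_file == from_text_file)
```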
There are four loaders available for the `load()` function:
- BaseLoader: Loads only the most basic YAML; every scalar is loaded as a string.
- SafeLoader: Loads a subset of YAML safely. It is the loader to use for untrusted input.
- FullLoader: Loads the full YAML language while avoiding arbitrary code execution. Even so, it is not recommended for untrusted data.
- UnsafeLoader: The original loader, kept for backward compatibility. Untrusted input can easily exploit it, so FullLoader is still safer than UnsafeLoader.
Note: Above all, always use SafeLoader with the `yaml.load()` function when the data source is untrusted.
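To see how the loaders differ, here is a small sketch that compares BaseLoader, SafeLoader, and FullLoader on a hypothetical configuration snippet. It assumes PyYAML 5.1 or later, where these loader classes exist. The `!!python/tuple` tag is used because it is harmless to construct, yet SafeLoader still rejects it.

```python
import yaml

# Hypothetical configuration snippet with different scalar types.
text = """
count: 42
ratio: 0.5
enabled: true
released: 2024-01-15
"""

# BaseLoader keeps every scalar as a plain string.
base = yaml.load(text, Loader=yaml.BaseLoader)

# SafeLoader resolves standard YAML types (int, float, bool, date).
safe = yaml.load(text, Loader=yaml.SafeLoader)

print(base)
print(safe)

# SafeLoader refuses Python-specific tags such as !!python/tuple,
# whereas FullLoader builds a tuple from this one.
rejected = None
try:
    yaml.load("point: !!python/tuple [1, 2]", Loader=yaml.SafeLoader)
except yaml.YAMLError as exc:
    rejected = exc
    print("SafeLoader rejected the tag:", type(exc).__name__)
```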
Output: The output shows how the contents of `users.yaml` have been converted into a list of dictionaries. After this conversion, we also converted the list of dictionaries back into YAML, which sorts the data by key.
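A minimal sketch of this example follows, assuming PyYAML is installed. So that it runs on its own, it writes a hypothetical `users.yaml` (made-up records) into a temporary directory, reads the file back with `yaml.load()` and SafeLoader, and then dumps the result. The variable name `load_data` matches the one used in the next section.

```python
import tempfile
from pathlib import Path

import yaml

# Hypothetical users.yaml content (placeholder records).
SAMPLE = """\
---
- name: Alice Smith
  email: alice@example.com
  phone: "555-0100"
  country: USA
- name: Bob Jones
  email: bob@example.com
  phone: "555-0199"
  country: Canada
...
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.yaml"
    path.write_text(SAMPLE, encoding="utf-8")

    # Deserialize: the YAML sequence of mappings becomes a list of dicts.
    with path.open("r", encoding="utf-8") as f:
        load_data = yaml.load(f, Loader=yaml.SafeLoader)

print(load_data)

# Serialize again; dump() sorts the keys alphabetically by default.
print(yaml.dump(load_data))
```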
The `dump()` function has a `sort_keys` parameter, which defaults to `True`.
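The effect of `sort_keys` shows up clearly with a single hypothetical record. This sketch assumes PyYAML 5.1 or later, the release that introduced the parameter.

```python
import yaml

# Hypothetical record whose keys are deliberately not in alphabetical order.
record = {"name": "Alice Smith", "email": "alice@example.com", "country": "USA"}

sorted_yaml = yaml.dump(record)                    # default: sort_keys=True
ordered_yaml = yaml.dump(record, sort_keys=False)  # keeps insertion order

print(sorted_yaml)
print(ordered_yaml)
```

Passing `sort_keys=False` is useful when the key order carries meaning for human readers, such as a configuration file whose most important settings come first.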
Reading Values and Keys From YAML File
The above example produces YAML content with the keys in sorted order. To get the keys and values separately from the `load_data` variable, we can use a nested for loop that iterates over the entire YAML content and prints the key-value pairs in a format of our choosing.
Since we are not using the `dump()` function in this case, the output is not sorted by key.
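Here is a minimal sketch of that nested loop, assuming PyYAML is installed. So that it runs on its own, it builds a stand-in for `load_data` from an inline YAML string with hypothetical users, and it collects the formatted lines in a list before printing them.

```python
import yaml

# Hypothetical stand-in for load_data: a list of dicts loaded from YAML.
load_data = yaml.safe_load("""
- name: Alice Smith
  email: alice@example.com
  country: USA
- name: Bob Jones
  email: bob@example.com
  country: Canada
""")

report = []
# Outer loop: one dictionary per user. Inner loop: its key-value pairs,
# in the order they appear in the YAML (no sorting is applied).
for position, user in enumerate(load_data, start=1):
    report.append(f"User {position}")
    for key, value in user.items():
        report.append(f"  {key}: {value}")

print("\n".join(report))
```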
Reading YAML Content into a List of Dictionaries
Now, let us look at a simple example that loads YAML content into a Python list of dictionaries. Here, we use SafeLoader with the `yaml.load()` function because it is safe to use even on untrusted input.
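A minimal sketch of this pattern, assuming PyYAML is installed and treating the inline YAML string as hypothetical untrusted input. The shape check after loading is an extra precaution added here, not something SafeLoader does for you: SafeLoader only restricts which Python types can be built.

```python
import yaml

# Hypothetical YAML text, e.g. received from an untrusted source.
untrusted_text = """
- name: Alice Smith
  country: USA
- name: Bob Jones
  country: Canada
"""

# SafeLoader builds only plain Python types (dict, list, str, int, ...).
users = yaml.load(untrusted_text, Loader=yaml.SafeLoader)

# Guard against input whose top level is not a list of dictionaries.
if not isinstance(users, list) or not all(isinstance(u, dict) for u in users):
    raise ValueError("expected a YAML sequence of mappings")

names = [user["name"] for user in users]
print(names)
```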
Conclusion
- YAML is a human-friendly data serialization language that can be used with practically any programming language. It was designed primarily for data exchange, but it is more frequently used for configuration files.
- YAML is an acronym for YAML Ain't Markup Language.
- YAML is human-readable.
- YAML files support the usage of comments.
- Using the `---` separator, we can store several documents in a single YAML file.
- PyYAML converts YAML content into Python objects such as dictionaries and lists. It is not part of the standard Python library, so it must be installed separately. With PyYAML, we can write Python programs that read YAML files.
- To install PyYAML, the following command can be used: `pip install pyyaml`