IonVision Data Structure and Visualization in R

This blog post has been adapted from an R Notebook. The notebook and the data it uses can be found on our GitHub, which is linked at the bottom of this post!

Loading sample data

Let’s begin by looking at the data structure of a typical dataset. This example set consists of seven measurements: two of ethanol, two of 2-propanol, and three background measurements taken before sampling, after ethanol, and after 2-propanol. The table below presents a printout of the first columns of the dataset, which contain measurement comments, environmental parameters, and so on. The loading functions always order the data by start time, so the oldest result is the first row of the table.

| Comment | Sample humidity (% RH) | Sensor flow (LPM) | Ambient temperature (°C) | Start time | First datapoint intensity (pA) |
|---|---|---|---|---|---|
| background before sampling | 2.26 | 4.81 | 28.72 | 2024-02-01 13:26:28 | -3.571 |
| ethanol | 2.02 | 4.85 | 28.76 | 2024-02-01 13:28:05 | -3.636 |
| ethanol 2 | 1.95 | 4.86 | 28.79 | 2024-02-01 13:29:34 | -3.717 |
| Background after ethanol | 1.76 | 4.87 | 28.85 | 2024-02-01 13:31:33 | -3.506 |
| 2-propanol | 1.32 | 4.88 | 28.93 | 2024-02-01 13:35:24 | -3.717 |
| 2-propanol 2 | 1.29 | 4.84 | 28.98 | 2024-02-01 13:36:55 | -3.655 |
| Background after 2-propanol | 1.25 | 4.8 | 29 | 2024-02-01 13:38:38 | -3.636 |

As seen in the table, we load some select environmental parameters, along with the comment, start time, and all data from the measurement files. Flows are measured at the sample (the input of the device) and at the sensor (the flow through the DMS sensor, which includes the sample flow and the clean circulating flow inside the device). Temperature and humidity are measured from both flows and additionally from inside the device, to give a reference point for the ambient environment. Maximum, minimum, and average values are collected for all of these parameters.
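As a rough illustration of the start-time ordering mentioned earlier, here is a minimal base R sketch. The data frame and its column names (`comment`, `start_time`) are hypothetical stand-ins, not the names used by the notebook's loading functions; the values are taken from the table above but deliberately shuffled.

```r
# Hypothetical data frame mirroring two columns of the table above.
# The column names are illustrative, not the notebook's own.
measurements <- data.frame(
  comment = c("ethanol", "background before sampling", "ethanol 2"),
  start_time = as.POSIXct(
    c("2024-02-01 13:28:05", "2024-02-01 13:26:28", "2024-02-01 13:29:34"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Order rows by start time so the oldest measurement comes first.
measurements <- measurements[order(measurements$start_time), ]
rownames(measurements) <- NULL

print(measurements$comment)
```

Parsing the start time into a date-time class before sorting avoids relying on the text representation of the timestamps.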

The environmental parameters might be of special interest in long measurement series, or when the repeatability of the system or a measurement is tested. DMS is very dependent on the absolute humidity of the sample: lowering the humidity increases the sensitivity of the device. This means that changing environmental conditions may present challenges when sampling is done from ambient conditions. The system also allows the user to set upper and lower limits for these parameters, and any “errors” relative to these limits are recorded in the .json file as well. This provides a consistent way to exclude measurements where certain conditions are not met.
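To make the idea of excluding measurements concrete, here is a small sketch in base R. The layout of the recorded errors in the .json file is not shown in this post, so instead of reading them the sketch simply rechecks a humidity column against limits. The column name `sample_humidity` and the limit values are assumptions chosen only for illustration.

```r
# Hypothetical subset of the dataset; humidity values (% RH) are from the table above.
measurements <- data.frame(
  comment = c("background before sampling", "ethanol", "2-propanol"),
  sample_humidity = c(2.26, 2.02, 1.32),
  stringsAsFactors = FALSE
)

# Assumed limits, chosen only for illustration.
humidity_min <- 1.5
humidity_max <- 2.5

# Flag the measurements whose humidity lies within the limits.
within_limits <- measurements$sample_humidity >= humidity_min &
  measurements$sample_humidity <= humidity_max

# Keep only the measurements that satisfy the condition.
accepted <- measurements[within_limits, ]
print(accepted$comment)
```

The same logical-index pattern extends to several parameters by combining the individual conditions with `&`.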

Comments and annotation

When used consistently, the comments can be used to annotate the measurements for data analysis. Another possibility is to use external annotations from a .csv file. Let’s say we needed the comments to be identical for repeat measurements of background, ethanol, and 2-propanol. As seen in the first table, we have some minor inconsistencies in the commenting. With such a small set, we could edit the comments manually in R or in the .json files, or employ the external annotation.

For the sake of example, let’s load the external annotations found in the sample data folder. The table below compares the original comments with the example annotation file.

| Comment | Annotation |
|---|---|
| background before sampling | background |
| ethanol | ethanol |
| ethanol 2 | ethanol |
| Background after ethanol | background |
| 2-propanol | 2-propanol |
| 2-propanol 2 | 2-propanol |
| Background after 2-propanol | background |
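The mapping in the table above could be attached to the measurements along these lines. This is only a sketch: the `annotations` data frame stands in for the external .csv file, and its column names (`comment`, `annotation`) are assumptions rather than the actual layout of the sample annotation file.

```r
# Comments as they appear in the measurement files (from the table above).
measurements <- data.frame(
  comment = c("background before sampling", "ethanol", "ethanol 2",
              "Background after ethanol", "2-propanol", "2-propanol 2",
              "Background after 2-propanol"),
  stringsAsFactors = FALSE
)

# Stand-in for the external annotation file; in practice this could come from
# read.csv() on a file with the same two (hypothetical) columns.
annotations <- data.frame(
  comment = measurements$comment,
  annotation = c("background", "ethanol", "ethanol", "background",
                 "2-propanol", "2-propanol", "background"),
  stringsAsFactors = FALSE
)

# Look up each measurement's annotation by its comment, keeping row order.
measurements$annotation <-
  annotations$annotation[match(measurements$comment, annotations$comment)]

print(table(measurements$annotation))
```

`match()` is used here rather than `merge()` because `merge()` sorts the result by the key column by default, which would undo the ordering by start time.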

In cases where several samples need to be averaged (as in the section below), consistent comments make looping through the data easier. Depending on your use case, it is also possible to use, for example, sample IDs as the IonVision comments.
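As a minimal sketch of how consistent annotations simplify averaging, the example below groups measurements by annotation and averages each group. The intensities are made-up numbers, and storing them as a matrix with one row per measurement is an assumption about the layout, not necessarily how the notebook holds the data.

```r
# One annotation per measurement (hypothetical, already made consistent).
annotation <- c("background", "ethanol", "ethanol", "background")

# Made-up intensities: one row per measurement, one column per data point.
intensity <- matrix(c(1, 2, 3,
                      4, 5, 6,
                      6, 7, 8,
                      3, 4, 5), nrow = 4, byrow = TRUE)

# Group the row indices by annotation, then average each group column-wise.
groups <- split(seq_along(annotation), annotation)
class_means <- t(sapply(groups, function(rows) {
  colMeans(intensity[rows, , drop = FALSE])
}))

print(class_means)
```

Each row of `class_means` is named after its class, so the averaged response for a class can be picked out by name, for example `class_means["ethanol", ]`.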

Data visualization

Now that we have the data loaded and properly annotated, we can plot the average responses for our three classes. IonVision data consists of two spectra, one from the positive ion side and one from the negative ion side. The number of data points on each side is determined by the Usv and Ucv vectors. Usv holds the steps taken by the separation voltage, and Ucv those taken by the compensation voltage. Typically, Usv is plotted on the y-axis and Ucv on the x-axis. The color scale is the signal intensity in pA. The negative ions are recorded as negative pA readings, but in these images the absolute values are shown to keep the color scale consistent.
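The sketch below shows one way to go from a flat intensity vector to a Usv-by-Ucv image in base R. Everything in it is illustrative: the voltage values and intensities are made up, and the storage order (Ucv varying fastest within each Usv step) is an assumption that should be checked against the real data before reuse. `plot_spectrum` is a hypothetical helper, not a function from the notebook.

```r
# Assumed voltage grids (illustrative values only).
usv <- c(200, 400, 600)   # separation voltage steps, plotted on the y-axis
ucv <- c(-2, -1, 0, 1)    # compensation voltage steps, plotted on the x-axis

# Made-up negative-side intensities in pA, assumed to be stored as a flat
# vector with Ucv varying fastest within each Usv step.
negative_flat <- -seq_len(length(usv) * length(ucv))

# Take absolute values and reshape: one row per Ucv step, one column per Usv
# step, which is the orientation image() expects for x = Ucv and y = Usv.
negative <- matrix(abs(negative_flat), nrow = length(ucv), ncol = length(usv))

# Hypothetical helper that draws a spectrum as a heat map.
plot_spectrum <- function(z, ucv, usv, title = "") {
  image(ucv, usv, z, xlab = "Ucv", ylab = "Usv", main = title)
}
```

Calling `plot_spectrum(negative, ucv, usv, "Negative side")` would then draw the heat map. Taking `abs()` before plotting mirrors the choice described above of showing absolute values so both polarities share one color scale.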

Looking at the images above, we can see clear differences between the samples on the positive side, while the negative side seems unaffected by the substances. The side on which a substance is seen depends on the type of ions it produces during ionization. For example, simple carboxylic acids are typically seen on the negative side.

So what should a spectrum look like?

In an ideal scenario, the background sample taken from ambient air would consist only of the reactant ion peak (RIP), which can be seen in the above images as the lowest peak leaning toward the right (red in the positive background image). Since our measurements were performed in a laboratory environment, several weak peaks are seen even in the background measurements, resulting from ambient VOCs and possibly from system contamination from previous measurements.

The RIP is the result of water. Water molecules enable the creation of the other ion clusters when sampling in ambient air. This peak disappears in the ethanol and 2-propanol positive side spectra, implying that all available water has been used up in the reactions. This means that the concentration of the samples was quite high. Ideally, the RIP should stay visible in measurements where the sample concentration is being studied. When only compound recognition is of concern, the RIP is not quite as important.

We hope that this post has been informative! For a hands-on look at the data structure, follow this link to our GitHub. There you can download the R notebook and the sample data, and try running the code yourself!