Pfizer/BioNTech C4591001 Trial - Downloading & Exploiting the XPT Files
A few R scripts which may be of use to independent researchers wishing to download & exploit the .XPT files themselves, or to people wishing to verify our data.
This article is boring and contains no cheese - we’re publishing it to make life easier for other researchers and to document the method used in upcoming articles.
Introduction
A few readers have objected that Perl (although a fantastic language) isn’t the most widely practiced, which made it harder for third parties to verify our arguments. While we are converting the code behind our existing major findings to R, here are a few tips which could ease access to the PHMPT files for independent researchers.
The following short article details how the C4591001 “XPT” files (the SAS [1] transport format used during the trial), which contain the trial “raw data”, are automatically downloaded and converted to .CSV on Windows 10, with a fresh installation of R [2] and RStudio [3].
PHMPT’s .XPT files are released progressively, as the FDA forwards them and once ICAN’s team has first analyzed them. The “download.R” script therefore checks which files are already present locally, and downloads only the files which haven’t been downloaded yet.
A second script, “convert_to_csv.R”, will convert all of these files to .CSV (comma-separated values), while its alternative, “convert_to_ssv.R”, produces .SSV (semicolon-separated values). Anyone can then open these files using Excel (or OpenOffice [4]), or easily post-process them in the coding language of their choice.
If you’re encountering system-related issues which aren’t addressed here, and aren’t running on an Apple device (if you are, we won’t be able to help much), don’t hesitate to ask in the comments - if you’re wondering about something, it’s likely that someone else is too, and that this article can be improved.
Sections in italic below indicate commands to run in RStudio.
Remember, should you wish to analyse this data, that the ADRG [5] (Analysis Data Reviewer’s Guide) contains information required to handle these files.
I - Setting up your environment
You only need to do the following once (per workstation).
Download or clone the R project, available on GitHub.
Open “Pfizer Documents.Rproj” with RStudio (a simple double click on it should suffice).
An interface will open on screen, divided into 4 sections:
A. Current script (in this example, “download.R”). This is where you select the code to run when using one of the scripts provided.
B. Results. This is where R will render results you care about (which shouldn’t concern you too much in this article).
C. Files contained in the project.
D. Console (this is where you type instructions, confirming each with ‘Enter’).
Install the stringi [6] package, by entering in the console:
install.packages("stringi")
If your internet connection is of poor quality and causes a timeout error, first run
options(timeout = 600)
(where “600” is your timeout, in seconds).
Install the haven [7] package, by entering in the console:
install.packages("haven")
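The setup commands above can also be run as a single block which only installs what is missing. This is a minimal sketch, not part of the project: the helper names `missing_packages` and `install_missing` are hypothetical, and the 600-second timeout is simply the value used above.

```r
# Hypothetical helper: return the packages from `pkgs` which are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: raise the download timeout, then install only what is missing.
install_missing <- function(pkgs, timeout = 600) {
  options(timeout = timeout)  # timeout in seconds
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage, from the RStudio console:
# install_missing(c("stringi", "haven"))
```

Running it a second time should do nothing, since both packages will already be present.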
II - Downloading or Updating your local PHMPT XPT files
You’ll need to run the following scripts after each PHMPT update containing new .XPT files (or simply run “update_phmpt_xpt_files.R”, remembering to adjust line 8 if you want the SSV export instead of the CSV one, or to remove line 8 if you wish to stop at the XPT step).
1. In RStudio, open “download.R” by clicking on it (a single click is sufficient) in the bottom-right section.
2. Run the script, by selecting the code (CTRL + A after clicking in the top-left section) and executing the selected code with CTRL + Enter (or by clicking “Run” in the top-right area of the top-left section).
The console will display the commands being executed as it downloads all the .XPT files you don’t have yet (this will take a while the first time). Compressed files are stored in a ‘zip_data’ sub-folder of the current project, and uncompressed ones in an ‘xpt_data’ sub-folder.
If everything went fine up to this stage, you’ll see that your “zip_data” folder (accessible with a single click in the bottom-right section) is now filled with zipped .XPT files.
You can come back to the upper level of a sub-folder view by clicking the two dots at the top of the view.
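For readers curious about the “only download what is missing” behaviour described above, here is a minimal sketch of the idea. It is not the actual content of “download.R”: the helper names and the example.org URLs are hypothetical placeholders, and the real script obtains its list of files from the PHMPT website.

```r
# Hypothetical helper: return the URLs whose file name is not yet present in `dest_dir`.
missing_downloads <- function(urls, dest_dir) {
  local_paths <- file.path(dest_dir, basename(urls))
  urls[!file.exists(local_paths)]
}

# Hypothetical helper: download only the files we don't have yet.
download_missing <- function(urls, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  todo <- missing_downloads(urls, dest_dir)
  for (u in todo) {
    # mode = "wb" keeps binary files (ZIP, XPT) intact on Windows
    download.file(u, file.path(dest_dir, basename(u)), mode = "wb")
  }
  invisible(todo)
}

# Usage (placeholder URLs, to be replaced by real file links):
# download_missing(c("https://example.org/files/a.zip",
#                    "https://example.org/files/b.zip"), "zip_data")
```

Because the check relies only on file names, a partially downloaded file would be treated as complete; deleting it and re-running is the simplest workaround under this sketch.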
III - Extracting .ZIP Archives
In the same way you ran “download.R”, run “extract.R”, which will unzip all the XPT files provided in ZIP format to the ‘xpt_data’ sub-folder previously created.
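The extraction step can be sketched as follows, using base R’s `unzip()`. This is an illustration under assumptions, not the content of “extract.R”: the function name `extract_all` is hypothetical, and only the folder names ‘zip_data’ and ‘xpt_data’ come from the project.

```r
# Hypothetical helper: unzip every archive found in `zip_dir` into `xpt_dir`.
extract_all <- function(zip_dir = "zip_data", xpt_dir = "xpt_data") {
  dir.create(xpt_dir, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(zip_dir, pattern = "\\.zip$",
                     ignore.case = TRUE, full.names = TRUE)
  for (z in zips) {
    unzip(z, exdir = xpt_dir)  # existing files are overwritten by default
  }
  # Return (invisibly) the .xpt files now available
  invisible(list.files(xpt_dir, pattern = "\\.xpt$",
                       ignore.case = TRUE, recursive = TRUE))
}

# Usage, from the project root:
# extract_all()
```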
At this stage, your ‘xpt_data’ folder will contain all the .XPT files you need if you’re doing the rest of your analysis in a language which can handle XPT files (Python, R, etc). Otherwise, execute step IV.
IV - Converting all the XPT Files to CSV Files or SSV Files
Run the script “convert_to_csv.R” to obtain a new sub-folder, ‘csv_data’, containing all the files converted to .CSV (“,” separated).
To obtain SSV (“;” separated) files, simply run the alternative “convert_to_ssv.R”.
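As an illustration of what such a conversion involves, here is a minimal sketch relying on `haven::read_xpt()` and base R’s `write.table()`. It is not the content of “convert_to_csv.R” or “convert_to_ssv.R”: the function name `convert_xpt_dir`, the choice of output file extensions and the handling of missing values as empty cells are all assumptions.

```r
# Hypothetical helper: convert every .xpt file in `xpt_dir` to a delimited text file.
# sep = "," gives CSV files, sep = ";" gives SSV files.
convert_xpt_dir <- function(xpt_dir, out_dir, sep = ",") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  xpt_files <- list.files(xpt_dir, pattern = "\\.xpt$",
                          ignore.case = TRUE, full.names = TRUE)
  ext <- if (sep == ";") ".ssv" else ".csv"  # assumed naming convention
  for (f in xpt_files) {
    data <- haven::read_xpt(f)
    out  <- file.path(out_dir,
                      paste0(tools::file_path_sans_ext(basename(f)), ext))
    # qmethod = "double" escapes embedded quotes the way spreadsheets expect
    write.table(data, out, sep = sep, row.names = FALSE,
                na = "", qmethod = "double")
  }
  invisible(length(xpt_files))  # number of files converted
}

# Usage, from the project root:
# convert_xpt_dir("xpt_data", "csv_data", sep = ",")
# convert_xpt_dir("xpt_data", "ssv_data", sep = ";")
```

The ‘ssv_data’ folder name in the usage lines is a placeholder; only ‘csv_data’ is named in this article.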
Remember that the ADRG linked below contains the information needed to understand what type of data each file contains.
Thanks! Wanted to mention additionally that JMP software (from SAS) can also open XPT files directly. Even more convenient is JMP Clinical, which can automatically read all of the CDISC files for a study and generate interactive reports with a few mouse clicks.
Happy to collaborate in analyzing these or any related data. I have JMP Clinical reports for C4591001.