Admin 30 May 2026 06:50

 

File Formats Used in the Chemistry Survey 2017-18

The Chemistry Survey 2017-18 was a nationally coordinated effort to collect data on teaching practices, research output, student demographics, and laboratory resources across postsecondary chemistry departments in the United Kingdom. A key component of the project was ensuring that the data gathered could be stored, shared, and analysed efficiently. This required the careful selection of file formats that balance openness, long-term preservation, and ease of use.

Why File Format Choice Matters

Choosing the right format has an impact on three major areas:

  • Interoperability: the ability of different software systems (e.g., statistical packages, spreadsheet programs, and archive repositories) to read the data without loss of information.
  • Longevity: open, well-documented formats are less likely to become obsolete, protecting the survey's legacy for future researchers.
  • Transparency: reproducible research requires that raw data, analysis scripts, and results be openly available in formats that can be inspected and validated.

Core Data Sets and Their Formats

1. Questionnaire Responses

All responses were collected through an online questionnaire built with SurveyMonkey. Once the collection period closed, the raw data were exported in two complementary formats:

| Format | Extension | Rationale |
| --- | --- | --- |
| CSV (Comma-Separated Values) | .csv | Plain text, easily imported into Excel, R, Python, and most statistical packages. Retains a simple tabular structure. |
| SPSS Portable File | .por | Preserves variable labels, value labels, and missing-value definitions used during analysis in IBM SPSS Statistics. |
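
As an illustration of how the CSV export might be consumed, the following is a minimal sketch using only Python's standard library. The column names (`respondent_id`, `teaching_hours`) and the missing-value code `-99` are assumptions made for the example; they are not taken from the survey's actual export.

```python
import csv
import io

# Hypothetical missing-value codes; the real codes would be listed in the
# dataset's documentation.
MISSING_CODES = {"", "-99"}


def load_responses(handle):
    """Read a CSV export into a list of dicts, mapping missing codes to None."""
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        rows.append({k: (None if v in MISSING_CODES else v) for k, v in row.items()})
    return rows


# Illustrative stand-in for a real export file.
sample = io.StringIO(
    "respondent_id,teaching_hours\n"
    "r001,12\n"
    "r002,-99\n"
)
responses = load_responses(sample)
print(responses)
```

Note that the missing-value codes have to be supplied by hand here. CSV carries no labels or missing-value definitions of its own, which is the gap the SPSS portable file fills.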

2. Laboratory Inventory Lists

Departments submitted detailed inventories of equipment, chemicals, and safety equipment. The inventory files were standardised using the following formats:

  • OpenDocument Spreadsheet (ODS): the preferred format for data entry because it is fully open and supported by LibreOffice, Apache OpenOffice, and many other tools.
  • XML (eChem Lab Schema): a custom XML schema (eChemLab.xsd) that describes each inventory item through a fixed set of attributes. XML enables automated validation and integration with laboratory information management systems (LIMS).

3. Student Demographic Data

Aggregated demographic statistics (e.g., gender, ethnicity, first-generation status) were stored in two formats to satisfy both statistical analysis and public-release requirements:

  1. R data frame saved as .rds: efficient binary storage for internal analysis.
  2. Statistical Data and Metadata Package (SDMP): a JSON-based container that couples the data with a machine-readable metadata document (metadata.json) following the Research Data Alliance (RDA) recommendations.

4. Qualitative Interview Transcripts

Approximately 120 semi-structured interviews were conducted with faculty members. Transcripts were stored in:

  • Plain text, UTF-8 (.txt): ensures universal readability and simple version control.
  • TEI XML (.tei.xml): the Text Encoding Initiative format, which encodes speaker changes, timestamps, and linguistic annotations, facilitating sophisticated qualitative analysis with tools like Voyant Tools and MAXQDA.
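
To show what the speaker encoding makes possible, here is a short sketch that pulls speaker turns out of a TEI fragment with Python's standard library. It assumes the transcripts use the TEI utterance element `<u>` with a `who` attribute, which is the usual TEI convention for transcribed speech; the survey's actual markup may differ, and the sample fragment and speaker identifiers are invented for the example.

```python
import xml.etree.ElementTree as ET

# The standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, simplified fragment (a full TEI file would also have a teiHeader).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <u who="#interviewer">How are lab sessions scheduled?</u>
      <u who="#faculty01">Mostly in the afternoons.</u>
    </body>
  </text>
</TEI>"""


def speaker_turns(tei_xml):
    """Return (speaker, text) pairs for every utterance in document order."""
    root = ET.fromstring(tei_xml)
    turns = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        speaker = u.get("who", "").lstrip("#")
        # Join nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(u.itertext()).split())
        turns.append((speaker, text))
    return turns


turns = speaker_turns(SAMPLE)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

The same extraction against the plain-text version would need ad hoc pattern matching, which is one reason to keep both formats.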

5. Analytical Figures and Visualisations

All graphics produced for the final report were delivered in dual formats:

| Graphic Type | Vector Format | Raster Format |
| --- | --- | --- |
| Bar, line, and scatter plots | .svg | .png (300 dpi) |
| Geographical maps of participating institutions | .pdf | .jpg (high-resolution) |
| Complex network diagrams | .svg | .tiff (lossless) |

The vector files allow unlimited scaling for future publications, while the raster versions guarantee compatibility with legacy software that cannot read SVG or PDF.

Metadata and Documentation Standards

Every dataset was accompanied by a README file written in Markdown (.md) that described:

  • Data collection methodology
  • Variable definitions, units, and coding schemes
  • Software versions used for cleaning and analysis
  • Licensing (Creative Commons Attribution-NonCommercial 4.0 International)

In addition, the Digital Curation Centre guidelines were followed, and a datapackage.json (following the Frictionless Data specification) was generated for each major collection, enabling automated discovery by data portals.
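
A datapackage.json descriptor of this kind can be produced with a few lines of Python. The sketch below builds only the minimal fields (`name` plus a `resources` list with `name`, `path`, and `format`); the package name and file path are hypothetical, and the survey's real descriptors presumably carried more detail, such as field schemas.

```python
import json


def build_descriptor(name, resources):
    """Build a minimal Data Package descriptor as a plain dict.

    `resources` is a list of (resource_name, relative_path) pairs.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": res_name,
                "path": path,
                # Derive the format from the file extension.
                "format": path.rsplit(".", 1)[-1],
            }
            for res_name, path in resources
        ],
    }


descriptor = build_descriptor(
    "chemistry-survey-questionnaire",  # hypothetical package name
    [("responses", "data/responses.csv")],  # hypothetical file path
)
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Writing `descriptor_json` to a file named datapackage.json next to the data is what lets a portal discover the collection without opening the data files themselves.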

Preservation Strategy

All files were deposited in the institutional repository ResearchData@MyUniversity with the following preservation actions:

  1. Format validation: each file was checked against its schema (e.g., ODS files validated with odfvalidator, XML files with xmllint).
  2. Checksum generation: SHA-256 checksums were stored alongside each file to detect corruption over time.
  3. Migration plan: a review every five years to ensure that formats remain supported; for example, a future migration from ODS to XLSX if the latter gains stronger archival backing.
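
The checksum step can be sketched as follows. This is an illustrative implementation using Python's `hashlib`, not the repository's own tooling; the file name is hypothetical and the demo runs against a throwaway temporary file.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file still matches its recorded checksum."""
    return sha256_of(path) == expected_hex


# Demo: record a checksum, then detect a later modification.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "inventory.ods")  # hypothetical file name
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    recorded = sha256_of(demo_path)
    intact = verify(demo_path, recorded)

    # Simulate corruption by appending a byte.
    with open(demo_path, "ab") as handle:
        handle.write(b"!")
    corrupted_detected = not verify(demo_path, recorded)

print(intact, corrupted_detected)
```

Reading in chunks keeps memory use flat, which matters for deposits that run to hundreds of megabytes.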

Benefits Observed During the Survey

The deliberate selection of open, well-documented formats yielded several tangible advantages:

  • Rapid data cleaning: analysts could import CSV or ODS files directly into R and Python without needing conversion scripts.
  • Consistent reporting: the same SVG graphics were reused in the web-based dashboard, the printed report, and a companion paper, guaranteeing visual consistency.
  • Reusability: third-party researchers have already repurposed the XML laboratory inventories to benchmark equipment expenditures across UK universities.
  • Transparent peer review: reviewers could download the raw TEI-encoded interview transcripts and apply their own coding schemes, increasing confidence in the qualitative findings.

Challenges and Lessons Learned

Although the format strategy was largely successful, a few issues highlighted the need for ongoing vigilance:

  1. Version incompatibility: some older versions of Microsoft Excel mishandled ODS files, so a fallback XLSX version had to be provided.
  2. Large XML payloads: the eChem Lab XML files grew to several hundred megabytes for some institutions; compressing them with gzip (.xml.gz) and providing a schema-driven streaming parser mitigated performance problems.
  3. Software licensing constraints: a small number of contributors only had access to proprietary statistical software (SPSS). The inclusion of a portable SPSS file (.por) helped but underscored the benefits of also offering formats that open-source tools can read (e.g., exported Stata .dta or R .rds files).
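
The streaming approach mentioned for large XML payloads can be illustrated with Python's standard library. This is a generic sketch, not the project's schema-driven parser: the element name `item` and the attributes `name` and `quantity` are placeholders, since the real eChem Lab schema is not reproduced here.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_items(handle, tag="item"):
    """Yield the attributes of each matching element, one at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)
            # Drop the element's contents once consumed to limit memory use.
            elem.clear()


# Placeholder inventory, compressed in memory to stand in for an .xml.gz file.
raw = (
    b'<inventory>'
    b'<item name="balance" quantity="2"/>'
    b'<item name="fume hood" quantity="1"/>'
    b'</inventory>'
)
compressed = io.BytesIO(gzip.compress(raw))

# gzip.open accepts a file object as well as a path.
with gzip.open(compressed, "rb") as stream:
    items = list(iter_items(stream))

print(items)
```

Because the file is decompressed and parsed incrementally, the whole document never has to be held in memory at once.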

Future Directions

Building on the 2017-18 experience, the next iteration of the Chemistry Survey plans to adopt additional standards:

  • FAIR data packages: using the FAIR Data Point model to expose metadata via a RESTful API.
  • Linked data: representing laboratory inventories as RDF triples, enabling integration with external knowledge graphs such as the Wikidata chemical entities dataset.
  • Containerisation of analysis: distributing analysis scripts in Docker images with all dependencies, paired with the same input files (CSV, RDS, JSON), to guarantee reproducibility.

Conclusion

The Chemistry Survey 2017-18 demonstrates that thoughtful file-format selection is not a peripheral concern but a central pillar of high-quality, reproducible research. By combining plain-text, open XML, and widely supported binary formats, the project achieved a balance between accessibility, durability, and analytical power. The documentation, metadata practices, and preservation workflow established during the survey provide a template that can be adapted by other disciplines seeking to manage complex, multimodal research data.
