Research Compass: 7 Common Research Data Management Mistakes

Research data management (RDM) is an essential component of responsible, reproducible, and impactful science. Funders such as NSFC, MOST, NIH, and Horizon Europe now enforce or promote data sharing policies, and the emphasis on FAIR (Findable, Accessible, Interoperable, Reusable) data keeps growing. Against that backdrop, poor RDM practices can derail careers, waste resources, and undermine trust in research.

Despite its importance, many researchers, from PhD students to senior principal investigators, still fall into avoidable traps. This post highlights seven of the most common research data management mistakes, drawn from real-world lab experiences, funder reports, and community discussions, along with practical steps to fix them.

18 Feb 2026

The Seven Common Research Data Management Mistakes

No Data Management Plan (or Treating It as a Box-Ticking Exercise)
The Mistake: Researchers often write a Data Management Plan (DMP) only when required by a grant application, then ignore it for the rest of the project. Some may copy and paste generic text that doesn't reflect their actual workflow.

Why It Hurts: Without a living plan, data organization drifts, compliance gaps surface during audits, and reuse becomes difficult or impossible.

How to Avoid It:
- Treat the DMP as a living document—update it at major milestones.
- Use tools like DMPTool or institution-specific templates that align with funder requirements.
- Involve your whole team early: discuss roles, formats, storage, and sharing upfront.

Poor File Naming and Folder Organization
The Mistake: Files with names like data_final_v3_corrected.xlsx or analysis_new.R, scattered across desktops, USB drives, and email attachments.

Why It Hurts: You (or your future self) waste hours hunting for files. Collaboration becomes chaotic, and reproducibility suffers.

How to Avoid It:
- Adopt a consistent naming convention: e.g., YYYYMMDD_Project_SampleType_Version_Description.ext (like 20260212_MouseCohort1_RNAseq_v1_raw.fastq).
- Use a logical folder structure: Project → Phase (Raw / Processed / Analysis) → Date or Sample.
- Document the convention in a README file at the project root.
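
A naming convention holds up better when a script can build and check names for you. The following is a minimal Python sketch, assuming the YYYYMMDD_Project_SampleType_Version_Description.ext pattern above; the function names `build_name` and `parse_name` and the exact regular expression are illustrative choices, not part of any standard.

```python
import re
from datetime import date

# Assumed pattern: YYYYMMDD_Project_SampleType_vN_Description.ext
# It only checks for eight digits, not that they form a real calendar date.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)_(?P<sample_type>[A-Za-z0-9]+)"
    r"_v(?P<version>\d+)_(?P<description>[A-Za-z0-9-]+)\.(?P<ext>[A-Za-z0-9.]+)$"
)


def build_name(day, project, sample_type, version, description, ext):
    """Assemble a file name that follows the convention."""
    name = f"{day:%Y%m%d}_{project}_{sample_type}_v{version}_{description}.{ext}"
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"components produce a non-conforming name: {name}")
    return name


def parse_name(name):
    """Return the name's components as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


print(build_name(date(2026, 2, 12), "MouseCohort1", "RNAseq", 1, "raw", "fastq"))
# 20260212_MouseCohort1_RNAseq_v1_raw.fastq
print(parse_name("data_final_v3_corrected.xlsx"))
# None
```

Running `parse_name` over a project folder is a quick way to list files that drifted from the convention.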

Neglecting Backups and Ignoring the 3-2-1 Rule
The Mistake: Relying solely on a laptop hard drive or an institutional network drive without proper backups. The attitude is "it won't happen to me" until a ransomware attack, spilled coffee, or hardware failure strikes.

Why It Hurts: Data loss is devastating—years of work gone, projects stalled, publications retracted.

How to Avoid It:
- Follow the 3-2-1 backup rule: 3 copies of data, on 2 different types of media, with 1 copy off-site (or in the cloud).
- Automate it: combine institutional storage, an external drive, and a cloud copy (e.g., OneDrive, Google Drive, or object storage such as Wasabi or AWS S3 with versioning enabled).
- Test restores periodically—a backup you can't restore is useless.
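
Both the 3-2-1 rule and a restore test can be checked mechanically. Here is a rough Python sketch using only the standard library: `satisfies_3_2_1` checks a hand-written inventory of copies, and `compare_restore` compares SHA-256 checksums between an original folder and a restored one. The function names and the inventory format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def satisfies_3_2_1(copies):
    """copies: list of (media_type, is_offsite) tuples, one per copy of the data."""
    media_types = {media for media, _ in copies}
    has_offsite = any(offsite for _, offsite in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(original_root, restored_root):
    """Return (missing, changed): files absent from or different in the restore."""
    original = manifest(original_root)
    restored = manifest(restored_root)
    missing = sorted(set(original) - set(restored))
    changed = sorted(
        name for name in original.keys() & restored.keys()
        if original[name] != restored[name]
    )
    return missing, changed
```

A restore passes when both returned lists are empty. Files that exist only in the restored folder are not reported by this sketch.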

Skipping or Skimping on Metadata and Documentation
The Mistake: Assuming the data "speaks for itself" or adding metadata only at publication time (if at all).

Why It Hurts: Data without context is unusable. Others can't find, understand, or reuse it—violating FAIR principles (especially Findable and Reusable).

How to Avoid It:
- Document early and often: use README files, codebooks, or metadata templates (e.g., DataCite schema).
- Capture who, what, when, where, why, and how for every dataset.
- Adopt domain standards (e.g., MIAME for microarray experiments, or the MIBBI minimum-information checklists for biological and biomedical investigations) and tools like CEDAR or LabArchives for structured metadata.
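
One lightweight way to capture the who, what, when, where, why, and how is a small metadata file saved next to each dataset. The Python sketch below writes such a sidecar file as JSON and refuses to write it when a field is empty. The six field names and the `.metadata.json` suffix are assumptions for illustration; this is not the DataCite schema or any other formal standard.

```python
import json
from pathlib import Path

# Illustrative field names taken from the "who, what, when, where, why, how" checklist.
REQUIRED_FIELDS = ("who", "what", "when", "where", "why", "how")


def missing_fields(record):
    """List required fields that are absent or blank in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def write_sidecar(data_path, record):
    """Write <data file name>.metadata.json next to the data file."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"metadata incomplete, missing: {', '.join(gaps)}")
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar


print(missing_fields({"who": "A. Researcher", "what": "RNA-seq reads"}))
# ['when', 'where', 'why', 'how']
```

A record that passes this check can later be mapped onto a formal schema when the dataset is deposited.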

No Version Control for Data and Analysis
The Mistake: Overwriting files, relying on names like "final_v4_reallyfinal.docx", or keeping no record of changes to scripts and processed data.

Why It Hurts: It becomes impossible to reconstruct exactly which analysis produced which result, a reproducibility nightmare.

How to Avoid It:
- Use Git for code, with Git LFS or DVC for large datasets.
- For non-coders: platforms like Synapse or Dataverse support versioning natively.
- Log every major change in a changelog or notebook (Jupyter, Quarto, or R Markdown).
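
For teams not yet ready for Git or DVC, even a plain-text changelog that records a checksum with each entry makes it possible to tell which version of a file a result came from. This is a minimal Python sketch; the function name `log_change` and the tab-separated layout are assumptions, and it is a stopgap rather than a replacement for real version control.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def log_change(changelog_path, data_path, author, note):
    """Append one line: UTC timestamp, author, file name, SHA-256, free-text note."""
    # Reads the whole file into memory, so this suits small to medium files.
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = (
        f"{timestamp}\t{author}\t{Path(data_path).name}"
        f"\tsha256:{digest}\t{note}\n"
    )
    with open(changelog_path, "a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Calling `log_change` after each processing step leaves a trail in which two entries with the same checksum are known to refer to identical file contents.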

Ignoring Security, Privacy, and Ethical Requirements
The Mistake: Storing sensitive human data on unsecured drives, sharing via Dropbox links, or failing to anonymize properly.

Why It Hurts: Breaches lead to legal issues (PDPO, GDPR, HIPAA), loss of trust, funding bans, or harm to participants.

How to Avoid It:
- Classify data early (public / internal / sensitive / restricted).
- Use encryption, access controls, and secure platforms (e.g., institutional clouds approved for sensitive data).
- Follow ethical guidelines: obtain IRB/ethics approval, document consent for sharing, and use de-identification tools.
- Plan for controlled access repositories when full open sharing isn't possible.
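
As one small piece of de-identification, direct identifiers can be replaced with keyed pseudonyms before data leaves the secure environment. The Python sketch below uses HMAC-SHA256 from the standard library; the function names, the 16-character truncation, and the row format are assumptions for illustration. Pseudonymized data is generally still treated as personal data (for example under GDPR), and real de-identification also has to deal with indirect identifiers, so treat this as a starting point only.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable pseudonym with keyed HMAC-SHA256."""
    mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; a shorter token raises the chance of collisions.
    return mac.hexdigest()[:16]


def pseudonymize_rows(rows, id_field, secret_key, drop_fields=()):
    """Replace id_field with a pseudonym and drop the listed direct identifiers."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in drop_fields}
        out[id_field] = pseudonymize(str(row[id_field]), secret_key)
        cleaned.append(out)
    return cleaned
```

The secret key should be stored apart from the shared data and under access control, since anyone holding it can recompute the pseudonym for a known identifier.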

Leaving Data Management Until the End (or Never)
The Mistake: Waiting until manuscript submission, thesis defense, or grant closeout to organize, document, and deposit data.

Why It Hurts: Hasty cleanup leads to errors, lost details, and missed opportunities for reuse. Many datasets never get shared.

How to Avoid It:
- Build RDM into the project workflow from day one.
- Schedule regular "data sprints" (e.g., monthly cleanups).
- Choose repositories early (Zenodo, Figshare, Dataverse, domain-specific ones) and deposit incrementally.
- Reward good practices: credit data curation in evaluations and CVs.

Final Thoughts

Good research data management isn't about perfection—it's about intentional, sustainable habits that protect your work and amplify its impact. In 2026, with AI tools increasingly consuming and generating data, FAIR-compliant, well-managed datasets are becoming even more valuable (and expected).

Start small: pick one or two mistakes from this list that resonate most with your current workflow, implement a fix this week, and build from there. Your future self—and the broader research community—will thank you.
