Supported File Formats
AfriData Commons supports a wide range of file formats to maximize accessibility and interoperability. The table below lists each format with its support level, typical use case, and maximum file size:
| Format | Extension | Support Level | Use Case | Max Size |
|---|---|---|---|---|
| CSV | .csv | Recommended | Structured tabular data | 100 MB |
| JSON | .json | Recommended | Structured data, APIs | 50 MB |
| Excel | .xlsx, .xls | Supported | Spreadsheet data with multiple sheets | 25 MB |
| XML | .xml | Supported | Hierarchical structured data | 25 MB |
| Parquet | .parquet | Beta | Large-scale analytics | 500 MB |
| GeoJSON | .geojson | Recommended | Geographic data | 50 MB |
| Shapefile | .shp + .shx + .dbf | Supported | GIS vector data | 100 MB |
| Proprietary | .sav, .dta, .mat | Conversion required | Statistical software formats (SPSS, Stata, MATLAB) | N/A |
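The limits in the table can be checked locally before an upload. The sketch below is illustrative only: the function name `check_upload` is hypothetical, "MB" is assumed to mean 10**6 bytes, and the Shapefile limit is assumed to apply to each component file. None of this is confirmed platform behavior.

```python
from pathlib import Path

MB = 1_000_000  # assumption: the table's "MB" means 10**6 bytes

# Upload limits copied from the table above, keyed by lowercase extension.
# Assumption: the Shapefile limit applies to each component file.
MAX_SIZE_BYTES = {
    ".csv": 100 * MB,
    ".json": 50 * MB,
    ".xlsx": 25 * MB,
    ".xls": 25 * MB,
    ".xml": 25 * MB,
    ".parquet": 500 * MB,
    ".geojson": 50 * MB,
    ".shp": 100 * MB,
    ".shx": 100 * MB,
    ".dbf": 100 * MB,
}

# Formats the table marks as needing conversion before upload.
CONVERT_FIRST = {".sav", ".dta", ".mat"}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return "ok" or a short reason the file would likely be rejected."""
    ext = Path(filename).suffix.lower()
    if ext in CONVERT_FIRST:
        return "convert to a supported format first"
    limit = MAX_SIZE_BYTES.get(ext)
    if limit is None:
        return "unsupported format"
    if size_bytes > limit:
        return "file exceeds size limit"
    return "ok"
```

For example, `check_upload("survey.csv", 10 * MB)` returns `"ok"`, while a `.dta` file of any size is flagged for conversion.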
Metadata Standards
Comprehensive metadata is crucial for dataset discovery and proper usage. Our metadata schema follows international standards while accommodating African-specific context.
Basic Information
- Dataset title and description
- Creator/author information
- Creation and modification dates
- Version information
- License and usage rights
Geographic Context
- Country/region coverage
- Coordinate system (if applicable)
- Administrative boundaries
- Urban/rural classification
- Language(s) used
Temporal Coverage
- Data collection period
- Temporal resolution
- Update frequency
- Historical context
- Seasonal considerations
Methodology
- Data collection methods
- Sampling techniques
- Quality control measures
- Processing steps
- Limitations and biases
Example Metadata Schema (JSON)
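As an illustration of how the four metadata groups above could be laid out, the sketch below builds a record as a Python dictionary and serializes it to JSON. All field names, the `REQUIRED_FIELDS` mapping, and the values are hypothetical placeholders, not the official AfriData Commons schema.

```python
import json

# Hypothetical field names, grouped like the four lists above.
REQUIRED_FIELDS = {
    "basic": ["title", "description", "creators", "created", "modified",
              "version", "license"],
    "geographic": ["coverage", "languages"],
    "temporal": ["collection_period", "update_frequency"],
    "methodology": ["collection_methods", "limitations"],
}

# Placeholder values only; this is not a real dataset.
example_metadata = {
    "basic": {
        "title": "Example household survey",
        "description": "Placeholder record used to illustrate the layout.",
        "creators": ["Example Research Group"],
        "created": "2024-01-15",
        "modified": "2024-03-01",
        "version": "1.0.0",
        "license": "CC-BY-4.0",
    },
    "geographic": {
        "coverage": ["KE"],
        "coordinate_system": "EPSG:4326",
        "languages": ["sw", "en"],
    },
    "temporal": {
        "collection_period": {"start": "2023-06-01", "end": "2023-09-30"},
        "temporal_resolution": "monthly",
        "update_frequency": "annual",
    },
    "methodology": {
        "collection_methods": ["in-person interviews"],
        "sampling": "stratified random sample",
        "limitations": "Placeholder text describing known biases.",
    },
}


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty, as "section.field"."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        values = record.get(section, {})
        for name in fields:
            if not values.get(name):
                missing.append(f"{section}.{name}")
    return missing


# Serialize the record to the JSON form a submission might carry.
print(json.dumps(example_metadata, indent=2))
```

Calling `missing_fields(example_metadata)` returns an empty list, which gives a quick local check before filling out the online form.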
Data Quality Standards
High-quality data is essential for reliable research and analysis. Our quality standards ensure datasets meet international best practices.
Completeness
Ensure your dataset is as complete as possible:
- Fewer than 5% missing values per column
- Clear indication of null/missing data
- Explanation for missing values
- Complete geographic coverage
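The 5% threshold can be checked with a short script. The sketch below uses only the standard library and assumes the data is CSV with a header row. The set of strings treated as missing (`MISSING_MARKERS`) and both function names are assumptions to adapt to your dataset.

```python
import csv
import io

# Assumption: these strings mark a missing value; adjust to your dataset.
MISSING_MARKERS = {"", "NA", "N/A", "null", "NaN"}


def missing_share(csv_text: str) -> dict[str, float]:
    """Return the fraction of missing cells for each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    shares = {}
    for column in reader.fieldnames:
        # Short rows yield None for absent cells; count those as missing too.
        missing = sum(
            1 for row in rows
            if (row.get(column) or "").strip() in MISSING_MARKERS
        )
        shares[column] = missing / len(rows)
    return shares


def columns_over_threshold(csv_text: str, threshold: float = 0.05) -> list[str]:
    """Columns whose missing share is at or above the threshold (default 5%)."""
    return [
        name for name, share in missing_share(csv_text).items()
        if share >= threshold
    ]
```

Columns returned by `columns_over_threshold` are the ones that would need either more complete data or a documented explanation for the gaps.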
Accuracy
Verify data accuracy through validation:
- Cross-validation with external sources
- Outlier detection and handling
- Unit consistency checks
- Temporal consistency validation
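The list above does not name a specific outlier method. As one common option, the sketch below applies Tukey's interquartile-range rule, which flags values more than 1.5 × IQR outside the quartiles. The function name and the multiplier default are illustrative choices, not a platform requirement.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).

    Needs at least two values; statistics.quantiles raises otherwise.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

Flagged values are candidates for review rather than automatic removal; how each one was handled belongs in the dataset documentation.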
Timeliness
Ensure data is current and relevant:
- Regular update schedule
- Clear versioning system
- Timestamp for data collection
- Deprecation notices for old data
Consistency
Maintain consistent data formats:
- Standardized naming conventions
- Uniform data types
- Consistent units of measurement
- Harmonized categorical values
Documentation Standards
Comprehensive documentation ensures your dataset can be understood and used effectively by other researchers and practitioners.
Data Dictionary
- Column/variable descriptions
- Data types and formats
- Valid ranges and constraints
- Relationships between variables
- Coding schemes for categorical data
Methodology Document
- Research objectives and questions
- Sampling methodology
- Data collection procedures
- Quality control measures
- Known limitations and biases
Technical Documentation
- Processing scripts and code
- Software versions and dependencies
- Hardware specifications
- Computational environment details
- Reproducibility instructions
Usage Guidelines
- Intended use cases
- Appropriate analysis methods
- Citation requirements
- Contact information for questions
- Update and maintenance schedule
Example Data Dictionary Entry
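A data dictionary entry can be represented as a small record that also validates values. The sketch below is a hypothetical layout covering the items listed under Data Dictionary (description, type, valid range, relationships, coding scheme). The class, its field names, and the two sample variables are illustrative, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DictionaryEntry:
    """One variable in a data dictionary (hypothetical layout)."""
    name: str
    description: str
    data_type: str                                   # e.g. "integer", "string"
    unit: Optional[str] = None
    minimum: Optional[float] = None                  # lower bound, inclusive
    maximum: Optional[float] = None                  # upper bound, inclusive
    codes: dict = field(default_factory=dict)        # categorical coding scheme
    related_to: list = field(default_factory=list)   # names of linked variables

    def accepts(self, value) -> bool:
        """Check a value against the coding scheme or the numeric range."""
        if self.codes:
            return value in self.codes
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Sample numeric variable with a valid range.
household_size = DictionaryEntry(
    name="household_size",
    description="Number of people usually living in the household",
    data_type="integer",
    unit="persons",
    minimum=1,
    maximum=50,
    related_to=["household_id"],
)

# Sample categorical variable with a coding scheme.
settlement_type = DictionaryEntry(
    name="settlement_type",
    description="Urban/rural classification of the household location",
    data_type="integer",
    codes={1: "urban", 2: "rural"},
)
```

With these entries, `household_size.accepts(0)` is `False` because the value falls below the valid range, and `settlement_type.accepts(2)` is `True` because 2 is a defined code.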
Ethics and Privacy Standards
Ethical data sharing is fundamental to AfriData Commons. All datasets must comply with ethical guidelines and privacy regulations.
Privacy Protection
Personal and sensitive data must be properly anonymized or aggregated. Direct identifiers should be removed, and indirect identifiers should be assessed for re-identification risks.
Informed Consent
Data subjects must have provided informed consent for data collection and sharing. If consent was not explicitly obtained for public sharing, data must be sufficiently anonymized.
Fairness and Non-discrimination
Datasets should not perpetuate or amplify existing biases. Consider the potential for discriminatory use and provide appropriate warnings or safeguards.
Cultural Sensitivity
Respect cultural contexts and sensitivities. Engage with local communities and stakeholders when appropriate, especially for data about indigenous or marginalized populations.
Anonymization Requirements
- Remove direct identifiers (names, IDs, addresses)
- Assess quasi-identifiers (age, location, profession)
- Apply k-anonymity (k≥5) for sensitive data
- Use differential privacy for high-risk datasets
- Document anonymization methods
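The k-anonymity requirement can be verified by grouping records on their quasi-identifiers and confirming that every group has at least k members. The sketch below is a minimal check under that definition. The function names and the sample column names are hypothetical, and the check does not itself anonymize anything.

```python
from collections import Counter


def group_sizes(records: list[dict], quasi_identifiers: list[str]) -> Counter:
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )


def violating_groups(records: list[dict], quasi_identifiers: list[str],
                     k: int = 5) -> dict:
    """Return the combinations that occur fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str],
                   k: int = 5) -> bool:
    """True when every quasi-identifier combination occurs at least k times."""
    return not violating_groups(records, quasi_identifiers, k)


# Placeholder records: one group of five and one group of two.
records = (
    [{"age_band": "30-39", "district": "A"}] * 5
    + [{"age_band": "40-49", "district": "B"}] * 2
)
```

Here `is_k_anonymous(records, ["age_band", "district"])` is `False` because the second group has only two members. Typical remedies are to generalize the quasi-identifiers further, for example with wider age bands, or to suppress the small group.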
Ethical Review
- Institutional Review Board (IRB) approval
- Ethics committee review documentation
- Data sharing agreements
- Consent forms and protocols
- Risk assessment documentation
Submission Process
Follow these steps to ensure your dataset meets our standards and is successfully published on AfriData Commons.
Prepare Your Dataset
Ensure your data is in a supported format, properly structured, and cleaned. Remove any sensitive information and create comprehensive documentation.
Complete Metadata
Fill out all required metadata fields using our online form. Provide detailed descriptions, geographic coverage, and methodology information.
Upload Files
Upload your dataset files, documentation, and any supplementary materials. Ensure file sizes are within limits and formats are supported.
Quality Check
Our automated system will run quality checks on your dataset. Review any flagged issues and make necessary corrections.
Peer Review
Your dataset will undergo peer review by domain experts. This typically takes 2-4 weeks depending on complexity and reviewer availability.
Publication
Once approved, your dataset will be published with a DOI and made available to the research community. You'll receive a notification with the publication details.
Additional Resources
Explore these resources to learn more about data standards and best practices for African research contexts.