Releases: chklovski/CheckM2
1.1.0
1.1.0 (2025-02-20)
This release uses models built on [GTDB release 202].
This release uses the NEW DIAMOND database version 3, available at DOI [10.5281/zenodo.14897628].
While the models and database are based on the same GTDB release as v 1.0.0/1.0.1/1.0.2, output results MAY DIFFER SLIGHTLY.
This is because the DIAMOND reference database was ported to a much newer DIAMOND version, which slightly changes behaviour during protein alignment and the subsequent KEGG annotation, and therefore also the completeness/contamination predictions. The models and the MinMax scaler were also ported to newer pre-requisites, which may lead to slight changes in feature vector scaling and ML predictions.
The changes should be very minor (i.e. less than 1% difference in predicted completeness/contamination for the same MAG compared with v1.0.0, if any).
This is a release that IMPACTS previous CheckM2 functionality. Specifically, the pre-requisites for CheckM2, and some of its codebase, have been updated to much more recent versions to make installation substantially easier and to fix a growing number of bugs caused by deprecations. The database, the scaler, and the models have been ported to new versions of tensorflow/scikit-learn/numpy/scipy/lightgbm/diamond (see below for detailed list).
The models and database of this release are NOT backwards-compatible with any CheckM2 version below 1.1.0. You will also need to create a new environment for CheckM2 v1.1.0 with the new pre-requisites. Furthermore, a number of bug fixes and quality-of-life changes were added to make deploying CheckM2 easier. Finally, updating to the new database and DIAMOND version in this release should fix an issue where CheckM2 output changed slightly for the same MAG depending on which other MAGs were included in the input, due to inconsistent DIAMOND annotation.
The new changes will also enable easier roll-out of updated models in the future.
Specific release notes:
- Pre-requisites updated to allow mamba installation without issues by requiring much newer Numpy/Scipy/Pandas/Tensorflow/Diamond/Lightgbm versions; parts of checkm2 codebase were adjusted to deal with errors due to deprecated behaviour, particularly pandas, numpy, scipy, and tensorflow; models and other pickled files ported to new requirements (addresses #117, #125, #123, #115, #125, #110, #105, #100, #49, #82, #78, #76, #70, #69)
- A new DIAMOND database was created with a newer version of diamond (2.1.11) in order to solve an issue where inconsistent DIAMOND annotation led to differences in scores for the same MAG when a different number of MAGs was provided as input (addresses #103). Please note that the scores from v1.1.0 will differ slightly from previous versions due to the DIAMOND update and other changes; they should however be stable within the version
- Python requirements changed to a minimum of 3.12; higher should also be fine (addresses #125, #97)
- README updated to provide expected CheckM2 output for testrun for the current version as previous README was unclear and confusing (addresses #120)
- The neural network model has been ported from .hd5 to the .keras model to ensure forward-compatibility and installation of much newer tensorflow version pre-requisites; required tensorflow version upgraded from 2.4 to 2.17 (addresses #119, #65)
- A new --no_write_json_db flag was added to prevent CheckM2 from updating the internal JSON file in its install directory with the location of the CheckM2 database, in case this behaviour is unwanted. If the flag is used, you will need to provide the path to the database to CheckM2 during its 'predict' run (addresses #126, #39, #73). This allows using 'checkm2 database --download --no_write_json_db' to download the database but NOT write its path to the internal JSON file
- Lowered the chunking of DIAMOND annotation to 250 from 500 to lower memory use
- CheckM2 will check that the temp directory has at least 500 MB of free space before proceeding with each DIAMOND annotation chunk within a run; this prevents unexpected behaviour when temp space is full but DIAMOND finishes without throwing an error, leaving some empty annotation files that then get concatenated with non-empty annotation output generated before temp space ran out
- CheckM2 no longer generates a fancy [i.e. remarkably pointless] header. Note that this will change checkm2 -h output to remove several lines of printing
- Main execution code moved from /bin/checkm2 to /checkm2/main.py for easier entry point installation and avoiding deprecation. Legacy behaviour is maintained as /bin/checkm2 imports the functionality from main.py - functionally there should be no differences
- New DIAMOND v3 database (contents identical, just different DIAMOND version) has been uploaded to Zenodo, accessible here: [10.5281/zenodo.14897628]
- Docker file updated - uses mamba to install prerequisites; removed older non-functional docker files
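The chunking and free-space items in the list above can be sketched roughly as follows. This is a minimal illustration and not CheckM2's actual code: the names `chunked`, `has_enough_free_space` and `annotate_in_chunks` are hypothetical, and reading "500 MB" as 500 MiB is an assumption. Only the chunk size of 250 and the 500 MB threshold come from the notes.

```python
import shutil

CHUNK_SIZE = 250                    # DIAMOND annotation chunk size from the notes above
MIN_FREE_BYTES = 500 * 1024 * 1024  # "at least 500 MB"; the MiB reading is an assumption


def chunked(items, size=CHUNK_SIZE):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def has_enough_free_space(directory, minimum=MIN_FREE_BYTES):
    """Return True if the filesystem holding `directory` has >= `minimum` bytes free."""
    return shutil.disk_usage(directory).free >= minimum


def annotate_in_chunks(genomes, tmpdir, annotate, min_free=MIN_FREE_BYTES):
    """Call the hypothetical `annotate(chunk)` per chunk, checking temp space first."""
    results = []
    for chunk in chunked(genomes):
        # Fail early instead of letting a full temp directory yield empty output files.
        if not has_enough_free_space(tmpdir, min_free):
            raise RuntimeError(f"Not enough free space in {tmpdir}; aborting")
        results.append(annotate(chunk))
    return results
```

Checking before every chunk, rather than once at start-up, matters because temp space can fill up partway through a long run.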
1.0.2
1.0.2 (2023-05-19)
This release uses models built on [GTDB release 202].
This release uses the DIAMOND database version 2, available at DOI [10.5281/zenodo.5571251].
Models and database are unchanged from v 1.0.1, output results should be the same.
This is a minor release that does not impact any previous CheckM2 functionality. New features in this release:
- Fixed some warnings when no protein file was generated
- CheckM2 now reports the total number of contigs as well as max contig length
- Better command-line suggestions for 'predict' function
- Fixes to setup.py reduce potential conda environment conflicts
- Updated README to include mamba installation
- Minor grammatical fixes to command-line usage
Next major CheckM2 release will incorporate updated models that were trained on high-quality genomes in GTDB release 214
1.0.1
1.0.1 (2023-01-22)
This release uses models built on [GTDB release 202].
This release uses the DIAMOND database version 2, available at DOI [10.5281/zenodo.5571251].
Models and database are unchanged from v 1.0.0, output results should be the same.
New features in this release:
- You can specify an alternate temporary directory for CheckM2 using --tmpdir in case the default is limited in space (addresses #35)
- Added a --database_path option to 'checkm2 predict'. If you have a downloaded CheckM2 database but cannot set it via 'checkm2 database --setdblocation', you can provide the path to 'checkm2 predict' instead. (addresses #15)
- Logging now notes CheckM2 version
- CheckM2 Version 1.0.1 onwards is now available on bioconda and pypi.
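As a rough illustration of the --tmpdir and --database_path options listed above, a wrapper script might assemble the call as shown below. The helper `build_predict_command` is hypothetical, the paths are placeholders, and the `--input` and `--output-directory` arguments are assumed from the CheckM2 README rather than stated in these notes. Nothing is executed; the sketch only builds and prints the command.

```python
import shlex


def build_predict_command(input_dir, output_dir, database_path=None, tmpdir=None):
    """Build an argv list for 'checkm2 predict' without running it."""
    # --input / --output-directory are assumptions taken from the README.
    cmd = ["checkm2", "predict", "--input", input_dir, "--output-directory", output_dir]
    if database_path is not None:
        # Useful when the database location cannot be stored via --setdblocation.
        cmd += ["--database_path", database_path]
    if tmpdir is not None:
        # Point temporary files at a location with more free space.
        cmd += ["--tmpdir", tmpdir]
    return cmd


cmd = build_predict_command(
    "bins/",
    "checkm2_out/",
    database_path="/data/checkm2_db.dmnd",  # placeholder path
    tmpdir="/scratch/tmp",                  # placeholder path
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would avoid shell quoting issues with unusual paths.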
New bugfixes in this release:
1.0.0
1.0.0 (2022-12-16)
First official release of CheckM2!
This release uses models built on [GTDB release 202].
This release uses the DIAMOND database version 2, available at DOI [10.5281/zenodo.5571251].
Models and database are unchanged from v 0.1.3, output results should be the same.
New features in this release:
- CheckM2 now has tagged releases and a changelog summary. This addresses #25 and #22, and allows submission to PyPI and Bioconda (#29, #7)
- CheckM2 now has logging enabled by default. Logs will be saved in the output folder in the file 'checkm2.log' (Resolves #2)
- You can now optionally remove intermediate files (protein files and diamond output) using the --remove_intermediates option (Resolves #3)
- CheckM2 now checks for diamond database and loads machine learning models before proceeding with main workflow (Resolves #4)
- CheckM2 now reports coding density for bins, as well as contig N50, average gene size, genome length and GC content. This gives the user more information and can help identify issues such as e.g. frameshift-dominated genomes
- Processing feature vectors and predicting completeness and contamination is now chunked by groups of genomes (default 250) instead of holding all feature vectors in memory. This drastically reduces RAM usage by CheckM2.
- You can now specify a specific coding table that Prodigal should use for your bins using the --ttable flag. By default, CheckM2 chooses between 4 or 11 based on coding density information.
- CheckM2 now forces tensorflow models to run using CPU (this should address #26, #12). For better compatibility, it is strongly suggested to initially install the CheckM2 conda environment on a computer without a GPU
- CheckM2 should now use tensorflow release < 2.6.0 (this should address #16)
- CheckM2 can reuse prodigal and diamond output using the --resume flag (addresses #13, thanks to JeanMainguy for implementation)
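For readers unfamiliar with some of the assembly statistics reported since this release, the sketch below shows the conventional definitions of genome length, contig N50 and GC content. It is not CheckM2's implementation: the function names are illustrative, the input is assumed to be a list of contig sequences as plain strings, and counting every base (including ambiguous ones) in the GC denominator is an assumption. Coding density and average gene size need gene predictions and are not covered here.

```python
def n50(contig_lengths):
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def gc_content(contigs):
    """Fraction of G/C bases among all bases (case-insensitive); 0.0 for empty input."""
    total = sum(len(seq) for seq in contigs)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    return gc / total


def assembly_stats(contigs):
    """Collect the three statistics for a list of contig sequences."""
    lengths = [len(seq) for seq in contigs]
    return {
        "genome_length": sum(lengths),
        "contig_n50": n50(lengths),
        "gc_content": gc_content(contigs),
    }
```

A low N50 relative to genome length indicates a fragmented bin, which is useful context when interpreting completeness and contamination estimates.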