The coreGF score is calculated in two steps.

1) BLAST search

Perform a BLAST search of the annotated protein sequences (BLASTp) or transcript sequences (BLASTx) against the PLAZA 2.5 proteome database. A preformatted BLAST database is available in the ftp coreGF folder (PLAZA_2.5_proteome). The output must be tab-delimited, with the target locus in the second column. Pass all significant BLAST hits to the Python script as input.

2) Calculate the coreGF score using the Python script (coreGF_plaza2.5_geneset.py)

Three sets of coreGFs have been defined for different plant lineages (green plants, rosids and monocots) using PLAZA 2.5. A table of the selected gene families and their representatives can be found in the supplementary material of Van Bel et al., 2012; the tables are also available as text files in the ftp coreGF folder (coreGF_plaza2.5_greenplants.txt, coreGF_plaza2.5_rosids.txt and coreGF_plaza2.5_monocots.txt).

The Python 3 script reads the coreGF table and parses the tab-delimited BLAST output. All BLAST hits are considered when searching for a hit to the representative sequence of a coreGF; a coreGF with such a hit counts as represented. As each coreGF has its own weight, the coreGF score is calculated as the sum of the weights of the represented coreGFs divided by the total weight of all coreGFs. The coreGF score, the number of missing coreGFs and the list of missing coreGFs are written to STDOUT.

Show help:

    python3 coreGF_plaza2.5_geneset.py -h

Example:

    python3 coreGF_plaza2.5_geneset.py coreGF_plaza2.5_rosids.txt example_BLASTp_ath_plaza2.5_proteome.txt > example_coreGFscore_ath_rosids.txt
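The scoring rule described above can be illustrated with a minimal Python sketch. This is not the code of coreGF_plaza2.5_geneset.py. It only assumes what is stated here: BLAST output is tab-delimited with the target locus in the second column, and the score is the summed weight of represented coreGFs divided by the total weight. The function names, the in-memory layout of the coreGF table (family id mapped to a weight and a set of representative genes) and all identifiers and weights in the example data are hypothetical.

```python
import csv
import io


def read_blast_targets(handle):
    """Collect target loci (second column) from tab-delimited BLAST output."""
    targets = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short or comment lines.
        if len(row) >= 2 and not row[0].startswith("#"):
            targets.add(row[1])
    return targets


def coregf_score(core_gfs, targets):
    """Return (score, missing families).

    core_gfs maps a family id to (weight, set of representative genes).
    This layout is an assumption made for the sketch only.
    """
    total = sum(weight for weight, _ in core_gfs.values())
    if total <= 0:
        raise ValueError("total coreGF weight must be positive")
    found = 0.0
    missing = []
    for family, (weight, representatives) in core_gfs.items():
        if representatives & targets:
            found += weight  # at least one representative was hit
        else:
            missing.append(family)
    return found / total, sorted(missing)


# Hypothetical coreGF table and BLAST output, for illustration only.
core_gfs = {
    "GF_A": (1.0, {"rep_gene_1", "rep_gene_2"}),
    "GF_B": (0.5, {"rep_gene_3"}),
    "GF_C": (0.5, {"rep_gene_4"}),
}
blast_output = io.StringIO(
    "query1\trep_gene_1\t98.5\n"
    "query2\trep_gene_4\t87.0\n"
    "query2\tother_gene\t60.2\n"
)

targets = read_blast_targets(blast_output)
score, missing = coregf_score(core_gfs, targets)
print(f"coreGF score: {score:.2f}")
print(f"missing coreGFs ({len(missing)}): {', '.join(missing)}")
```

With this made-up data, GF_A and GF_C are represented (combined weight 1.5 out of 2.0), so the sketch reports a score of 0.75 and lists GF_B as missing.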