Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae)

Chief Investigator

Nock, Catherine


Hort Innovation

Grant ID


Grant Links

Hort Innovation

School or Research Centre

Centre for Plant Science

Lead Partner Organisation

Southern Cross University


Catherine Nock,

Centre for Plant Sciences, Southern Cross University,

PO Box 157, Lismore NSW 2480, Australia.


Graham King,

Centre for Plant Sciences, Southern Cross University,

PO Box 157, Lismore NSW 2480, Australia.



Macadamia, Proteaceae, Genome, Gene space, Transcriptome


The large Gondwanan plant family Proteaceae is an early-diverging eudicot lineage renowned for its morphological, taxonomic and ecological diversity. Macadamia is the most economically important Proteaceae crop and represents an ancient rainforest-restricted lineage. The family is a focus for studies of adaptive radiation due to remarkable species diversification in Mediterranean-climate biodiversity hotspots, and numerous evolutionary transitions between biomes. Despite a long history of research, comparative analyses in the Proteaceae and macadamia breeding programs are restricted by a paucity of genetic information. To address this, we sequenced the genome and transcriptome of the widely grown Macadamia integrifolia cultivar 741.

Data Collection Start Date


Data Collection End Date



Catherine J Nock et al. collected the plant material and associated DNA and RNA sequence data from Macadamia integrifolia cultivar 741 used in this study.

A full description of the methodology is provided in Nock, C.J., et al. (2016) Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae). BMC Genomics 17(1): 937. DOI 10.1186/s12864-016-3272-3.


Macadamia M2 RVT, Clunes NSW Australia

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

FoR Code


Data Processing

In total, over 95 gigabases of DNA and RNA-seq data were assembled de novo and annotated. The draft assembly has a total length of 518 Mb and spans approximately 79% of the estimated genome size. Following annotation, 35,337 protein-coding genes were predicted, of which over 90% were expressed in at least one of the leaf, shoot or flower tissues examined. A complete description of the methods is provided in Nock et al. (2016).
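The summary figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the published methods: it back-calculates the genome size implied by the assembly length and coverage, and the minimum number of expressed genes implied by "over 90%". Because the 79% figure is approximate, the derived genome size is only indicative and may differ from the estimate reported in Nock et al. (2016). The function and variable names are illustrative.

```python
# Figures quoted in the Data Processing summary above.
ASSEMBLY_LENGTH_MB = 518        # total draft assembly length, in Mb
FRACTION_OF_GENOME = 0.79       # approximate share of the estimated genome size
PREDICTED_GENES = 35_337        # predicted protein-coding genes
MIN_EXPRESSED_FRACTION = 0.90   # "over 90%" expressed in at least one tissue


def implied_genome_size_mb(assembly_mb: float, fraction: float) -> float:
    """Back-calculate the genome size implied by assembly length and coverage."""
    return assembly_mb / fraction


def min_expressed_genes(total_genes: int, fraction: float) -> int:
    """Lower bound on expressed genes implied by a minimum expressed fraction."""
    return int(total_genes * fraction)


genome_mb = implied_genome_size_mb(ASSEMBLY_LENGTH_MB, FRACTION_OF_GENOME)
expressed = min_expressed_genes(PREDICTED_GENES, MIN_EXPRESSED_FRACTION)

print(f"Implied genome size: about {genome_mb:.0f} Mb")
print(f"Expressed genes: more than {expressed:,}")
```

Under these assumptions the implied genome size is roughly 656 Mb, and more than 31,803 of the predicted genes were expressed in at least one tissue.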