Title

Computational data for: Chromosome-scale assembly and annotation of the macadamia genome

Chief Investigator

Nock, Catherine

Funders

Hort Innovation

Grant ID

MC15008

Grant Links

Hort Innovation

School or Research Centre

Centre for Plant Science

Lead Partner Organisation

Southern Cross University

Contact

Catherine Nock,

Centre for Plant Sciences, Southern Cross University,

PO Box 157, Lismore NSW 2480, Australia.

catherine.nock@scu.edu.au

Ramil Mauleon

Centre for Plant Sciences, Southern Cross University,

PO Box 157, Lismore NSW 2480, Australia.

ramil.mauleon@scu.edu.au

Keywords

macadamia, Proteaceae, genome, genetic linkage map, pseudo-chromosome, transcriptome, nut crop

Description

Establishing an open-source platform for unravelling the genetics of macadamia: integration of linkage and genome maps.

Data Collection Start Date

2016

Data Collection End Date

2020

Methodology

We have generated an anchored, chromosome-scale genome assembly for M. integrifolia cultivar HAES 741 (4,094 scaffolds, 745 Mb, N50 413 kb) using a combination of high-coverage Illumina short reads and PacBio long reads. Scaffolds were anchored to pseudo-chromosomes using seven genetic linkage maps derived from progeny with HAES 741 parentage. Compared with the first draft (Nock et al. 2016), this assembly has improved contiguity and coverage, with >120 Mb of new sequence. Annotation estimated a repeat content of 55% and predicted 34,274 protein-coding genes.
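The N50 reported above (413 kb) is a standard contiguity statistic: the scaffold length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows. It is an illustration only, not the pipeline used for this assembly, and the file name "scaffolds.fa" in the usage comment is hypothetical, since this record does not list the assembly FASTA among its files.

```python
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted text."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # sequence may be wrapped over several lines
    if current is not None:
        lengths.append(current)  # close the final record
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Example usage ("scaffolds.fa" is a hypothetical file name):
#     with open("scaffolds.fa") as fh:
#         print(n50(fasta_lengths(fh)))
```

Sorting in descending order and stopping at the first scaffold whose running total reaches half the assembly is the usual definition of N50; comparing 2 * running with the total keeps the test in integers, so odd totals need no rounding.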

A full description of the process for generating the data is provided in the manuscript “Chromosome-scale assembly and annotation of the macadamia genome”.

Coverage

NSW, Australia

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

FoR Code

0706

Viewing Instructions

Annotation files for the Macadamia integrifolia HAES 741 genome assembly (NCBI BioProject PRJNA593881):

1) Protein sequences for the 34,274 predicted protein-coding genes [Min_proteins.fa, 12.4 MB]
2) DNA sequences for the 34,274 predicted protein-coding genes [Min_transcripts.fa, 51.3 MB]
3) General Feature Format (GFF3) file describing genes and other genomic features [Min.gff3, 52.3 MB]
4) Transcriptome assembly [trinity.fa, 325.4 MB]

A full description of the data processing and analysis is provided in the manuscript “Chromosome-scale assembly and annotation of the macadamia genome”.
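As a quick consistency check after downloading, the number of FASTA records in Min_proteins.fa and the number of gene features in Min.gff3 can be compared with the 34,274 predicted genes. The Python sketch below assumes that the GFF3 file labels genes with the feature type "gene" in column 3 and that each protein-coding gene contributes exactly one protein record. If the files include alternative isoforms or non-coding genes, the counts will differ from 34,274 without indicating a problem.

```python
from collections import Counter
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def gff3_feature_counts(lines: Iterable[str]) -> Counter:
    """Tally GFF3 features by their type (column 3 of each feature line)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # a GFF3 feature line has nine tab-separated columns
            counts[fields[2]] += 1
    return counts


# Example usage (assumes the files were downloaded to the working directory,
# and that genes carry the feature type "gene"):
#     with open("Min_proteins.fa") as fh:
#         print("protein records:", count_fasta_records(fh))
#     with open("Min.gff3") as fh:
#         print("gene features:", gff3_feature_counts(fh)["gene"])
```

Passing open file handles rather than paths keeps both functions streaming, so the 52.3 MB GFF3 file is never held in memory at once.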
