Service quality of Chinese group package tours in Australia

Publication details

Chen, H, Weiler, B, Young, M & Lee, YL 2014, 'Service quality of Chinese group package tours in Australia', in N Scott, D Weaver, S Becken & P Ding (eds), Proceedings of the G20 First East-West Dialogue on Tourism and the Chinese Dream, Gold Coast, Australia, 13-15 November, Griffith University, Queensland, Australia. ISBN: 9781922216649

Abstract available on Open Access

Peer Reviewed



This paper reports the process of developing a measure of the service quality of Chinese group package tours (GPTs) in Australia. Under the Approved Destination Status (ADS) scheme, Chinese tourists are permitted to undertake leisure travel to overseas destinations as part of a package tour. In April 1999, Australia became one of the first western countries (along with New Zealand) to obtain ADS from the Chinese government. Since then, Australia has hosted over 897,000 Chinese tourists undertaking leisure travel in GPTs, and China is now Australia’s fastest growing inbound tourism market and the largest contributor to international visitor spending in Australia (Department of Resources, Energy and Tourism, 2012). A large proportion of Chinese tourists still visit Australia on GPTs, and group tourists are the dominant segment compared with independent tourists and business tourists.

In the travel agency industry in China, one widely used approach to measuring GPT service quality is the customer comment card. However, customer comment cards have been criticised for, for example, a lack of clarity and precision, low return rates and their appearance design (Wang, Hsieh, Chou, & Lin, 2007). In academia, the use of popular measures such as SERVQUAL (e.g. Parasuraman, Zeithaml, & Berry, 1988) to measure GPT service quality has also been criticised. For example, Wang et al. (2007) argued that SERVQUAL does not cover all entities of GPTs and that it was developed mostly for short-term service encounters, in which interaction between customers and service providers is limited, whereas a GPT is a long and continuous process.

A review of the tourism and hospitality literature identifies a number of empirical studies that have focused on service quality in the tour operator/travel agency industry. Some researchers measured service quality by replicating or adopting the SERVQUAL model (e.g. Johns, Avcí, & Karatepe, 2004; Lam & Zhang, 1999; Martínez Caro & Martínez García, 2008; Ryan & Cliff, 1997). Other researchers developed their own measures (e.g. Albrecht, 1992; LeBlanc, 1992; Persia & Gitelson, 1993). A review of studies of travel agencies in China, published in English and Chinese, identified several studies of GPT service quality conducted in the broad Chinese (including Taiwanese and Cantonese) context. These include Sheng (1999), Wang, Hsieh, and Huan (2000), Liu and Wu (2006), Wang et al. (2007), Chang (2009), and Wang, Ma, Hsu, Jao, and Lin (2013).

Despite different attempts to find the right measure of service quality, there still appears to be no consensus on how evaluations of quality should be operationalised. This study reports some initial findings on the conceptualisation and measurement of service quality in the context of Chinese group package tours in Australia. The development of an appropriate service quality scale for Chinese GPTs in Australia involved four steps: 1) literature review, 2) panel discussion, 3) survey, and 4) data analysis. Firstly, items were generated from a comprehensive review of Chinese-language and English-language studies. To help achieve the aim of a broad study, the following items were removed from the pool: 1) items not within the control of travel agencies, 2) items measuring pre-tour and post-tour service, 3) items covering fine detail, such as airplane seating arrangements, and 4) items that could be covered in the travel contract.

Secondly, the researcher consulted three experts on the content validity of the remaining items. Group discussions were also conducted with three groups (of 8, 9 and 10 participants, respectively). All participants had taken at least one GPT in the 12 months before the discussion. Each group was shown the list derived from the literature review and was asked ‘Based on your past experience, what items are needed to evaluate GPT service quality?’ Based on the results of the group discussions, the items were further modified and a list of 24 items was produced. Thirdly, the 24 items were written up as a questionnaire. Each item was anchored from “strongly disagree” (1) to “strongly agree” (5) on a five-point rating scale. A performance-minus-expectation paradigm was chosen on the recommendation of service quality researchers (e.g. Parasuraman, Zeithaml, & Berry, 1988) and to help achieve the research aim of a broad study. To obtain quality data within the research timeframe, an internet panel provider was used to collect data. Experts’ advice on using internet panel providers was followed to ensure that only qualified people participated. Data were collected from 14 April 2014 to 28 April 2014. A link was sent to all panel members who had outbound experience and GPT experience as recorded in their member profiles. A total of 1237 responses were received and, after eliminating unusable responses (i.e. incomplete responses and responses with short durations), 520 complete responses were retained for further analysis.
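The performance-minus-expectation scoring and the response screening described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that expectation and performance were both rated on the same 1-5 scale, and the function names, the response layout and the `min_seconds` cut-off are hypothetical, since the abstract does not report them.

```python
def gap_scores(performance, expectation):
    """Return performance-minus-expectation gap scores, one per item."""
    if len(performance) != len(expectation):
        raise ValueError("performance and expectation must cover the same items")
    for rating in list(performance) + list(expectation):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must lie on the 1-5 scale")
    # A negative gap means the service fell short of what was expected.
    return [p - e for p, e in zip(performance, expectation)]


def is_usable(response, n_items=24, min_seconds=120):
    """Screen one response: complete on every item and not answered too fast.

    min_seconds is a made-up threshold; the abstract gives no cut-off.
    """
    ratings = response["performance"] + response["expectation"]
    complete = len(ratings) == 2 * n_items and all(r is not None for r in ratings)
    return complete and response["duration_seconds"] >= min_seconds


# Three items rated by one respondent: below, above and equal to expectation.
print(gap_scores([4, 5, 3], [5, 4, 3]))  # [-1, 1, 0]
```

On a five-point scale each gap score lies between -4 and 4, with 0 meaning that perceived performance matched expectation.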

Fourthly, exploratory factor analysis was used. The correlation matrix was visually scanned. All values were positive and lay between 0.3 and 0.8. No extremely high or low values were identified, so all items were retained for further analysis, as Churchill (1979) suggested. A principal component analysis (PCA) was conducted on the 24 items with orthogonal rotation (varimax). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = 0.93 (‘superb’ according to Field, 2009), and all KMO values for individual items were above 0.832, which is well above the acceptable limit of 0.5 (Field, 2009). Bartlett’s test of sphericity, χ²(276) = 3449.292, p < 0.01, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Four components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 55.97% of the variance. Given the large sample size, and the convergence of the scree plot and Kaiser’s criterion on four components, four components were retained in further analysis. After rotation, the items that cluster on the same components suggest that component 1 represents on-tour services, component 2 represents tour leader, component 3 represents attractions and component 4 represents other services.
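Two of the reported figures can be checked with a short sketch. With 24 items, Bartlett's test has 24 × 23 / 2 = 276 degrees of freedom, which matches the reported value, and because the eigenvalues of a 24-item correlation matrix sum to 24, four components explaining 55.97% of the variance have eigenvalues summing to roughly 13.4. The code below uses the textbook form of Bartlett's statistic and a plain eigenvalue count for Kaiser's criterion. The abstract does not say which software or exact formula the authors used, and the function names and example numbers are illustrative only.

```python
import math


def bartlett_df(n_items):
    """Degrees of freedom for Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # swapping two rows flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_chi_square(corr, n_obs):
    """Textbook statistic: -[(n - 1) - (2p + 5) / 6] * ln|R|."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    return -((n_obs - 1) - (2 * p + 5) / 6) * math.log(det)


def kaiser_retained(eigenvalues):
    """Count components with eigenvalue > 1 and their share of total variance."""
    retained = [ev for ev in eigenvalues if ev > 1.0]
    return len(retained), sum(retained) / sum(eigenvalues)


print(bartlett_df(24))  # 276
# Made-up eigenvalues for a four-item example, not the study's data.
print(kaiser_retained([2.0, 1.5, 0.3, 0.2]))  # (2, 0.875)
```

If the items were uncorrelated, the correlation matrix would be the identity, its determinant would be 1 and the statistic would be zero; a large value such as the one reported indicates correlations strong enough to justify PCA.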

This study attempted to establish an instrument suitable for evaluating the service quality of Chinese GPTs in Australia. The measure developed in this study, from its initial stage to the final version, has met rigorous criteria for both validity and reliability. The results have important implications that benefit both practitioners and researchers. However, the scale needs appropriate adaptation when investigating other types of GPTs or GPTs in other contexts. To extend the current research, relationships between GPT quality and other variables such as demographics, satisfaction and behavioural intentions could be explored.
