Big data technology in education

The article discusses the implementation of big data in the educational process of higher education. Analyzing a large amount of data and referring to the types of services provided by e-government, the authors show that many pressing problems remain and that many services are not yet automated. To improve the professional training of Computer Science teachers at the L.N. Gumilyov Eurasian National University, courses on the theoretical and practical foundations of big data have been developed and introduced into the educational process: «Big Data and cloud computing» in the educational program 7M01514 «Smart City technologies» and «The Internet of Things and Intelligent Systems» in the educational program 7M01525 «STEM-Education». The article discusses several types of software for teaching big data and analyzes data on the implementation of big data in some educational institutions. For the introduction of special courses in the master's programs in Computer Science, the article considers the curriculum, the educational and methodological complex, digital educational resources, and the hardware and software that collect, store, and sort big data, as well as the introduction into the educational process of the theoretical foundations and methods of using the developed technical and technological equipment.


The formation of an information civilization has led to the emergence of a new digital environment in society. Because this environment is constantly developing, specialists must not only master ready-made information but also master and apply the new technologies that process it; some of these technologies generate new and even more important information, which in turn requires regular new solutions. As a result, the requirements for the level of thinking and cultural development of future specialists, in our case specialists in information and communication technologies, are sharply increasing.

Section 3, «Analysis of the current situation», of the State Program «Digital Kazakhstan» notes that, although many IT companies are able to develop nationwide projects, the development institutions established in the information technology segment have low efficiency and the country's existing technology parks cannot operate properly. The program also cites statistics: in recent years there have been more than 40 million types of public services, and the number of users has exceeded 6.6 million people. In addition, more than 83 mobile services, including 1414 free call centers, were opened; around 14–15 thousand complaints are received per day through these services; and 14,928 budget documents were published.

While citing these achievements, the program notes that many activities are still not automated and that pressing issues remain, including a lack of transparency and accountability, weak customer orientation, and an insufficient level of activity.

The program also notes that new technologies will make it possible to provide services of higher quality than those currently available; in particular, big data technology will lead to a fundamentally different approach to analyzing the needs of the population and, in turn, to an increase in the quality of services [1]. These issues create new requirements for the training of IT students.

Big data in education mostly contains information about large numbers of students and thousands of educational institutions, and its volume keeps growing over time. Because such data changes quickly, users can monitor the learning process with specialized applications.

The main purpose of our proposed work is to identify the theoretical framework of Big Data and implement it in practice.

Literature review

M. Leviev, the head of AlgoMost, identified five types of data in education to which Big Data technologies can be applied:

  •  student's personal information
  •  data on students' interaction with e-learning systems (electronic textbooks, online courses)
  •  information on the effectiveness of training materials
  •  administrative (system-wide) data
  •  predicted data.

The author also cites the use of Big Data in US universities, and the effectiveness of working with it, as an example. According to him, 400,000 students are expelled annually, and many of them had taken out loans to pay for their studies; failure to repay a loan damages the student's credit history. When a student leaves, the overall finances of the educational institution decrease and the rating of the university as a whole suffers. Therefore, Big Data technologies are effective in organizing government support. He also emphasizes the importance of Big Data technology in solving other student problems; in particular, he reports that students' course completion increased by 16 %, while their progression to the next course grew by 8 % [2].

The importance of Big Data technologies for analyzing both students who have dropped out and students who have completed their studies is discussed in the work of Moscoso-Zea O. and Vizcaino M.

The authors also note that, within data analytics, Intelligent Data Mining (IDM) is a developing field that supports change in the management of higher education institutions, and they highlight the importance of applying data mining methods and algorithms to the information stored in academic databases [3].

IDM technology is aimed at understanding the behaviour and personal qualities of students [4]. In education it is used to build the competence of future graduates by answering questions such as:

  •  Which subject is the most difficult?
  •  Which tests and assignments are the most effective?
  •  What topics are students interested in?
  •  How to organize the effective compilation of the curriculum?
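The first of these questions can be illustrated with a very small sketch. It ranks subjects by mean grade, on the assumption that a lower mean indicates a more difficult subject. The records, the subject names, and the function `hardest_subjects` are hypothetical and are not taken from the cited works; a real data mining study would use far richer data and methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (student, subject, grade) records; not data from the cited studies.
records = [
    ("s1", "Algebra", 62), ("s2", "Algebra", 55), ("s3", "Algebra", 71),
    ("s1", "Databases", 84), ("s2", "Databases", 78), ("s3", "Databases", 90),
    ("s1", "Physics", 58), ("s2", "Physics", 49), ("s3", "Physics", 66),
]

def hardest_subjects(rows):
    """Return (subject, mean grade) pairs sorted from lowest to highest mean."""
    grades = defaultdict(list)
    for _student, subject, grade in rows:
        grades[subject].append(grade)
    return sorted(((s, mean(g)) for s, g in grades.items()), key=lambda p: p[1])

ranking = hardest_subjects(records)
print(ranking[0][0])  # subject with the lowest mean grade in this toy data
```

The same grouping step could serve the other questions, for example by grouping on test or topic instead of subject.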

Another example of using educational data mining (EDM) is determining the grades and employment status of graduates by given criteria [5]. The authors also note the importance of this technology in e-learning. One of the main problems in e-learning is that new relationships and new knowledge are often hidden in big data and have to be discovered through data mining. It is noted that educational activities in universities in Azerbaijan are managed with data mining methods, so that teachers receive information about students on time and respond promptly to changes in the educational process. The quick reaction that such technologies allow in any process can be considered their advantage.

«Digital technologies in educational institutions are not only online courses, electronic textbooks, and computer testing; big data is also a new opportunity for monitoring the whole educational process», said the author at the III International Conference (2019), held with the participation of the government of Moscow and educational authorities. The conference materials also note the wide application of Big Data in analytical systems, in evaluating universities' quality assurance and accreditation, and in transport, healthcare, and artificial intelligence (AI) systems [6].

Scientists from Vyatka State University, Russia, define Big Data as a technology for analysing the education system, which involves measuring, collecting, analysing, and presenting a large amount of structured and unstructured data about students and the educational environment in order to understand the specifics of the functioning and development of the education system. In their research, the methodological basis is the formalization of technology for working with big data, aimed at developing the education system by identifying the patterns that have formed in it [7].

Having analyzed various models and methods, the scientists I.D. Frumin and M.S. Dobryakova propose three areas for the application of Big Data technology in education:

  1. Thinking competence (primarily critical thinking);
  2. Competence in communication with others (communication and collaboration-cooperation);
  3. Competence of self-government (self-regulation, reflection, and self-organization) [8].

Taking all of this into consideration, we decided to work on materials related to Big Data: artificial intelligence (AI), mobile devices, social networks, and the Internet of Things (IoT), which make data more complex by adding new objects and data sources. Big data comes from sensors and various other devices, video and audio, online data, transactional applications, the internet, and social networks, and most of it arrives in very large volumes in real time.

Big data, or Big Data technology, has been recognized worldwide since the 2000s. The concept of «Big Data technologies», which appeared relatively recently in the field of information technology, was introduced by Clifford Lynch, editor of the journal Nature. He wrote about the progressive increase in the volume of the world's information, noting that new tools and advanced technologies help to master it [9].

Various researchers have offered definitions of Big Data. It is defined as a group of methods and technologies for processing dynamically growing amounts of data [10]. Big data is also classified by the type of data source (Internet or media sources), content format (structured, semi-structured, and unstructured), data storage method, and so on [11].
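The classification by content format can be made concrete with a short sketch. The heuristic below, the function name `classify_format`, and the sample payloads are illustrative assumptions only: tabular text stands for structured data, JSON for semi-structured data, and free text for unstructured data. Real systems rely on schemas and metadata rather than on guessing from the content.

```python
import json

def classify_format(payload: str) -> str:
    """Rough, illustrative heuristic; not a production-grade classifier."""
    try:
        parsed = json.loads(payload)
        if isinstance(parsed, (dict, list)):
            return "semi-structured"
    except json.JSONDecodeError:
        pass
    lines = [ln for ln in payload.strip().splitlines() if ln]
    # Treat as tabular if there are several rows with the same, non-zero comma count.
    if len(lines) >= 2:
        counts = {ln.count(",") for ln in lines}
        if len(counts) == 1 and counts != {0}:
            return "structured"
    return "unstructured"

# Hypothetical samples of educational data in three formats.
samples = {
    "grades.csv": "student_id,course,grade\n101,Databases,84\n102,Databases,78",
    "event.json": '{"student_id": 101, "action": "open_lecture", "tags": ["video"]}',
    "feedback.txt": "The second module was hard to follow, but the labs helped.",
}
labels = {name: classify_format(text) for name, text in samples.items()}
print(labels)
```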

In 2001, the leading Gartner analyst Doug Laney first described the main characteristics of Big Data technologies as the «three Vs»: volume, velocity, and variety. Variety covers different types of data in different formats, whether unstructured or partially structured, while volume is measured by the actual number of documents [12].

The growth of global technological capabilities affects the analysis of large data arrays. Whereas it once took scientists several decades to collect research data, in the modern era of information technology the necessary information can be collected in a short time. Big data processing technologies also make it possible to simplify the processing of large data in the educational process.

There is a lot of information about Big Data on the Internet, and many people confuse this term with databases. It is argued that the main difficulty with big data, which is constantly updated and therefore constantly growing, lies not in accumulating it but in processing it.

The president of the Association of Big Data and Analytics said in an interview with the CEO of Marketo: «We have been working in this field for more than 10 years. We are one of the few companies that started Big Data not only in Kazakhstan but also in the CIS as a whole. Now we work mainly with foreign companies, among them companies from the United States, Switzerland, and China. For example, we are now working with 3 schools in the United States. Under this project, 3D video cameras will be installed at the entrance to the school. They provide information about the mood, emotions, and psychological state of children by capturing the children as they enter. This is the only project that helps to keep children calm and control their behaviour at school. The programs «Digital Kazakhstan» and «Smart City» in our country hold a huge amount of data» [13].

To store large amounts of data, a large amount of disk space is required. To solve this problem, NAS (Network Attached Storage) cluster technology is used, which connects storage devices to a local or distributed computer network directly over the TCP/IP protocol. Users can store files on NAS servers and share them via a browser or the server's network address. The NAS cluster infrastructure consists of several interconnected storage devices that allow users to share and search for available information [14].

Given the growing amount of information in the world, it is not surprising that new disks, that is, additional efficient and inexpensive storage devices, can be added whenever necessary. However, with the emergence of cloud technologies, large-scale data analysis, and the Internet of Things, it is clear that a new approach is needed both to access data in real time and to optimize data in warehouses. Modern cloud technologies that meet storage and software requirements provide users with storage optimization, security, flexible delivery methods, and large-scale infrastructure. Clouds can hold large amounts of raw data in its original format, and new technologies allow it to be processed when necessary. For example, Hadoop, which is written in Java, allows analysts to store large amounts of data by distributing it across servers and then to process it with MapReduce on the Java Virtual Machine (JVM) [15].
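The MapReduce idea can be sketched in a few lines of plain Python. This is not Hadoop's API, which is Java-based and runs the phases in parallel across a cluster; the function names and the two sample documents are assumptions chosen only to show the map, shuffle, and reduce steps on a word-count task.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for each word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values of each key to obtain the final counts."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input: each string stands for a document stored on a different server.
documents = ["big data in education", "big data and cloud computing"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts["big"], word_counts["education"])
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the approach suitable for large data sets.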

The Cisco Networking Academy resources illustrate large-scale data with examples of modern, existing techniques and technologies:

  •  sensors in an unmanned vehicle can accumulate up to 4,000 gigabits (Gbit) of data per day;
  •  an Airbus A380 flight from London to Singapore generates 1 petabyte of data;
  •  the safety sensors used in a mine can output up to 2.4 terabits of data per minute;
  •  sensors in a single smart home can collect up to 1 gigabyte of information per week [16].
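These figures are quoted in different units and time spans, so a short calculation helps to compare them. The sketch below converts three of them to gigabytes per day, assuming decimal prefixes (1 terabit = 1,000 gigabits), 8 bits per byte, and, for the mine, that the quoted peak rate is sustained for a full day, so that value is an upper bound. The vehicle figure is taken as gigabits per day. The flight example is left out because its duration is not stated.

```python
BITS_PER_BYTE = 8
MINUTES_PER_DAY = 24 * 60
DAYS_PER_WEEK = 7

# Figures as quoted in the list above; decimal prefixes are an assumption.
vehicle_gbit_per_day = 4000
mine_gbit_per_minute = 2400  # 2.4 terabits expressed in gigabits
home_gb_per_week = 1

vehicle_gb_per_day = vehicle_gbit_per_day / BITS_PER_BYTE
mine_gb_per_day = mine_gbit_per_minute / BITS_PER_BYTE * MINUTES_PER_DAY
home_gb_per_day = home_gb_per_week / DAYS_PER_WEEK

for name, value in [
    ("unmanned vehicle", vehicle_gb_per_day),
    ("mine safety sensors", mine_gb_per_day),
    ("smart home", home_gb_per_day),
]:
    print(f"{name}: {value:,.2f} GB per day")
```

Under these assumptions the vehicle produces about 500 GB per day and the mine sensors up to 432,000 GB (432 TB) per day, while the smart home stays below 1 GB per day.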

In the context of the pandemic that has been spreading around the world, the need for Big Data has become clearer than ever. It is connected with many unresolved issues, such as the effective organization, storage, and processing of data on the number of students, educational and methodological materials, various types of digital data, and various documents and text data. For example, the buildings of the L.N. Gumilyov Eurasian National University are equipped with sensors that monitor access. These sensors collect 1.2 GB of information per month. Over a certain period of time, this data can be considered big data for our university.

Another example is the Kazakhstani banking sector, which has installed special terminals for obtaining small loans. The customer enters a PIN and the amount to be received in the terminal, and after the customer is identified, the request is processed. The use of big data in medicine also makes medical procedures easier: the availability of free medical quotas, their sufficiency, and their comparison across different medical examinations can be checked readily with the help of Big Data technologies.

All data must be combined into a single system to work with Big Data.

The professional development of specialists and their training in the field of Big Data is a matter of time, and the solution to these issues is considered in our work. Section 3.4 of the State Program «Digital Kazakhstan», «Human capital development», states that «digitalization significantly outstrips the existing system of production requirements for the composition of specialties covered by the labour market. The lack of operational communication between the labour market and the education system can simultaneously lead to the training of unclaimed personnel and the release of personnel in «dying» specialties. It is necessary to completely revise the content of all levels of education through the development of digital skills of all specialists». We therefore see a great demand for highly trained specialists in the current education sector [17].

These requirements can be aligned with the Law of the Republic of Kazakhstan «On Education» [18], the state compulsory standard of higher education in Kazakhstan [19], and other state educational programs.


It is well known that human capital is the main factor in the formation and development of an innovative economy and of education as the next stage of development, and that it is associated with health, intelligence, high-quality productive work, and quality of life. We consider the implementation of our work at the university within the framework of these programs and concepts.

To teach both the theoretical and practical foundations of Big Data, the special course «Big Data and cloud computing» was introduced in the educational program 7M01514 «Smart City technologies», and the discipline «The Internet of Things and Intelligent Systems» was introduced in the educational program 7M01525 «STEM-Education». The course carries 5 credits: 1 credit for lectures, 2 credits for practical work, and 2 credits for students' independent work (SIW).

The modules considered in the working curriculum of the special course provide theoretical foundations of large-scale data and methods for creating, storing, and processing large-scale data.

Working with large amounts of data means working with data sets that go beyond the capabilities of traditional relational databases to collect, manage, and process them. Alongside work in an SQL environment with familiar relational databases, it is planned to use a NoSQL environment in the educational process for processing, storing, and managing non-relational data.
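The contrast between the two environments can be shown with a minimal sketch. The relational side uses the in-memory SQLite engine from the Python standard library purely as a stand-in for an SQL environment. The document side uses schema-free JSON records held in a plain Python list, which only imitates how a document-oriented NoSQL store treats data and is not a NoSQL database. The table, field names, and values are hypothetical.

```python
import json
import sqlite3

# Relational side: a fixed schema queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student_id INTEGER, course TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [(101, "Databases", 84), (102, "Databases", 78), (101, "Python", 91)],
)
avg_grade = conn.execute(
    "SELECT AVG(grade) FROM grades WHERE course = ?", ("Databases",)
).fetchone()[0]
conn.close()

# Document side: records may have different fields, so no schema is declared.
raw_documents = [
    '{"student_id": 101, "event": "login", "device": "mobile"}',
    '{"student_id": 102, "event": "quiz", "score": 7, "answers": [1, 0, 1]}',
    '{"student_id": 101, "event": "quiz", "score": 9}',
]
documents = [json.loads(doc) for doc in raw_documents]
quiz_scores = [d["score"] for d in documents if d.get("event") == "quiz"]

print(avg_grade, quiz_scores)
```

The relational query depends on every row having the same columns, whereas the document records carry their own structure, which is why non-relational storage suits heterogeneous data such as activity logs.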

The description of large amounts of data is based on the following concepts:

  •  large volume for data storage;
  •  increase in data volume in geometric progression;
  •  creation, storage, and processing of data in various formats;
  •  high speed, etc.

The second module of the special discipline examines the theoretical foundations of big data processing and the software environments used for it; in our case, the programming languages Python and R. The principles of operation and the functionality of data collection sensors are considered on both a theoretical and a practical basis.

The next module of the special discipline presents methods for creating, storing, and processing large amounts of data. In connection with the study of large amounts of data at the University, the Faculty of Information Technologies of the L.N. Gumilyov Eurasian National University is working on the STEM (Science, Technology, Engineering, and Math) program in education within the framework of the Erasmus project, and an educational program has been developed to implement this discipline. Several types of programming software are considered, including R (a programming language for statistical data processing and graphics) and Python (a high-level programming language), and they were included as special courses named «Programming in Python» and «Programming in R». The general content of these special courses covers the following topics:

  •  directions and prospects for the development of big data;
  •  application areas of big data;
  •  methods of big data analysis;
  •  big data processing technologies;
  •  data processing via SQL and NoSQL databases;
  •  MapReduce in big data processing, the place of Hadoop, and hardware and software environments for big data processing;
  •  cloud computing and its role in big data processing;
  •  use of Microsoft Azure resources for high-performance computing;
  •  use of the Microsoft SQL Server database management system and the Microsoft SQL Server Management Studio tool to create servers and databases in Microsoft Azure for remote data organization;
  •  data security, including data security in cloud computing.

In addition, students' independent work (SIW) includes project defences on the following topics: the main advantages, disadvantages, and problems of big data; designing the logical structure of big data using analysis methods; organizing big data using modern technologies; using big data in everyday life; the structure of cloud computing and the relationship between cloud computing and big data; the software capabilities of cloud resources; the basics of comparing and processing relational and non-relational data; processing big data in the cloud; and so on.


The practical work sets the following objectives for collecting, processing, and sorting big data in the educational process:

  •  improving students' competencies in mastering the technology of processing large-scale data;
  •  improving students' competencies in considering programming languages for big data;
  •  describing the Python programming language and NoSQL environments and improving students' competencies in programming with large data arrays;
  •  identifying the basics of collecting, sorting, and analysing data collected through turnstiles in university buildings, and improving students' competence on the topic by mastering the hardware and software basics during classes;
  •  improving students' competence in collecting, sorting, and analysing weather forecasting data using solar panels;
  •  forming students' skills and abilities in analysing, designing, and compiling data collected with Arduino.
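The turnstile objective can be sketched as a small practical exercise. The example below assumes that the access-control system exports its log as CSV text; the column names, buildings, card numbers, and timestamps are all hypothetical. It parses the log, sorts the events by time, and counts entries per building and hour.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical export from an access-control system; the column names are assumptions.
raw_log = """timestamp,building,card_id,direction
2020-02-03 09:05:10,Main,1002,in
2020-02-03 08:55:42,Main,1001,in
2020-02-03 09:40:00,Library,1001,in
2020-02-03 10:15:30,Main,1002,out
"""

def load_events(text):
    """Parse the CSV text and return events sorted by timestamp."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return sorted(rows, key=lambda r: r["timestamp"])

def entries_per_hour(events):
    """Count 'in' passages per (building, hour of day)."""
    return Counter(
        (e["building"], e["timestamp"].hour) for e in events if e["direction"] == "in"
    )

events = load_events(raw_log)
hourly = entries_per_hour(events)
for (building, hour), count in sorted(hourly.items()):
    print(f"{building} {hour:02d}:00 -> {count} entries")
```

The same collect, sort, and analyse pattern applies to the other data sources mentioned in the objectives, with only the parsing step changing.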

It is worth mentioning that the Platonus system and the Library Fund are large databases of processed and sorted data at the Eurasian National University. Students also work on building similar projects in the practical work of certain courses.

Meanwhile, doctoral research on working with big data using a blockchain structure is under way.


In conclusion, there is a need today to monitor and update the content of education in the training of specialists in line with scientific, technical, technological, and information changes.

Modern information technologies are widely used in the training of university students. One of the main conditions for achieving the planned result through the use of modern information technologies in training is the completely safe and productive functioning of the IT infrastructure. The volume of processed information is constantly increasing, and the requirements for improving productivity and security are growing as well. The most effective way to meet these requirements is to develop information technologies for educating and training students.

The application of big data in the education sector is expanding. Big data comes from sensors and various other devices, video and audio, online data, transactional applications, the internet, and social networks, and most of it arrives in very large volumes in real time. Although many types of services are provided by the government, many pressing problems remain because many services are not yet automated. In response, educational programs and courses have been developed at the L.N. Gumilyov Eurasian National University: for example, the «Big Data and cloud computing» and «The Internet of Things and Intelligent Systems» courses were introduced within the «Smart City technologies» and «STEM-Education» programs to teach big data in both theoretical and practical terms. The article discusses the implementation of big data in the educational process of higher education.

The training course, with specially developed content prepared for implementation in the educational process, will build students' theoretical knowledge and practical skills in working with large amounts of data and, in turn, will make a great contribution to the formation of competitive specialists.



  1. Zakon RK «Ob obrazovanii» ot 27 iiulia 2007 h. № 319-III [The Law of the Republic of Kazakhstan «On Education», dated 27 July 2007, No. 319-III]. Legal reference «Legislation». Astana: Publishing house «Lawyer» [in Russian].
  2. Leviev, M. (2015). 5 sposobov primenit Big Data [5 ways to apply Big Data] Retrieved from [in Russian].
  3. Moscoso-Zea, O., Vizcaino, M., Lujan-Mora, S. (2017) Evaluation of methods and algorithms data mining // 7th Research in Engineering Education Symposium, 972–982.
  4. Bishop, Ch. (2006). Pattern Recognition and Machine Learning. Information Science and Statistics. Vol. 10, 740.
  5. Mamedova, G.A., Zeinalova, L.A., & Melikova, R.T. (2017). Tekhnolohii bolshikh dannykh v eletronnom obrazovanii [Big data technologies in e-education]. Institut informatsionnykh tekhnolohii Natsionalnoi akademii nauk Azerbaidzhana — Institute of Information Technologies of the National Academy of Sciences of Azerbaijan. Vol. 21, 6, 41–48 [in Russian].
  6. Bolshie dannye v obrazovanii [Big data in education]. (n.d.). Retrieved from [in Russian].
  7. Utemisov, V.V., & Gorev, P.M. (2018). Razvitie obrazovatelnykh sistem na osnove Big Data [Development of educational systems based on Big Data]. Pedahohicheskie nauki — Pedagogical Sciences 6, 449–460 [in Russian].
  8. Frumin, I.D., Dobriakova, M.S., & Barannikov, K.A. (2018). Universalnye kompetentnosti i novaia hramotnost [Universal competencies and new literacy]. Pedahohicheskie nauki — Pedagogical Sciences, 2(19), 28–36 [in Russian].
  9. Cherniak, L. (2011). Bolshie dannye — novaia teoriia i praktika [Big data: new theory and practice]. Otkrytye sistemy. SUBD — Open systems. DBMS, 10, 18–25 [in Russian].
  10. Alekseev, M. (2014). Big Data — revoliutsiia v oblasti khraneniia i obrabotki dannykh [Big Data: a revolution in data storage and processing]. Retrieved from www.slideshare.net [in Russian].
  11. Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, & Nor Badrul Anuar. (2015). The rise of «big data» on cloud computing: Review and open research issues. Information Systems, 98–115.
  12. Laney, D. (2001). 3D data management: Controlling data volume, velocity and variety. Retrieved from douglaney/files/2012/01/ad949.pdf.
  13. Informatsiia, neobkhodimaia chitateliu: voprosy i otvety [Information the reader needs: questions and answers]. (n.d.) Retrieved from [in Russian].
  14. Ferguson, R. (2012). Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), 304–317.
  15. Asha, T., Shravanthi, U.M., & Nagashree, N. (2013). Building Machine Learning Algorithms on Hadoop for Bigdata. International Journal of Engineering and Technology, Vol. 3, 2, 143–147.
  16. Vvedenie v seti. Networking Cisco Academy [Introduction to the network. Networking Cisco Academy]. (n.d.) https://static- Retrieved from [in Russian].
  17. Postanovlenie Pravitelstva Respubliki Kazakhstan ob utverzhdenii Hosudarstvennoi prohrammy «Tsifrovoi Kazakhstan» [Resolution of the Government of the Republic of Kazakhstan on the approval of the State Program «Digital Kazakhstan»] (n.d.). Retrieved from [in Russian].
  18. Zakon Respubliki Kazakhstan ot 27 iiulia 2007 h. № 319-III «Ob obrazovanii» [The Law of the Republic of Kazakhstan dated 27 July 2007, No. 319-III «On Education»] (2016). Pravovoi spravochnik «Zakonodatelstvo» — Legal guide «Legislation» [in Russian].
  19. Hosudarstvennyi obshcheobiazatelnyi standart vyssheho obrazovaniia, utverzhdennyi Postanovleniem Pravitelstva Respubliki Kazakhstan ot 23 avhusta 2012 h. № 1080. Prilozhenie 7 k Prikazu ministra obrazovaniia i nauki Respubliki Kazakhstan ot 31 oktiabria 2018 h. № 604. [Compulsory state standard of higher education, approved by the Government of the Republic of Kazakhstan, dated 23 August 2012, No. 1080. Appendix 7 to the Order of the Minister of Education and Science of the Republic of Kazakhstan dated 31 October 2018, No. 604] [in Russian].
Year: 2020
City: Karaganda
Category: Pedagogy