Advanced Methods in Data Analysis
Instructor
Prof.dr. Horia F. Pop, Email:
Preliminaries
This is a research oriented class. Your grade will be based on your own work and on your understanding of it, including your ability to explain, defend and analyse your work and your results.
Objectives
- To introduce the student to advanced methods of data analysis.
- To present the field of intelligent data analysis as a novel research and application domain.
- To motivate the need for intelligent data analysis methods by studying relevant practical applications.
- To give the student the tools needed to develop different data analysis applications.
Class activities
- All activities require physical class participation. This is a full-attendance program, not a distance learning one.
- According to the National Education Act (1/2011), teaching activity may be recorded, by any means, only with the explicit agreement of the teacher. Consequently, no recording of any teaching activity, by any means and on any medium, is allowed.
- The students are invited to contact me individually, by email, at their own initiative, for any clarifications, explanations, consultations on class-relevant topics, research-relevant topics, or for required support or supervision for class-based activities.
Schedule of activities
The Teams class access code is btf1lae.
| Week | Lectures | Seminars |
| 1 | Class Administration, Organization, Introduction |
| 2 | Fuzzy sets, Fuzzy logic; Fuzzy reasoning, Fuzzy control systems; Fuzzy clustering, Quality measures; PCA, Discriminant analysis, Regression; Rough sets, Decision tables, Decision trees; Applications of data analysis and fuzzy sets | |
| 3 |
| 4 |
| 5 |
| 6 | Report 1 |
| 7 |
| 8 |
| 9 |
| 10 | Report 2 |
| 11 |
| 12 |
| 14 |
- Week 13 - national holiday; no classes
Online resources
- Lecture notes and other resources are available on the UBB Sharepoint platform.
- Please understand that the lecture notes will be posted here AFTER the class.
Bibliography
- J. Han, M. Kamber, Data Mining: Concepts and Techniques, Academic Press, 2001
- G.J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic, Prentice Hall, 1995
- T. Mitchell, Machine Learning, McGraw Hill, 1996
- Z. Pawlak, Rough Sets, Polish Academy of Sciences, Gliwice, 2004
- N. Ye, The Handbook of Data Mining, Lawrence Erlbaum Associates Publishers, 2003
Optional bibliography
- A. Agresti, An Introduction to Categorical Data Analysis, Wiley, New York, 1996
- M. Berthold, D.J. Hand, Intelligent Data Analysis, Springer Verlag, 2003
- J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981
- C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
- Y.H. Pao, Adaptive pattern recognition and neural networks, Addison Wesley, 1989
- StatSoft Inc., Electronic Statistics Textbook, Tulsa, OK, 2006, Website
- Open internet resources:
  - Libraries with scientific articles
  - Collections with datasets for experimentation
Grading scheme
- 30% = First report (written and presented)
- 30% = Second report (written and presented)
- 40% = Final exam paper (written paper in the exams session)
Minimal requirements:
- A minimal grade average of 5 (five).
- A minimal grade of 5 (five) at the final exam.
- At least one report submitted during the semester.
- Any evaluation of the submitted materials is done during the examination session.
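As an illustration only, the weighting and the minimal requirements above can be sketched in Python. The function name `final_grade`, the 1-10 grade scale, and the treatment of a report that was not submitted as a grade of 0 are assumptions made for this sketch, not part of the official rules.

```python
def final_grade(report1, report2, exam):
    """Return (weighted average, passed) for one student.

    report1, report2: report grades, or None if the report was not submitted.
    exam: grade of the final written exam.
    Assumption: grades are on a 1-10 scale and a missing report counts as 0.
    """
    submitted = [r for r in (report1, report2) if r is not None]
    r1 = report1 if report1 is not None else 0
    r2 = report2 if report2 is not None else 0

    # 30% first report, 30% second report, 40% final exam.
    # Integer weights avoid small floating point errors at the threshold.
    average = (30 * r1 + 30 * r2 + 40 * exam) / 100

    # Minimal requirements: average >= 5, exam >= 5, at least one report.
    passed = average >= 5 and exam >= 5 and len(submitted) >= 1
    return average, passed


if __name__ == "__main__":
    # Hypothetical grades, for illustration only.
    print(final_grade(8, 7, 6))
    print(final_grade(10, 10, 4))
```

In this sketch, high report grades do not compensate for an exam grade below 5, because each minimal requirement is checked separately from the average.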
Rules relevance:
- All rules are equally valid for all students.
- There are no exceptions to these rules.
Student deliverables
You will work on your own preferred research topics. They have to be relevant to the general class topic (i.e. Data Analysis), but do not need to refer strictly to topics from the lectures.
First report
- deadline for setting the time slot = week 5, Monday 00:00 hours
- the time slot is scheduled via email on a first come first served basis, according to available slots
- the written report is sent via this link as a ZIP archive including all necessary files
- if the upload system allows a ZIP archive, the file will be named r1-studentname.ZIP and uploaded as such
- if the upload system does not allow a ZIP archive, the file will be named r1-studentname.ZIP.PDF and uploaded as such
- class presentations = weeks 6-9
- the report presentation schedule is published read-only and updated on the UBB Sharepoint platform; please do not ask for write permissions
Second report
- deadline for setting the time slot = week 9, Monday 00:00 hours
- the time slot is scheduled via email on a first come first served basis, according to available slots
- the written report is sent via this link as a ZIP archive including all necessary files
- if the upload system allows a ZIP archive, the file will be named r2-studentname.ZIP and uploaded as such
- if the upload system does not allow a ZIP archive, the file will be named r2-studentname.ZIP.PDF and uploaded as such
- class presentations = weeks 10-14
- the report presentation schedule is published read-only and updated on the UBB Sharepoint platform; please do not ask for write permissions
Student activity
Experimental studies
- Each student will select real-world problems whose solution requires the use of data analysis algorithms. These should be problems that the student considers interesting and/or important, not artificially created ones.
- The student will report on his/her own actual and real experiments. This is not a collective work, and any assistance other than the student's own mind is not welcome. The reports will convincingly show that the experiments and their analyses have actually been performed by the student. The reports will include the personal motivation for all the choices made.
- Explain how the problem is approached and solved using the data analysis algorithms of your choice. Present all the experimental details clearly, together with a critical analysis of the whole problem and solution setup, including your own understanding of the AI modelling of the problem, the relevance of the research method used, qualitative and quantitative analysis of the experimental results obtained, etc.
- Any report that contains plain text with no supporting floats (figures, tables and the like) will be rejected. This is not an essay on English language and literature.
The first experimental study shows experiments with one method applied to several different data sets.
The second experimental study shows experiments with several different methods applied to one data set.
- The two experimental studies will use different data sets.
- If the first experimental study does not use several different data sets (i.e. from different topics), it will be rejected.
- If the second experimental study uses a data set that is not completely different from every data set used in the first experimental study, the second experimental study will be rejected.
Grading of the research reports
- Every report must have 4,500-5,000 words and 5-10 references, and must be written in Microsoft Word (or Word online) as a DOCX file, formatted as A4, with Times New Roman 12 pt fonts, single line spacing, and 2 cm wide margins.
- For the purpose of these reports, you will use Microsoft Word or compatible software. LaTeX and any other tools not leading to a DOCX file are irrelevant. This is not negotiable.
- Any report not satisfying these requirements will be rejected.
You will prepare and submit the following:
- the research paper, as one DOCX file
- a one-page executive summary, as one DOCX file
- the presentation support, as one PDF file
All these will be submitted packed as one ZIP archive. If the upload system does not allow a ZIP archive, the file will be renamed as filename.ZIP.PDF and uploaded as such.
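A minimal sketch of how the archive could be assembled and named with Python's standard `zipfile` module is given below. The helper names `archive_name` and `pack_report`, and the example file names in the usage note, are hypothetical; only the naming pattern (`r1-studentname.ZIP`, or `r1-studentname.ZIP.PDF` when ZIP uploads are refused) comes from the rules above.

```python
import zipfile
from pathlib import Path


def archive_name(report_no, student, zip_allowed=True):
    """Build the archive file name, e.g. r1-studentname.ZIP.

    If the upload system refuses ZIP files, the same archive is simply
    renamed with an extra .PDF suffix.
    """
    base = f"r{report_no}-{student}.ZIP"
    return base if zip_allowed else base + ".PDF"


def pack_report(files, out_dir, report_no, student, zip_allowed=True):
    """Pack the given files into one ZIP archive and return its path."""
    out_path = Path(out_dir) / archive_name(report_no, student, zip_allowed)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store each file under its bare name, without directories.
            zf.write(f, arcname=Path(f).name)
    return out_path
```

For example, `pack_report(["paper.docx", "summary.docx", "slides.pdf"], ".", 1, "studentname")` would write `r1-studentname.ZIP` in the current directory, assuming those three files exist there.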
The grading is done as follows:
- 7.0 p = quality of written materials;
- 3.0 p = quality of oral presentation.
The grade is composed by taking into account the following:
- the papers have to represent your own work;
- the title and contents have to match the requirements;
- the papers have to fulfill the requirements of an article:
  - length of 4,500-5,000 words;
  - suggestive title corresponding to the contents;
  - 10-line abstract;
  - introductory section, detailing the purpose of the paper;
  - a section integrating the topic of the paper in the general field;
  - a few main sections, according to your topic;
  - concluding remarks section;
  - bibliography of 5 to 10 titles; the bibliography entries have to be written correctly and completely; all the bibliography items have to be cited in the text;
- the presentation support has to correspond to the written text; any information from the presentation support has to be provided in the report itself;
Grade penalties
Complying with the requirements
- Failure to comply with the requirements leads to rejection of the report.
Penalties apply for delays
- Delays in time slot selection:
  - If the time slot is selected after the deadline, the penalty is the number of weeks between the deadline and the selection date.
  - If the time slot is not selected at all, any report submitted is ignored.
  - All emails sent to select a time slot are confirmed. If you get no confirmation, this means I did not receive your email.
- Delays in submission of work:
  - The earliest day a report may be presented is one day after the report is submitted.
  - Multiple submissions are possible, but only the last submission before the actual presentation will be considered.
  - All submissions sent on or after the presentation date will be, of course, ignored.
  - While multiple submissions are allowed, repeated uploads at intervals of a few minutes are discouraged.
  - If the last submission of the report is made on or after the scheduled presentation date, the penalty is the number of weeks between the scheduled date and the submission date.
  - This penalty accounts for the extra time actually required to complete the work. It is counted in calendar weeks, which naturally include Saturdays, Sundays, national holidays, the two-week Christmas and New Year holiday, the Easter holiday, etc.
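The calendar-week counting described above can be illustrated with a short Python sketch. The function name `weeks_of_delay` and the dates used in the usage note are hypothetical, and the choice to count every started week as a full week is an assumption: the rules above do not say how partial weeks are rounded.

```python
from datetime import date


def weeks_of_delay(due, actual):
    """Number of calendar weeks between a due date and the actual date.

    Calendar days are counted as they are, so weekends and holidays
    are included. Assumption: any started week counts as a full week.
    """
    days = (actual - due).days
    if days <= 0:
        return 0  # on time, no delay
    return -(-days // 7)  # ceiling division by 7
```

For instance, with a hypothetical due date of 20 December 2024 and an actual date of 10 January 2025, `weeks_of_delay(date(2024, 12, 20), date(2025, 1, 10))` counts 21 calendar days, that is 3 weeks, even though the interval covers the winter holiday.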
Further considerations
On reports contents
- The writing, expression, coverage, and points of view of the presented ideas have to be your own work.
- The burden of proof stays with the writer, not the reader: the text has to confirm beyond any reasonable doubt that the student actually did the experiments s/he is commenting on.
- You cannot submit (parts of) the same work for different reports and/or different disciplines, in order to get different grades. Naturally, different disciplines mean different work.
- There is no exception to the allowed size of the reports.
On reports submission
- All reports will be submitted via the provided links.
- All work should be done inside the 14 weeks of the semester. The submission deadlines respect student holidays. Delays past the submission deadlines are offered as an exception, to allow for partial grading.
- Overlap of delays into holidays is, obviously, not an invitation for free extension of said deadlines. Consequently, all delays are penalized based on calendar time.
On reports presentation
- Reports not presented according to the schedule may be presented later, after all the normally scheduled reports have been presented, and only in strict observance of the official schedule.
- The points for the quality of the oral presentation are awarded (in full or in part) only if there is an actual oral presentation.
- No student may have two presentations on the same date.
Examination sessions
Activity outside the semester
- All research reports / software projects are part of the semester activity and have to be submitted inside the 14 weeks period, i.e. by the last Friday of the semester.
- No report or software project can be resubmitted and no activity can be redone for grade increase in the resit session or otherwise.
The regular examination session
- According to academic regulations, two exam dates are set:
  - the first date is set by agreement with the students;
  - the second date is set by the teacher alone.
- The attendance regulations are:
  - the students have to attend the first date;
  - the students may attend the second date at the discretion of the teacher.
The resit session
- According to academic regulations, one exam date is set.
- Only the written exam will be organized. All other activity is semester activity, and not subject to regrading.
© Prof.dr. Horia F. Pop