Курс на Stepik
Обложка курса «Bioinformatics Contest 2021» на Stepik
Бесплатно

Bioinformatics Contest 2021 5.000

Открыть на
STEPIK.ORG

SOLVE CHALLENGING PROBLEMS TO WIN Online programming competition

Показатель Текущие показатели Рост
Значение 🏆 Рейтинг 3 дн 7 дн 30 дн
Количество учеников на курсе «Bioinformatics Contest 2021»Учеников на курсе 7 052
Сертификаты, выданные на курсе «Bioinformatics Contest 2021»Сертификатов выдано 0
Отзывы о курсе «Bioinformatics Contest 2021»Отзывов получено 2
Рейтинг курса «Bioinformatics Contest 2021»Рейтинг курса 5.000
Уроки в курсе «Bioinformatics Contest 2021»Количество уроков 12
Тесты в курсе «Bioinformatics Contest 2021»Количество квизов 56
Время прохождения курса «Bioinformatics Contest 2021»Время прохождения курса
Обновления курса «Bioinformatics Contest 2021»Обновления курса
Дата публикации курса «Bioinformatics Contest 2021»Дата публикации курса
Последнее обновление курса «Bioinformatics Contest 2021»Последнее обновление
5.000
из 5
2 отзыва
★★★★★
2
★★★★
0
★★★
0
★★
0
0
Maksym Kovalchuk
Maksym Kovalchuk
4 года назад

In this review, I will share my opinion/solutions to problems, what can be a little bit better, and general impression from the contest. First, problem statements. As I don't have strong biological background it was fun to read and understand problems. I usually read them like 5-10 times before really understand, but that sounds like nearly perfect balance of "bio" and "informatics" in bioinformatics. Second, qualification round. Problem 1 looks good as for the first problem. It can be easily solved with dictionary-like data structure (for example, std::map in c++). Problem 2 and problem 3 are a bit similar in approach to them. They can be confusing to find very fast solution. But what you really want to do is to implement decent solution and parallel it and just wait some time. I implemented two pointers in problem 2 and used O(1) lca (dfs + sparse table) in problem 3. You can easily parallel code in c++ with just 1 line #pragma omp parallel for num_threads(8). Third, final round. Problem 1 is probably my favorite problem among all others. My solution is based on 2 "hints" I took from the statement.Given n pairs of haplotypes, you can create n^2 pairs to increase amount of data as "alleles are inherited ... from each parent" (given n parents make n^2 children). Each unknown position I predict based on the value of 1 known position as "alleles are inherited in consecutive groups ... and thus form a correlated structure" (considering all known positions, choose the best predictor). Problem 2 took a lot of manual work. Not sure if I spend more time implementing some solution or for bin search. Problem 3 is very interesting one. My solution brute-forces every person as superspreader but prediction model is optimized to work faster and not rely on random seed. Problem 4 is also very interesting. Tests 1, 2, 3, and 7 can be solved manually (however I recommend to write some basic solution to decrease manual work). Test 5 have extra easy solution because number of variances >= 16 while expected value of errors = 200*0.01 = 2. For tests 4 and 6 I used the fact that if some site is the same for 3 different genomes yet different from reference, this site is probably taken from minor haplotype. Problem 5 is "implement decent solution and parallel". Fourth, what I would like to see more/less. It probably will be good idea to add optimization problem to qualification round as final round has one. Qualification problem 3 and final problem 5 are solvable with just brute-force. It might be more interesting to make them optimization problems. For example, choose 2 diseases to describe patient for qualification problem 3. It will make brute-force O(n^3) instead of O(n^2). In general, choose k out of n might be good optimization problem, like choose k initial superspreaders or something. For final problem 2 I would like to have 2 types of tests. First type does not has submission limit and has lower score. Second type has like 2-5 submissions per case and has high score. This way it is possible to test ideas safely without giving a lot of points for manual work. I would also like to see more non-workStraightWithGenome problems. Superspreaders is a great example. I believe biology has a lot of such problems. Fifth, overall impression. The contest was prepared really well!!! I like almost everything. I know how hard problem setting is and how important popularization is. It makes me feel so thankful for organizing this contest.

Guilherme de Sena Brandine
Guilherme de Sena Brandine
4 года назад

First of I'd like to say thank you so much for organizing this contest, I had a lot of fun for 24 straight hours trying to build solutions for some of the biology problems you guys created. I think this structure of contest has a lot of potential to bring in more great talent to the bioinformatics field. I will point out that the only downside I found is that some of the problems could be solved without actually writing code. Problems 2 and 4, specifically, could be done by trial and error. I understand that the goal of problem 2 was to write some code to narrow down an interval then do it through binary search, but I think the number of attempts provided was large enough that even if you started with the full interval you'd be able to zero in on the right answer without any code at all. Same for problem 4, where every test case was either a 1 or a 2, so since we had infinite attempts, we could simply swap around 1s and 2s until we got the right answer (I only noticed this like 5 minutes before the end of the contest so I couldn't get all the points for it). In contrast, problems 1 3 and 5 had large enough test cases that this was infeasible, so my only suggestion would be to increase the number of test cases enough so that we wouldn't be able to manually change our answers until we get it right, since this takes away from algorithmic design and turns the problem into a "patience contest". Overall, however, this was a very fun experience and my hat goes of to the organization team. I do not take for granted the amount of effort required to make good test cases and clear problem statements.