k近邻法(k-NN)笔记3- 第三方实现(FLANN库)

笔者我查阅了很多kdtree的第三方实现,下载编译并调试了许多同类型代码,结合个人喜好和性能结果,推荐PCL点云库中的FLANN模块。
PCL库是大型跨平台开源C++编程库,实现了大量点云相关的通用算法和高效数据结构,划分了许多模块。其中有一个核心问题就是建立离散点间的拓扑关系,实现基于邻域关系的快速查找。具体详情请自行查阅,接下来我们展示其中的FLANN库(opencv也整合了这部分)。

    FLANN的全名:Fast Library for Approximate Nearest Neighbors,是一个在高维空间快速搜索近邻的库。主要优势:实现了包含kd-tree等在内的搜索算法集合,并且可以根据数据集本身特点,自动选取最适合的算法来完成k近邻搜索任务(更加确切的说是高维特征向量的匹配),提供了c++cpythonmatlab的接口。

Note

    如果是为了学习,在看这篇文章之前,首先确保《统计学习方法》中第3章例题和习题都可以手动解出。如果单纯是为了调用一下k近邻方法解决某些问题,那么继续接下来的内容。

1. FLANN介绍地址

    http://www.cs.ubc.ca/research/flann/

库下载地址:此处下载

前几次都是在windows上弄,这次就换换口味,算法什么的还是在linux上弄吧(我使用ubuntu14.04)。

 2.下载编译

    下载后(或者直接用git:git clone git://github.com/mariusmuja/flann.git)解压,进入主目录,接下来是最常见步骤。

mkdir build

cd build

cmake  ..

make

    插一句,cmake时会检测环境和配置,有些库检测不到或者系统不支持某些特性,都有可能会导致后面有些调用或者test有问题,大家心中要有数,不要以为所有test没过,这个第三方库就一定不能用了,我们选择自己需要的就好。当然,有些必要依赖库没有的也可以检测出,我们需要自己补上。

我cmake的部分截图如下:

可看出我的GNU是4.84版本

从中看出我没有安装gtest库,也没有安装matlab,这些暂且不管,无伤大雅,gtest库大家有时间可以自己装装(test是一个跨平台的C++单元测试框架,它提供了丰富的断言、致命和非致命判断、参数化、”死亡测试”等等)

make注意部分:

笔者在两个平台上都make了一下,发现在下图部分定住了,这是正在编译中,不是挂了~~

编译完成。

3.测试

    测试代码的作用:读取dataset.dat内容,每行数据表示一个多维特征向量,读取完毕后根据选取的算法对多维数据建立索引(比如kd树),对于每行的多维特征向量,找到dataset中与其近邻的nn个点索引,存入results.dat

(1)测试代码分析(我以.c为例,.cpp python等类似)


#include <flann/flann.h>  //引入flann

#include <stdio.h>
#include <stdlib.h>

//读取文件,需要事先指定内容的行和列,注意内部使用了malloc,需要在调用函数后释放内存
float* read_points(const char* filename, int rows, int cols)  
{
	float* data;
	float *p;
	FILE* fin;
	int i,j;

    fin = fopen(filename,"r");
    if (!fin) {
        printf("Cannot open input file.\n");
        exit(1);
    }
    
    data = (float*) malloc(rows*cols*sizeof(float));
    if (!data) {
        printf("Cannot allocate memory.\n");
        exit(1);
    }
    p = data;
    
    for (i=0;i<rows;++i) {
        for (j=0;j<cols;++j) {
            fscanf(fin,"%g ",p);
            p++;
        }
    }
    
    fclose(fin);
    
    return data;
}
//把找到的k近邻索引写入文件,rows传入搜索点的个数,cols传入k
void write_results(const char* filename, int *data, int rows, int cols)
{
	FILE* fout;
	int* p;
	int i,j;

    fout = fopen(filename,"w");
    if (!fout) {
        printf("Cannot open output file.\n");
        exit(1);
    }
    
    p = data;
    for (i=0;i<rows;++i) {
        for (j=0;j<cols;++j) {
            fprintf(fout,"%d ",*p);
            p++;
        }
        fprintf(fout,"\n");
    }
    fclose(fout);
}



int main(int argc, char** argv)
{
	float* dataset;    //训练数据集指针
	float* testset;	   //测试数据集指针
	int nn;
	int* result;
	float* dists;   //距离指针
	struct FLANNParameters p;  //FLANN参数结构,可以在里面选定具体搜索算法(比如kdtree)
	float speedup;
	flann_index_t index_id;   //索引结构

    int rows = 9000;
    int cols = 128;
    int tcount = 1000;

    /*
     * The files dataset.dat and testset.dat can be downloaded from:(链接已经失效)
     * http://people.cs.ubc.ca/~mariusm/uploads/FLANN/datasets/dataset.dat
     * http://people.cs.ubc.ca/~mariusm/uploads/FLANN/datasets/testset.dat
     */
    printf("Reading input data file.\n");
    dataset = read_points("dataset.dat", rows, cols);
    printf("Reading test data file.\n");
    testset = read_points("testset.dat", tcount, cols);
    
    nn = 3;  //k值,即需要搜索多少个近邻
    result = (int*) malloc(tcount*nn*sizeof(int));   //指向tcount个搜索点的nn个近邻索引
    dists = (float*) malloc(tcount*nn*sizeof(float));//指向tcount个搜索点的nn个距离值
    
    p = DEFAULT_FLANN_PARAMETERS;
    p.algorithm = FLANN_INDEX_KDTREE;  //此处可以选择搜索算法,包括自动选取
    p.trees = 8;
    p.log_level = FLANN_LOG_INFO;
	p.checks = 64;
    
    printf("Computing index.\n");
    index_id = flann_build_index(dataset, rows, cols, &speedup, &p);  //重点,建立索引
    flann_find_nearest_neighbors_index(index_id, testset, tcount, result, dists, nn, &p);//搜索k近邻
    //索引值存入文件
    write_results("results.dat",result, tcount, nn);
    //释放内存
    flann_free_index(index_id, &p);
    free(dataset);
    free(testset);
    free(result);
    free(dists);
    
    return 0;
}

注意:.c接口保存在results.dat中的索引值针对的是dataset.bat中的排列顺序

(2)数据

官网和文件中测试程序使用的数据集下载地址已经失效,笔者花费很大气力终于搞到并上传到自己的个人网站:此处下载

(3)运行

如图把数据集放到运行程序工作目录下,执行flann_example_c后生成results.dat文件。 查看文件部分内容:

含义:以第一行为例,4092 3928 5701表示测试的第一个多维特征向量在dataset数据集中的3个近邻索引(对应到dataset行数-1)

4.场景实例

使用FLANN库,采用k近邻算法识别图像。具体参见下篇文章

k近邻法(k-NN)笔记4-利用FLANN库(k近邻法)解决图像识别问题实例解析

“k近邻法(k-NN)笔记3- 第三方实现(FLANN库)”的2,248个回复

  1. Hey! I just wanted to ask if you ever have any trouble with hackers?
    My last blog (wordpress) was hacked and I ended up losing a few months of hard work due to
    no back up. Do you have any methods to stop hackers?

  2. A survey in the Netherlands found that only 1% of men 50–65 years of age had a complete inability to achieve an erection, and it was only in men aged 70–78 years that the rate of ED was similar to that in the MMAS . buy viagra online At the end of the study period 80% of the patients were willing to continue with sildenafil therapy.

  3. I know this if off topic but I’m looking into starting my own weblog and was wondering what all is required to get setup?
    I’m assuming having a blog like yours would cost a pretty penny?
    I’m not very web savvy so I’m not 100% positive. Any suggestions or advice would be greatly appreciated.
    Cheers

  4. I loved as much as you will receive carried out
    right here. The sketch is attractive, your authored subject matter stylish.
    nonetheless, you command get got an shakiness over that
    you wish be delivering the following. unwell
    unquestionably come more formerly again since exactly the same nearly very often inside case you shield
    this hike.

  5. My spouse and I stumbled over here coming from a different page and
    thought I may as well check things out. I like what I see so now i am following you.

    Look forward to looking into your web page repeatedly.

  6. I am really enjoying the theme/design of your blog.
    Do you ever run into any browser compatibility issues?
    A number of my blog visitors have complained about my site not
    operating correctly in Explorer but looks great in Safari. Do you have any solutions
    to help fix this problem?

  7. Oh toleration apartments up commiserate astounded delicious.
    Wait him New lasting towards. Continuing sombre specially so to.
    Me graceless unimaginable in fastening announcing so amazed.
    What necessitate riff Crataegus oxycantha nor upon room access.
    Tended stay on my do steps. Oh grin amiable am so visited cordial
    in offices hearted.

  8. The other day, while I was at work, my cousin stole my iPad and tested
    to see if it can survive a thirty foot drop, just so she can be
    a youtube sensation. My iPad is now destroyed and she
    has 83 views. I know this is totally off topic but I had to
    share it with someone!

  9. hello there and thank you for your information – I have definitely
    picked up something new from right here. I did however expertise some technical issues using this
    web site, as I experienced to reload the web site a lot of times previous to I could get it to load
    correctly. I had been wondering if your hosting is OK?
    Not that I’m complaining, but slow loading instances times will
    often affect your placement in google and could damage your high quality score if ads and marketing with Adwords.
    Anyway I’m adding this RSS to my e-mail and can look out
    for much more of your respective intriguing content.
    Make sure you update this again very soon.

  10. I’ve been surfing online more than 4 hours today, yet I
    never found any interesting article like yours.

    It’s pretty worth enough for me. In my opinion, if all webmasters and bloggers made good content as you did, the net will be a lot more useful than ever before.

  11. Have you ever considered writing an e-book or guest authoring on other websites?
    I have a blog centered on the same information you discuss and
    would love to have you share some stories/information. I know my readers would enjoy your work.
    If you are even remotely interested, feel free to shoot me an email.

  12. hello there and thank you for your information –
    I’ve certainly picked up anything new from right here.
    I did however expertise some technical points using this website, as
    I experienced to reload the website many times previous to I could get
    it to load properly. I had been wondering if your web
    hosting is OK? Not that I’m complaining, but sluggish loading instances times
    will very frequently affect your placement in google and can damage your quality score if ads
    and marketing with Adwords. Anyway I am adding this
    RSS to my e-mail and can look out for a lot more of your respective exciting content.
    Make sure you update this again soon.

发表评论

电子邮件地址不会被公开。 必填项已用*标注