About Bowtie


First, building an index from a reference genome. Here is the description from the official site:

bowtie-build builds a Bowtie index from a set of DNA sequences. bowtie-build outputs a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. (If the total length of all the input sequences is greater than about 4 billion, then the index files will end in ebwtl instead of ebwt.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie once the index is built.

In other words, running bowtie-build on a set of DNA sequences produces six files, with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt.


When the input is very large (total length greater than about 4 billion), the six corresponding files end in ebwtl instead of ebwt.
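The naming rule can be sketched in a few lines of Python. This is only an illustration of the file names described here, not part of Bowtie itself: the function name and the `hg19` basename are hypothetical, and the caller decides whether the index counts as "large" because the text gives the threshold only approximately.

```python
# Suffixes of the six index files, in the order the manual lists them.
SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")


def index_file_names(basename: str, large: bool = False) -> list[str]:
    """Return the six index file names expected for a given basename.

    `large` selects the ebwtl extension used for very long references.
    """
    ext = "ebwtl" if large else "ebwt"
    return [f"{basename}{suffix}.{ext}" for suffix in SUFFIXES]


if __name__ == "__main__":
    for name in index_file_names("hg19"):  # "hg19" is a made-up basename
        print(name)
```

A helper like this can be handy for checking that an index directory is complete before starting an alignment.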

Below is a description of the algorithm bowtie-build uses and how it handles the input:

Use of Karkkainen's blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.


The indexer provides options pertaining to the "shape" of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are "marked" (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments.


The Bowtie index is based on the FM Index of Ferragina and Manzini, which in turn is based on the Burrows-Wheeler transform. The algorithm used to build the index is based on the blockwise algorithm of Karkkainen.
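To make the Burrows-Wheeler transform concrete, here is a deliberately naive Python sketch that sorts all rotations of a short string. It is not the blockwise construction that bowtie-build uses and would be far too slow and memory-hungry for a genome; the function name and the `$` terminator convention are assumptions made for illustration.

```python
def naive_bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler transform via full rotation sorting (O(n^2 log n)).

    The terminator must not occur in `text` and must sort before every
    other character, which holds for "$" against A/C/G/T.
    """
    if terminator in text:
        raise ValueError("terminator must not appear in the input")
    s = text + terminator
    # Build every rotation, sort them, and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(naive_bwt("ACAACG"))  # GC$AAAC
```

The FM Index adds lookup tables on top of this transformed string so that substrings can be located without storing the sorted rotations.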


Using bowtie-build:

Usage:

bowtie-build [options]* <reference_in> <ebwt_base>

The two main arguments:

<reference_in>

A comma-separated list of FASTA files containing the reference sequences to be aligned to, or, if -c is specified, the sequences themselves. E.g., <reference_in> might be chr1.fa,chr2.fa,chrX.fa,chrY.fa, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA.

<ebwt_base>

The basename of the index files to write. By default, bowtie-build writes files named NAME.1.ebwt, NAME.2.ebwt, NAME.3.ebwt, NAME.4.ebwt, NAME.rev.1.ebwt, and NAME.rev.2.ebwt, where NAME is <ebwt_base>.
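As a rough sketch of how the two positional arguments fit together, the Python helper below assembles a bowtie-build argument list without running anything. The helper name and its parameters are hypothetical. Only the command name, the -c flag, and the comma-separated reference list come from the text above.

```python
def bowtie_build_argv(references, basename, inline=False, extra_options=()):
    """Assemble an argument list for bowtie-build (nothing is executed).

    references:    FASTA file names, or raw sequences when inline=True
    basename:      basename of the index files to write
    inline:        add -c so the references are read as sequences
    extra_options: any further options, passed through unchanged
    """
    argv = ["bowtie-build", *extra_options]
    if inline:
        argv.append("-c")
    argv.append(",".join(references))  # one comma-separated argument
    argv.append(basename)
    return argv


if __name__ == "__main__":
    print(bowtie_build_argv(["chr1.fa", "chr2.fa", "chrX.fa", "chrY.fa"], "NAME"))
    print(bowtie_build_argv(["GGTCATCCT", "ACGGGTCGT"], "NAME", inline=True))
```

If Bowtie is installed, a list like this could be handed to `subprocess.run` from the standard library. Building it as a list avoids shell-quoting problems with unusual file names.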

Optional parameters:

 
 
-f

The reference input files (specified as <reference_in>) are FASTA files (usually having extension .fa, .mfa, .fna or similar).

-c

The reference sequences are given on the command line. I.e. <reference_in> is a comma-separated list of sequences rather than a list of FASTA files.

-C/--color

Build a colorspace index, to be queried using bowtie -C.

-a/--noauto

Disable the default behavior whereby bowtie-build automatically selects values for the --bmax, --dcv and --packed parameters according to available memory. Instead, user may specify values for those parameters. If memory is exhausted during indexing, an error message will be printed; it is up to the user to try new parameters.

-p/--packed

Use a packed (2-bits-per-nucleotide) representation for DNA strings. This saves memory but makes indexing 2-3 times slower. Default: off. This is configured automatically by default; use -a/--noauto to configure manually.
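The idea of a 2-bits-per-nucleotide representation can be illustrated with a small Python sketch that stores four bases per byte. The code assignment (A=0, C=1, G=2, T=3) and the bit order within each byte are assumptions for illustration, and Bowtie's internal layout may differ.

```python
# Assumed 2-bit codes; the actual encoding used by Bowtie is not shown here.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte, low bits first."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


if __name__ == "__main__":
    packed = pack_2bit("GGTCATCCT")
    print(len(packed), unpack_2bit(packed, 9))  # 3 GGTCATCCT
```

Compared with one byte per base, this cuts the memory for the sequence to a quarter, at the cost of extra shifting and masking on every access.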

--bmax <int>

The maximum number of suffixes allowed in a block. Allowing more suffixes per block makes indexing faster, but increases peak memory usage. Setting this option overrides any previous setting for --bmax, or --bmaxdivn. Default (in terms of the --bmaxdivn parameter) is --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.

--bmaxdivn <int>

The maximum number of suffixes allowed in a block, expressed as a fraction of the length of the reference. Setting this option overrides any previous setting for --bmax, or --bmaxdivn. Default: --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.
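The relationship between the two options can be sketched as a simple division: a divisor of 4 means a block may hold up to a quarter of the reference's suffixes. The function name is hypothetical, and the rounding (integer division) is an assumption, since the text does not say how bowtie-build rounds.

```python
def bmax_from_divisor(reference_length: int, bmaxdivn: int = 4) -> int:
    """Maximum suffixes per block implied by a --bmaxdivn style divisor.

    Uses integer division; the exact rounding in bowtie-build is assumed.
    """
    if bmaxdivn <= 0:
        raise ValueError("divisor must be positive")
    return reference_length // bmaxdivn


if __name__ == "__main__":
    # A hypothetical 1,000,000-base reference with the default divisor of 4.
    print(bmax_from_divisor(1_000_000))      # 250000
    # A larger divisor gives smaller blocks, so lower peak memory but slower indexing.
    print(bmax_from_divisor(1_000_000, 16))  # 62500
```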

--dcv <int>

Use <int> as the period for the difference-cover sample. A larger period yields less memory overhead, but may make suffix sorting slower, especially if repeats are present. Must be a power of 2 no greater than 4096. Default: 1024. This is configured automatically by default; use -a/--noauto to configure manually.
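The stated constraint on the period (a power of 2 no greater than 4096) is easy to check before launching a long indexing run. The sketch below encodes only that stated rule; the function name is hypothetical, and bowtie-build may enforce further limits, such as a minimum period, that are not covered here.

```python
def is_valid_dcv(period: int) -> bool:
    """True if `period` is a power of two no greater than 4096."""
    is_power_of_two = period > 0 and (period & (period - 1)) == 0
    return is_power_of_two and period <= 4096


if __name__ == "__main__":
    for candidate in (1024, 4096, 1000, 8192):
        print(candidate, is_valid_dcv(candidate))
```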

--nodc

Disable use of the difference-cover sample. Suffix sorting becomes quadratic-time in the worst case (where the worst case is an extremely repetitive reference). Default: off.

-r/--noref

Do not build the NAME.3.ebwt and NAME.4.ebwt portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

-3/--justref

Build only the NAME.3.ebwt and NAME.4.ebwt portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

-o/--offrate <int>

To map alignments back to positions on the reference sequences, it's necessary to annotate ("mark") some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for human genome, annotations occupy about 340 megabytes).
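The memory cost of marking can be estimated with a short calculation. The sketch below assumes each marked row stores a 4-byte offset and uses a hypothetical count of about 2.86 billion indexed rows for a human-sized genome; neither figure comes from the text, so treat the result as a plausibility check rather than a measurement.

```python
def marked_rows(total_rows: int, offrate: int) -> int:
    """Number of rows marked when every 2**offrate-th row is annotated."""
    step = 1 << offrate
    return (total_rows + step - 1) // step  # ceiling division


def annotation_bytes(total_rows: int, offrate: int, bytes_per_entry: int = 4) -> int:
    """Estimated annotation size; 4 bytes per entry is an assumption."""
    return marked_rows(total_rows, offrate) * bytes_per_entry


if __name__ == "__main__":
    rows = 2_860_000_000  # hypothetical row count for a human-sized genome
    size = annotation_bytes(rows, offrate=5)
    print(f"{size / 2**20:.0f} MiB")  # about 341 MiB with the default offrate
    # Each step down in offrate doubles the annotation memory.
    print(annotation_bytes(rows, 4) // size)  # 2
```

Under these assumptions the estimate lands close to the quoted figure of about 340 megabytes.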

-t/--ftabchars <int>

The ftab is the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first <int> characters of the query. A larger <int> yields a larger lookup table but faster query times. The ftab has size 4^(<int>+1) bytes. The default setting is 10 (ftab is 4MB).
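The size formula can be checked directly. The sketch below evaluates 4^(n+1) bytes for a few settings; the function name is hypothetical, and the formula is the one stated above.

```python
def ftab_bytes(ftabchars: int) -> int:
    """Size of the ftab lookup table in bytes: 4 ** (ftabchars + 1)."""
    return 4 ** (ftabchars + 1)


if __name__ == "__main__":
    for n in (8, 10, 12):
        print(n, ftab_bytes(n) / 2**20, "MiB")
    # n = 10 gives 4 ** 11 = 4,194,304 bytes, i.e. 4 MiB.
```

Each extra character multiplies the table size by 4, so the setting grows expensive quickly.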

--ntoa

Convert Ns in the reference sequence to As before building the index. By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them.
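What the conversion does to a sequence can be shown with a one-line substitution. This is only a sketch of the effect on the reference text, not bowtie-build's own code; the function name is hypothetical, and the handling of lowercase n is an assumption.

```python
def n_to_a(sequence: str) -> str:
    """Replace ambiguous N bases with A (lowercase handled by assumption)."""
    return sequence.replace("N", "A").replace("n", "a")


if __name__ == "__main__":
    print(n_to_a("ACNNGT"))  # ACAAGT
```

With the conversion applied, the former N positions become ordinary A bases in the index, so reads can align across them.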

--big --little

Endianness to use when serializing integers to the index file. Default: little-endian (recommended for Intel- and AMD-based architectures).
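The difference between the two byte orders can be seen by serializing one integer both ways with Python's standard `struct` module. This illustrates endianness in general; it does not reproduce the layout of the Bowtie index files.

```python
import struct

value = 1  # a 32-bit unsigned integer to serialize

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 01000000
print(big.hex())     # 00000001

# Reading the bytes back with the wrong byte order gives a different number.
misread = struct.unpack(">I", little)[0]
print(misread)       # 16777216
```

This is why an index has to be read with the same byte order it was written in.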

--seed <int>

Use <int> as the seed for pseudo-random number generator.

-q/--quiet

bowtie-build is verbose by default. With this option bowtie-build will print only error messages.

-h/--help

Print usage information and quit.

--version

Print version information and quit.


Published by: 全栈程序员-站长. When reposting, please cite the source: https://javaforall.net/211250.html. Original link: https://javaforall.net
